Linux-Fsdevel Archive on lore.kernel.org
* [PATCH v2 00/17] Overlayfs NFS export support
@ 2018-01-04 17:20 Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 01/17] ovl: document NFS export Amir Goldstein
                   ` (16 more replies)
  0 siblings, 17 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Miklos,

This is the 2nd revision of the series that implements NFS export support.
This series is based on a series of prep patches [1] that implement
the "verify" feature and were posted earlier to overlayfs list.
The complete work is available here [2].

NFS export is enabled for an overlayfs mount when the opt-in "verify"
feature is enabled, because decoding file handles requires the full
index that the "verify" feature creates.
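For reviewers new to the index: the documentation patch describes an
index entry name as the hexadecimal representation of the copy up origin
file handle. A minimal userspace sketch of that naming follows, assuming
plain lowercase hex of the raw handle bytes; fh_to_index_name() is a
hypothetical helper, not an overlayfs function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an overlayfs function): render the raw bytes of
 * an encoded file handle as lowercase hex, assumed here to be how an index
 * entry name is derived from the origin file handle.
 * @out must have room for 2 * len + 1 bytes.
 */
static void fh_to_index_name(const unsigned char *fh, size_t len, char *out)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < len; i++) {
		out[2 * i] = hex[fh[i] >> 4];		/* high nibble */
		out[2 * i + 1] = hex[fh[i] & 0x0f];	/* low nibble */
	}
	out[2 * len] = '\0';
}
```

For example, the handle bytes 00 fb 1a 9c map to the name "00fb1a9c".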

The first patch in the series is the Documentation update, with a brief
overview of the implementation, so reviewers can learn what to expect
further along in the series.

The 1st revision copied up and indexed all directories on file handle
encode. This revision only copies up directories on encode in a setup
with multiple lower layers. This is done to reduce the complexity of
decoding directory file handles and to ensure that an encoded dir inode
is either upper or the top lower dir.

The current implementation does not support encoding connectable
non-dir file handles. When overlayfs is exported with the 'subtree_check'
exportfs option, the NFS client will get lookup errors and the NFS server
logs will have warnings like this:
  "overlayfs: connectable file handles not supported; \
              use 'no_subtree_check' exportfs option"

To unit test overlayfs file handles, I enhanced the xfstests open_by_handle
test utility to encode/decode directories and to check several other cases
that were not covered by the original xfstests test. On my xfstests NFS
export branch [3], there are two generic tests and two overlayfs specific
tests in the 'exportfs' group: one for a samefs setup and one for a
non-samefs setup with two lower layers.

To sanity test NFS exported overlayfs, I made a hacky test patch to
unionmount-testsuite [4], which runs all the tests on an NFS mount to
localhost, while setting up and rotating the overlay mount on the exported
local share.

I also ran the NFSTest [5] nfstest_posix group on an exported overlayfs
mount, but that test only creates pure upper files in the overlay upper
dir, so it does not stress the implementation much. This test found one
problem with a non-uptodate overlay inode mtime. The last patch in the
series solves this problem in nfsd. It was reviewed by Jeff.

Amir.

[1] https://github.com/amir73il/linux/commits/ovl-index-all
[2] https://github.com/amir73il/linux/commits/ovl-nfs-export-v2
[3] https://github.com/amir73il/xfstests/commits/ovl-nfs-export
[4] https://github.com/amir73il/unionmount-testsuite/commits/ovl-nfs-export
[5] http://wiki.linux-nfs.org/wiki/index.php/NFStest

Amir Goldstein (17):
  ovl: document NFS export
  ovl: encode pure upper file handles
  ovl: decode pure upper file handles
  ovl: decode connected upper dir file handles
  ovl: encode non-indexed upper file handles
  ovl: copy up before encoding dir file handle when ofs->numlower > 1
  ovl: encode lower file handles
  ovl: decode lower non-dir file handles
  ovl: decode indexed non-dir file handles
  ovl: decode lower file handles of unlinked but open files
  ovl: decode indexed dir file handles
  ovl: decode pure lower dir file handles
  ovl: hash directory inodes for NFS export
  ovl: lookup connected ancestor of dir in inode cache
  ovl: lookup indexed ancestor of lower dir
  ovl: wire up NFS export support
  nfsd: encode stat->mtime for getattr instead of inode->i_mtime

 Documentation/filesystems/overlayfs.txt |  59 +++
 fs/locks.c                              |   6 +-
 fs/nfsd/nfsxdr.c                        |   1 +
 fs/overlayfs/Kconfig                    |   2 +
 fs/overlayfs/Makefile                   |   3 +-
 fs/overlayfs/export.c                   | 636 ++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c                    |  29 +-
 fs/overlayfs/namei.c                    |  34 +-
 fs/overlayfs/overlayfs.h                |  16 +
 fs/overlayfs/super.c                    |  11 +
 fs/overlayfs/util.c                     |   4 +-
 11 files changed, 778 insertions(+), 23 deletions(-)
 create mode 100644 fs/overlayfs/export.c

-- 
2.7.4

* [PATCH v2 01/17] ovl: document NFS export
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-11 16:06   ` Miklos Szeredi
  2018-01-04 17:20 ` [PATCH v2 02/17] ovl: encode pure upper file handles Amir Goldstein
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 Documentation/filesystems/overlayfs.txt | 59 +++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 00e0595f3d7e..9e21c14c914c 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -315,6 +315,65 @@ origin file handle that was stored at copy_up time.  If a found lower
 directory does not match the stored origin, that directory will not be
 merged with the upper directory.
 
+
+NFS export
+----------
+
+When the underlying filesystems support NFS export and the "verify"
+feature is enabled, an overlay filesystem may be exported to NFS.
+
+With the "verify" feature, on copy_up of any lower object, an index
+entry is created under the index directory.  The index entry name is the
+hexadecimal representation of the copy up origin file handle.  For a
+non-directory object, the index entry is a hard link to the upper inode.
+For a directory object, the index entry has an extended attribute
+"trusted.overlay.origin" with an encoded file handle of the upper
+directory inode.
+
+When encoding a file handle from an overlay filesystem object, the
+following rules apply:
+
+1. For a non-upper object, encode a lower file handle from lower inode
+2. For an indexed object, encode a lower file handle from copy_up origin
+3. For a pure-upper object and for an existing non-indexed upper object,
+   encode an upper file handle from upper inode
+
+Encoding of a non-upper directory object is not supported when the overlay
+filesystem has multiple lower layers.  In this case, the directory will
+be copied up first, and then encoded as an upper file handle.
+
+The encoded overlay file handle includes:
+ - Header including path type information (e.g. lower/upper)
+ - UUID of the underlying filesystem
+ - Underlying filesystem encoding of underlying inode
+
+This encoding is identical to the encoding of copy_up origin stored in
+"trusted.overlay.origin".
+
+When decoding an overlay file handle, the following steps are followed:
+
+1. Find underlying layer by UUID and path type information.
+2. Decode the underlying filesystem file handle to underlying dentry.
+3. For a lower file handle, look up the handle in the index directory by name.
+4. If a whiteout is found in index, return ESTALE. This represents an
+   overlay object that was deleted after its file handle was encoded.
+5. For a non-directory, instantiate a disconnected overlay dentry from the
+   decoded underlying dentry, the path type and index inode, if found.
+6. For a directory, use the connected underlying decoded dentry, path type
+   and index, to lookup a connected overlay dentry.
+
+The "verify" feature ensures that a decoded overlay directory object will
+be equivalent to the object that was used to encode the file handle.
+
+Decoding a non-directory file handle may return a disconnected dentry.
+copy_up of that disconnected dentry will create an upper index entry with
+no upper alias.
+
+The overlay filesystem does not support non-directory connectable file
+handles, so exporting with the 'subtree_check' exportfs configuration will
+cause failures to look up files over NFS.
+
+
 Testsuite
 ---------
 
-- 
2.7.4

* [PATCH v2 02/17] ovl: encode pure upper file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 01/17] ovl: document NFS export Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-18 10:31   ` Miklos Szeredi
  2018-01-04 17:20 ` [PATCH v2 03/17] ovl: decode " Amir Goldstein
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Encode overlay file handles as struct ovl_fh containing the file handle
encoding of the real upper inode.
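The word/byte arithmetic in ovl_dentry_to_fh() (words to bytes with
"<< 2", bytes to words rounded up with "(len + 3) >> 2") can be sketched
in userspace as below. encode_fh_sketch() and the *_SKETCH constants are
hypothetical names; 0xfb is the OVL_FILEID value defined by this patch and
255 is FILEID_INVALID. As we read the encode_fh comment in
include/linux/exportfs.h, on overflow the method should also report the
minimum length needed in *max_len; the sketch does that, whereas
ovl_dentry_to_fh() as posted leaves *max_len unchanged when ovl_d_to_fh()
fails.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILEID_INVALID_SKETCH	255	/* FILEID_INVALID in exportfs.h */
#define OVL_FILEID_SKETCH	0xfb	/* OVL_FILEID from this patch */

/*
 * Hypothetical encode_fh-style helper. On entry *max_len is the buffer
 * size in 32-bit words; on return it holds the handle size in words,
 * rounded up. If the buffer is too small, fail and report the size needed.
 */
static int encode_fh_sketch(const unsigned char *fh, int fh_len,
			    uint32_t *fid, int *max_len)
{
	int buflen = *max_len << 2;	/* words -> bytes */
	int need = (fh_len + 3) >> 2;	/* bytes -> words, rounded up */

	*max_len = need;
	if (fh_len > buflen)
		return FILEID_INVALID_SKETCH;

	memcpy(fid, fh, fh_len);
	return OVL_FILEID_SKETCH;
}
```

With a 21-byte handle the rounded length is 6 words: an 8-word buffer
succeeds, while a 4-word buffer fails and reports 6.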

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/Makefile    |  3 +-
 fs/overlayfs/export.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |  6 +++
 3 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 fs/overlayfs/export.c

diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
index 99373bbc1478..30802347a020 100644
--- a/fs/overlayfs/Makefile
+++ b/fs/overlayfs/Makefile
@@ -4,4 +4,5 @@
 
 obj-$(CONFIG_OVERLAY_FS) += overlay.o
 
-overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o
+overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o \
+		export.o
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
new file mode 100644
index 000000000000..58c4f5e8a67e
--- /dev/null
+++ b/fs/overlayfs/export.c
@@ -0,0 +1,98 @@
+/*
+ * Overlayfs NFS export support.
+ *
+ * Amir Goldstein <amir73il@gmail.com>
+ *
+ * Copyright (C) 2017 CTERA Networks. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/cred.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/xattr.h>
+#include <linux/exportfs.h>
+#include <linux/ratelimit.h>
+#include "overlayfs.h"
+
+int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
+{
+	struct dentry *upper = ovl_dentry_upper(dentry);
+	struct dentry *origin = ovl_dentry_lower(dentry);
+	struct ovl_fh *fh = NULL;
+	int err;
+
+	/*
+	 * Overlay root dir inode is encoded as an upper file handle upper,
+	 * because root dir dentry is born upper and not indexed.
+	 */
+	if (dentry == dentry->d_sb->s_root)
+		origin = NULL;
+
+	err = -EACCES;
+	if (!upper || origin)
+		goto fail;
+
+	/* TODO: encode non pure-upper by origin */
+	fh = ovl_encode_fh(upper, true);
+
+	err = -EOVERFLOW;
+	if (fh->len > buflen)
+		goto fail;
+
+	memcpy(buf, (char *)fh, fh->len);
+	err = fh->len;
+
+out:
+	kfree(fh);
+	return err;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed to encode file handle (%pd2, err=%i, buflen=%d, len=%d, type=%d)\n",
+			    dentry, err, buflen, fh ? (int)fh->len : 0,
+			    fh ? fh->type : 0);
+	goto out;
+}
+
+static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
+{
+	int res, len = *max_len << 2;
+
+	res = ovl_d_to_fh(dentry, (char *)fid, len);
+	if (res <= 0)
+		return FILEID_INVALID;
+
+	len = res;
+
+	/* Round up to dwords */
+	*max_len = (len + 3) >> 2;
+	return OVL_FILEID;
+}
+
+static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
+			       struct inode *parent)
+{
+	struct dentry *dentry;
+	int type;
+
+	/* TODO: encode connectable file handles */
+	if (parent)
+		return FILEID_INVALID;
+
+	dentry = d_find_any_alias(inode);
+	if (WARN_ON(!dentry))
+		return FILEID_INVALID;
+
+	type = ovl_dentry_to_fh(dentry, fid, max_len);
+
+	dput(dentry);
+	return type;
+}
+
+const struct export_operations ovl_export_operations = {
+	.encode_fh      = ovl_encode_inode_fh,
+};
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f4b064023826..f6fd999cb98e 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -65,6 +65,9 @@ enum ovl_flag {
 #error Endianness not defined
 #endif
 
+/* The type returned by overlay exportfs ops when encoding an ovl_fh handle */
+#define OVL_FILEID	0xfb
+
 /* On-disk and in-memeory format for redirect by file handle */
 struct ovl_fh {
 	u8 version;	/* 0 */
@@ -333,3 +336,6 @@ int ovl_set_attr(struct dentry *upper, struct kstat *stat);
 struct ovl_fh *ovl_encode_fh(struct dentry *origin, bool is_upper);
 int ovl_set_origin(struct dentry *dentry, struct dentry *origin,
 		   struct dentry *upper, bool is_upper);
+
+/* export.c */
+extern const struct export_operations ovl_export_operations;
-- 
2.7.4

* [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 01/17] ovl: document NFS export Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 02/17] ovl: encode pure upper file handles Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-18 14:09   ` Miklos Szeredi
  2018-01-04 17:20 ` [PATCH v2 04/17] ovl: decode connected upper dir " Amir Goldstein
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Decoding an upper file handle is done by decoding the upper dentry from
the underlying upper fs, finding or allocating an overlay inode that is
hashed by the real upper inode, and instantiating an overlay dentry with
that inode.
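A toy model of "find or allocate an overlay inode hashed by the real upper
inode": because the key is the real inode, decoding the same upper handle
twice yields the same overlay inode rather than a duplicate. All names
below are hypothetical; the kernel side relies on the VFS inode cache
(iget5_locked()) rather than a private table, and this sketch ignores
locking and eviction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real upper inode and the overlay inode. */
struct real_inode { unsigned long ino; };

struct ovl_inode_sketch {
	const struct real_inode *real;	/* hash key: the real upper inode */
	int refcount;
	struct ovl_inode_sketch *next;	/* hash chain */
};

#define NBUCKETS 64
static struct ovl_inode_sketch *buckets[NBUCKETS];

static size_t bucket_of(const struct real_inode *real)
{
	return ((uintptr_t)real >> 4) % NBUCKETS;
}

/* Find the overlay inode for @real, or allocate and hash a new one. */
static struct ovl_inode_sketch *get_inode_sketch(const struct real_inode *real)
{
	size_t b = bucket_of(real);
	struct ovl_inode_sketch *oi;

	for (oi = buckets[b]; oi; oi = oi->next) {
		if (oi->real == real) {
			oi->refcount++;		/* found: take a reference */
			return oi;
		}
	}

	oi = calloc(1, sizeof(*oi));
	if (!oi)
		return NULL;
	oi->real = real;
	oi->refcount = 1;
	oi->next = buckets[b];		/* insert at head of chain */
	buckets[b] = oi;
	return oi;
}
```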

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/namei.c     |  4 +--
 fs/overlayfs/overlayfs.h |  2 ++
 3 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 58c4f5e8a67e..5c72784a0b4d 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
 	return type;
 }
 
+/*
+ * Find or instantiate an overlay dentry from real dentries.
+ */
+static struct dentry *ovl_obtain_alias(struct super_block *sb,
+				       struct dentry *upper,
+				       struct ovl_path *lowerpath)
+{
+	struct inode *inode;
+	struct dentry *dentry;
+	struct ovl_entry *oe;
+
+	/* TODO: obtain non pure-upper */
+	if (lowerpath)
+		return ERR_PTR(-EIO);
+
+	inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
+	if (IS_ERR(inode)) {
+		dput(upper);
+		return ERR_CAST(inode);
+	}
+
+	dentry = d_obtain_alias(inode);
+	if (IS_ERR(dentry) || dentry->d_fsdata)
+		return dentry;
+
+	oe = ovl_alloc_entry(0);
+	if (!oe) {
+		dput(dentry);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	dentry->d_fsdata = oe;
+	ovl_dentry_set_upper_alias(dentry);
+
+	return dentry;
+}
+
+static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
+					struct ovl_fh *fh)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct dentry *dentry;
+	struct dentry *upper;
+
+	if (!ofs->upper_mnt)
+		return ERR_PTR(-EACCES);
+
+	upper = ovl_decode_fh(fh, ofs->upper_mnt);
+	if (IS_ERR_OR_NULL(upper))
+		return upper;
+
+	dentry = ovl_obtain_alias(sb, upper, NULL);
+	dput(upper);
+
+	return dentry;
+}
+
+static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
+				       int fh_len, int fh_type)
+{
+	struct dentry *dentry = NULL;
+	struct ovl_fh *fh = (struct ovl_fh *) fid;
+	int len = fh_len << 2;
+	unsigned int flags = 0;
+	int err;
+
+	err = -EINVAL;
+	if (fh_type != OVL_FILEID)
+		goto out_err;
+
+	err = ovl_check_fh_len(fh, len);
+	if (err)
+		goto out_err;
+
+	/* TODO: decode non-upper */
+	flags = fh->flags;
+	if (flags & OVL_FH_FLAG_PATH_UPPER)
+		dentry = ovl_upper_fh_to_d(sb, fh);
+	err = PTR_ERR(dentry);
+	if (IS_ERR(dentry) && err != -ESTALE)
+		goto out_err;
+
+	return dentry;
+
+out_err:
+	pr_warn_ratelimited("overlayfs: failed to decode file handle (len=%d, type=%d, flags=%x, err=%i)\n",
+			    len, fh_type, flags, err);
+	return ERR_PTR(err);
+}
+
 const struct export_operations ovl_export_operations = {
 	.encode_fh      = ovl_encode_inode_fh,
+	.fh_to_dentry	= ovl_fh_to_dentry,
 };
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index a69cedf06000..87d39384dc55 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -107,7 +107,7 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
  * Return -ENODATA for "origin unknown".
  * Return <0 for an invalid file handle.
  */
-static int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
+int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
 {
 	if (fh_len < sizeof(struct ovl_fh) || fh_len < fh->len)
 		return -EINVAL;
@@ -171,7 +171,7 @@ static struct ovl_fh *ovl_get_origin_fh(struct dentry *dentry)
 	goto out;
 }
 
-static struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt)
+struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt)
 {
 	struct dentry *origin;
 	int bytes;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f6fd999cb98e..c4f8e98e209e 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -258,6 +258,8 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 
 
 /* namei.c */
+int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
+struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt);
 int ovl_verify_origin(struct dentry *dentry, struct dentry *origin,
 		      bool is_upper, bool set);
 int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index);
-- 
2.7.4

* [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (2 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 03/17] ovl: decode " Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-05 12:33   ` Amir Goldstein
  2018-01-15 11:33   ` Miklos Szeredi
  2018-01-04 17:20 ` [PATCH v2 05/17] ovl: encode non-indexed upper " Amir Goldstein
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Until this change, we decoded upper file handles by instantiating an
overlay dentry from the real upper dentry. This is sufficient to handle
pure upper files, but insufficient to handle merge/impure dirs.

To that end, if the decoded real upper dir is connected and hashed, we
look up an overlay dentry with the same path as the real upper dir.
If the decoded real upper is a non-dir, we instantiate a disconnected
overlay dentry, as before this change.
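The iterative walk in ovl_lookup_real() can be modelled on a toy tree with
parent pointers. lookup_order() below is hypothetical: it records which
names would be looked up, in order, to connect the overlay path to a real
dentry, and fails with -EXDEV when the real dentry is not under the layer
root. Like the patch's loop, it re-walks from the target on every step
(quadratic in depth, no recursion); it omits reference counting and the
-ESTALE check that the looked-up overlay dentry maps to the same real
dentry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy dentry: a name and a parent pointer; the fs root is its own parent. */
struct node {
	const char *name;
	struct node *parent;
};

/*
 * Hypothetical model of ovl_lookup_real(): starting from the layer root,
 * repeatedly find the topmost ancestor of @real whose parent is the last
 * connected node, and "look it up" by name. Names are stored in @names in
 * lookup order. Returns the number of lookups, -EXDEV if @real is not
 * under @layer_root, or -ENOSPC if @names is too small.
 */
static int lookup_order(struct node *layer_root, struct node *real,
			const char **names, int max)
{
	struct node *connected = layer_root;
	int n = 0;

	while (connected != real) {
		struct node *next = real;

		/* Walk up until next's parent is the connected node. */
		while (next->parent != connected) {
			if (next->parent == next)
				return -EXDEV;	/* hit fs root: moved out */
			next = next->parent;
		}
		if (n == max)
			return -ENOSPC;
		names[n++] = next->name;	/* one lookup by name */
		connected = next;
	}
	return n;
}
```

For a real dir /upper/a/b with layer root /upper, the lookups are "a"
and then "b".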

Because ovl_fh_to_dentry() returns connected overlay dir dentries,
exportfs never needs to call get_parent() and get_name() to reconnect an
upper overlay dir. Because connectable non-dir file handles are not
supported, exportfs will not be able to use fh_to_parent() and get_name()
methods to reconnect a disconnected non-dir to its parent. Therefore, the
methods get_parent() and get_name() are implemented just to print out a
sanity warning and the method fh_to_parent() is implemented to warn the
user that using the 'subtree_check' exportfs option is not supported.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 171 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 5c72784a0b4d..48ae02f3acb8 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -130,6 +130,145 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 	return dentry;
 }
 
+/*
+ * Lookup a child overlay dentry whose real dentry is @real.
+ * If @is_upper is true then we lookup a child overlay dentry with the same
+ * name as the real dentry. Otherwise, we need to consult index for lookup.
+ */
+static struct dentry *ovl_lookup_real_one(struct dentry *parent,
+					  struct dentry *real, bool is_upper)
+{
+	struct dentry *this;
+	struct qstr *name = &real->d_name;
+	int err;
+
+	/* TODO: use index when looking up by lower real dentry */
+	if (!is_upper)
+		return ERR_PTR(-EACCES);
+
+	/* Lookup overlay dentry by real name */
+	this = lookup_one_len_unlocked(name->name, parent, name->len);
+	err = PTR_ERR(this);
+	if (IS_ERR(this)) {
+		goto fail;
+	} else if (!this || !this->d_inode) {
+		dput(this);
+		err = -ENOENT;
+		goto fail;
+	} else if (ovl_dentry_upper(this) != real) {
+		dput(this);
+		err = -ESTALE;
+		goto fail;
+	}
+
+	return this;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, is_upper=%d, parent=%pd2, err=%i)\n",
+			    real, is_upper, parent, err);
+	return ERR_PTR(err);
+}
+
+/*
+ * Lookup an overlay dentry whose real dentry is @real.
+ * If @is_upper is true then we lookup an overlay dentry with the same path
+ * as the real dentry. Otherwise, we need to consult index for lookup.
+ */
+static struct dentry *ovl_lookup_real(struct super_block *sb,
+				      struct dentry *real, bool is_upper)
+{
+	struct dentry *connected;
+	int err = 0;
+
+	/* TODO: use index when looking up by lower real dentry */
+	if (!is_upper)
+		return ERR_PTR(-EACCES);
+
+	connected = dget(sb->s_root);
+	while (!err) {
+		struct dentry *next, *this;
+		struct dentry *parent = NULL;
+		struct dentry *real_connected = ovl_dentry_upper(connected);
+
+		if (real_connected == real)
+			break;
+
+		next = dget(real);
+		/* find the topmost dentry not yet connected */
+		for (;;) {
+			parent = dget_parent(next);
+
+			if (real_connected == parent)
+				break;
+
+			/*
+			 * If real file has been moved out of the layer root
+			 * directory, we will eventually hit the real fs root.
+			 */
+			if (parent == next) {
+				err = -EXDEV;
+				break;
+			}
+
+			dput(next);
+			next = parent;
+		}
+
+		if (!err) {
+			this = ovl_lookup_real_one(connected, next, is_upper);
+			if (!IS_ERR(this)) {
+				dput(connected);
+				connected = this;
+			} else {
+				err = PTR_ERR(this);
+			}
+		}
+
+		dput(parent);
+		dput(next);
+	}
+
+	if (err)
+		goto fail;
+
+	return connected;
+
+fail:
+	pr_warn_ratelimited("overlayfs: failed to lookup by real (%pd2, is_upper=%d, connected=%pd2, err=%i)\n",
+			    real, is_upper, connected, err);
+	dput(connected);
+	return ERR_PTR(err);
+}
+
+/*
+ * Get an overlay dentry from upper/lower real dentries.
+ */
+static struct dentry *ovl_get_dentry(struct super_block *sb,
+				     struct dentry *upper,
+				     struct ovl_path *lowerpath)
+{
+	/* TODO: get non-upper dentry */
+	if (!upper)
+		return ERR_PTR(-EACCES);
+
+	/*
+	 * Obtain a disconnected overlay dentry from a non-dir real upper
+	 * dentry.
+	 */
+	if (!d_is_dir(upper))
+		return ovl_obtain_alias(sb, upper, NULL);
+
+	/* Removed empty directory? */
+	if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
+		return ERR_PTR(-ENOENT);
+
+	/*
+	 * If real upper dentry is connected and hashed, get a connected
+	 * overlay dentry with the same path as the real upper dentry.
+	 */
+	return ovl_lookup_real(sb, upper, true);
+}
+
 static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
 					struct ovl_fh *fh)
 {
@@ -144,7 +283,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
 	if (IS_ERR_OR_NULL(upper))
 		return upper;
 
-	dentry = ovl_obtain_alias(sb, upper, NULL);
+	dentry = ovl_get_dentry(sb, upper, NULL);
 	dput(upper);
 
 	return dentry;
@@ -183,7 +322,38 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
 	return ERR_PTR(err);
 }
 
+static struct dentry *ovl_fh_to_parent(struct super_block *sb, struct fid *fid,
+				       int fh_len, int fh_type)
+{
+	pr_warn_ratelimited("overlayfs: connectable file handles not supported; use 'no_subtree_check' exportfs option.\n");
+	return ERR_PTR(-EACCES);
+}
+
+static int ovl_get_name(struct dentry *parent, char *name,
+			struct dentry *child)
+{
+	/*
+	 * ovl_fh_to_dentry() returns connected dir overlay dentries and
+	 * ovl_fh_to_parent() is not implemented, so we should not get here.
+	 */
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+
+static struct dentry *ovl_get_parent(struct dentry *dentry)
+{
+	/*
+	 * ovl_fh_to_dentry() returns connected dir overlay dentries, so we
+	 * should not get here.
+	 */
+	WARN_ON_ONCE(1);
+	return ERR_PTR(-EIO);
+}
+
 const struct export_operations ovl_export_operations = {
 	.encode_fh      = ovl_encode_inode_fh,
 	.fh_to_dentry	= ovl_fh_to_dentry,
+	.fh_to_parent	= ovl_fh_to_parent,
+	.get_name	= ovl_get_name,
+	.get_parent	= ovl_get_parent,
 };
-- 
2.7.4

* [PATCH v2 05/17] ovl: encode non-indexed upper file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (3 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 04/17] ovl: decode connected upper dir " Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-15 11:58   ` Miklos Szeredi
  2018-01-04 17:20 ` [PATCH v2 06/17] ovl: copy up before encoding dir file handle when ofs->numlower > 1 Amir Goldstein
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

We only need to encode origin if there is a chance that the same object was
encoded before copy up, in which case we need to keep the same encoding
after copy up as well.

If a non-pure upper is not indexed, then it was copied up before NFS
export support was enabled. In that case, we don't need to worry about
staying consistent with the pre copy up encoding and we encode an upper
file handle.

This mitigates the problem that, with no index, we cannot find an upper
inode from the origin inode, so we cannot decode a non-indexed upper from
an origin file handle.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 48ae02f3acb8..919d43aaa387 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -19,6 +19,28 @@
 #include <linux/ratelimit.h>
 #include "overlayfs.h"
 
+/*
+ * We only need to encode origin if there is a chance that the same object was
+ * encoded pre copy up and then we need to stay consistent with the same
+ * encoding also after copy up. If non-pure upper is not indexed, then it was
+ * copied up before NFS export was enabled. In that case we don't need to worry
+ * about staying consistent with pre copy up encoding and we encode an upper
+ * file handle.
+ */
+static bool ovl_should_encode_origin(struct dentry *dentry)
+{
+	/* Root dentry was born upper */
+	if (dentry == dentry->d_sb->s_root)
+		return false;
+
+	/* Decoding a non-indexed upper from origin is not implemented */
+	if (ovl_dentry_upper(dentry) &&
+	    !ovl_test_flag(OVL_INDEX, d_inode(dentry)))
+		return false;
+
+	return true;
+}
+
 int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 {
 	struct dentry *upper = ovl_dentry_upper(dentry);
@@ -26,11 +48,7 @@ int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 	struct ovl_fh *fh = NULL;
 	int err;
 
-	/*
-	 * Overlay root dir inode is encoded as an upper file handle upper,
-	 * because root dir dentry is born upper and not indexed.
-	 */
-	if (dentry == dentry->d_sb->s_root)
+	if (!ovl_should_encode_origin(dentry))
 		origin = NULL;
 
 	err = -EACCES;
-- 
2.7.4

* [PATCH v2 06/17] ovl: copy up before encoding dir file handle when ofs->numlower > 1
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (4 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 05/17] ovl: encode non-indexed upper " Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 07/17] ovl: encode lower file handles Amir Goldstein
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Decoding a merge dir, whose origin's parent may be on a different lower
layer than the overlay parent's origin, is not implemented. As a simple
approximation, we do not encode lower dir file handles when overlay has
multiple lower layers. Instead, we copy up the lower dir first and then
encode an upper dir file handle.
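The encoding table added by this patch (with lower encoding wired up in
07/17) can be condensed into a single decision function. struct obj and
choose_encoding() are hypothetical illustrations, not overlayfs code; the
copy_up flag corresponds to the ovl_encode_maybe_copy_up() path.

```c
#include <assert.h>
#include <stdbool.h>

enum fh_kind { FH_UPPER, FH_LOWER };

/* Hypothetical summary of an overlay object, for illustration only. */
struct obj {
	bool is_root;
	bool is_dir;
	bool has_upper;
	bool indexed;
};

/*
 * Sketch of the encoding table: returns the kind of handle to encode
 * and sets *copy_up when a lower dir must be copied up first.
 */
static enum fh_kind choose_encoding(const struct obj *o, int numlower,
				    bool *copy_up)
{
	*copy_up = false;

	/* Root dentry was born upper. */
	if (o->is_root)
		return FH_UPPER;

	/* Dirs with multiple lower layers: upper, copying up if needed. */
	if (o->is_dir && numlower > 1) {
		*copy_up = !o->has_upper;
		return FH_UPPER;
	}

	/* Pure upper, or copied up before the index existed: upper. */
	if (o->has_upper && !o->indexed)
		return FH_UPPER;

	/* Indexed or lower object: lower handle from origin. */
	return FH_LOWER;
}
```

The invariant noted in the table holds in this model: within one
numlower setting, a lower object and the same object after an indexed
copy up get the same kind of handle.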

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 919d43aaa387..149cfb5c967e 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -26,13 +26,48 @@
  * copied up before NFS export was enabled. In that case we don't need to worry
  * about staying consistent with pre copy up encoding and we encode an upper
  * file handle.
+ *
+ * The following table summarizes the different file handle encodings used for
+ * different overlay object types with overlay configuration of single and
+ * multiple lower layers:
+ *
+ *  Object type		| Single lower	| Multiple lower
+ * --------------------------------------------------------
+ *  Pure upper		|	U	|	U
+ *  Non-indexed upper	|	U	|	U
+ *  Indexed non-dir	|	L	|	L
+ *  Lower non-dir	|	L	|	L
+ *  Indexed directory	|	L	|	U
+ *  Lower directory	|	L	|	U (*)
+ *
+ * U = upper file handle
+ * L = lower file handle
+ *
+ * The important thing to note is that within the same overlay configuration
+ * an overlay object encoding is invariant to copy up (i.e. Lower->Indexed).
+ *
+ * (*) If decoding an overlay dir from origin is not implemented, we do not
+ * encode by lower inode, because if file gets copied up after we encoded it,
+ * we won't be able to decode the file handle. To mitigate this case, we copy
+ * up the lower dir first and then encode an upper dir file handle.
  */
 static bool ovl_should_encode_origin(struct dentry *dentry)
 {
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+
 	/* Root dentry was born upper */
 	if (dentry == dentry->d_sb->s_root)
 		return false;
 
+	/*
+	 * Decoding a merge dir, whose origin's parent may be on a different
+	 * lower layer than the overlay parent's origin, is not implemented.
+	 * As a simple approximation, we do not encode lower dir file handles
+	 * when overlay has multiple lower layers.
+	 */
+	if (d_is_dir(dentry) && ofs->numlower > 1)
+		return false;
+
 	/* Decoding a non-indexed upper from origin is not implemented */
 	if (ovl_dentry_upper(dentry) &&
 	    !ovl_test_flag(OVL_INDEX, d_inode(dentry)))
@@ -41,16 +76,43 @@ static bool ovl_should_encode_origin(struct dentry *dentry)
 	return true;
 }
 
+static int ovl_encode_maybe_copy_up(struct dentry *dentry)
+{
+	int err;
+
+	if (ovl_dentry_upper(dentry))
+		return 0;
+
+	err = ovl_want_write(dentry);
+	if (err)
+		return err;
+
+	err = ovl_copy_up(dentry);
+
+	ovl_drop_write(dentry);
+	return err;
+}
+
 int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 {
-	struct dentry *upper = ovl_dentry_upper(dentry);
+	struct dentry *upper;
 	struct dentry *origin = ovl_dentry_lower(dentry);
 	struct ovl_fh *fh = NULL;
 	int err;
 
-	if (!ovl_should_encode_origin(dentry))
+	/*
+	 * If we should not encode a lower dir file handle, copy up and encode
+	 * an upper dir file handle.
+	 */
+	if (!ovl_should_encode_origin(dentry)) {
+		err = ovl_encode_maybe_copy_up(dentry);
+		if (err)
+			goto fail;
+
 		origin = NULL;
+	}
 
+	upper = ovl_dentry_upper(dentry);
 	err = -EACCES;
 	if (!upper || origin)
 		goto fail;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 68+ messages in thread
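The encoding table in patch 06 can be checked mechanically. The following is a userspace model, not kernel code: `struct model_obj` and `model_encoding()` are hypothetical names, and the rules are transcribed from `ovl_should_encode_origin()` and the copy-up fallback in `ovl_d_to_fh()`, assuming (as the table implies) that an object with a lower dentry is otherwise encoded by origin once lower encoding is wired up in patch 07.

```c
#include <stdbool.h>

/* Hypothetical userspace model of an overlay object; not a kernel type. */
struct model_obj {
	bool is_root;	/* overlay root dentry */
	bool is_dir;
	bool has_upper;	/* ovl_dentry_upper() != NULL */
	bool has_lower;	/* ovl_dentry_lower() != NULL (origin) */
	bool indexed;	/* OVL_INDEX flag set on the inode */
};

/*
 * Return 'U' for an upper file handle or 'L' for a lower (origin) one.
 * *copy_up is set when the object must be copied up before encoding,
 * mirroring the ovl_encode_maybe_copy_up() fallback.
 */
char model_encoding(const struct model_obj *o, int numlower, bool *copy_up)
{
	bool use_origin = o->has_lower;

	if (o->is_root)
		use_origin = false;	/* root dentry was born upper */
	else if (o->is_dir && numlower > 1)
		use_origin = false;	/* merge dir decode not implemented */
	else if (o->has_upper && !o->indexed)
		use_origin = false;	/* non-indexed upper: no origin decode */

	/* Encoding by upper requires an upper dentry: copy up if missing. */
	*copy_up = !use_origin && !o->has_upper;
	return use_origin ? 'L' : 'U';
}
```

For a fixed `numlower`, a lower object and its indexed copy-up get the same result from `model_encoding()`, which is the invariance the comment relies on; only the copy-up flag differs.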

* [PATCH v2 07/17] ovl: encode lower file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (5 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 06/17] ovl: copy up before encoding dir file handle when ofs->numlower > 1 Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 08/17] ovl: decode lower non-dir " Amir Goldstein
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

For indexed or lower non-dir, encode a non-connectable lower file handle
from origin inode. For indexed or lower dir, when ofs->numlower == 1,
encode a lower file handle from lower dir.
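The one-line change in `ovl_d_to_fh()` relies on the GNU C `a ?: b` extension (equivalent to `a ? a : b` with `a` evaluated once) and passes `!origin` as the second argument, which, judging by the earlier `ovl_encode_fh(upper, true)` call, selects an upper handle when no origin is used. A standard-C userspace sketch of that selection plus the `-EOVERFLOW` buffer check follows; the one-byte flags header, the flag value `MODEL_FH_FLAG_PATH_UPPER`, and the string ids are hypothetical stand-ins, not the real `struct ovl_fh` layout.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical flag value; not the real overlayfs on-disk constant. */
#define MODEL_FH_FLAG_PATH_UPPER 0x01

/*
 * Encode @origin if present, else @upper, as: [flags byte][id bytes].
 * Returns the encoded length or a negative errno.
 */
int model_d_to_fh(const char *origin, const char *upper, char *buf, int buflen)
{
	const char *real = origin ? origin : upper;	/* origin ?: upper */
	size_t idlen;

	if (!real)
		return -EIO;	/* model only: nothing to encode */

	idlen = strlen(real);
	if (buflen < 0 || 1 + idlen > (size_t)buflen)
		return -EOVERFLOW;

	buf[0] = origin ? 0 : MODEL_FH_FLAG_PATH_UPPER;	/* flag = !origin */
	memcpy(buf + 1, real, idlen);
	return (int)(1 + idlen);
}
```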

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 149cfb5c967e..0b4ad8693b29 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -95,7 +95,6 @@ static int ovl_encode_maybe_copy_up(struct dentry *dentry)
 
 int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 {
-	struct dentry *upper;
 	struct dentry *origin = ovl_dentry_lower(dentry);
 	struct ovl_fh *fh = NULL;
 	int err;
@@ -112,13 +111,8 @@ int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 		origin = NULL;
 	}
 
-	upper = ovl_dentry_upper(dentry);
-	err = -EACCES;
-	if (!upper || origin)
-		goto fail;
-
-	/* TODO: encode non pure-upper by origin */
-	fh = ovl_encode_fh(upper, true);
+	/* Encode an upper or origin file handle */
+	fh = ovl_encode_fh(origin ?: ovl_dentry_upper(dentry), !origin);
 
 	err = -EOVERFLOW;
 	if (fh->len > buflen)
-- 
2.7.4

* [PATCH v2 08/17] ovl: decode lower non-dir file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (6 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 07/17] ovl: encode lower file handles Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 09/17] ovl: decode indexed " Amir Goldstein
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Decoding a lower non-dir file handle is done by decoding the lower dentry
from underlying lower fs, finding or allocating an overlay inode that is
hashed by the real lower inode and instantiating an overlay dentry with
that inode.
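Hashing the overlay inode by the real lower inode is what makes two decodes of the same handle converge on one overlay inode. A minimal userspace sketch of that find-or-allocate pattern (the role `iget5_locked()` plays inside `ovl_get_inode()`) follows; `struct model_inode`, the fixed-size bucket table, and `model_iget()` are hypothetical, and locking and eviction are omitted.

```c
#include <stdlib.h>

#define MODEL_HASH_SIZE 64

/* Hypothetical overlay inode, keyed by its real (lower) inode number. */
struct model_inode {
	unsigned long key;	/* real inode used as hash key */
	int i_count;		/* reference count */
	struct model_inode *next;
};

static struct model_inode *model_hash[MODEL_HASH_SIZE];

/* Find the overlay inode hashed by @key, or allocate and hash a new one. */
struct model_inode *model_iget(unsigned long key)
{
	struct model_inode **head = &model_hash[key % MODEL_HASH_SIZE];
	struct model_inode *inode;

	for (inode = *head; inode; inode = inode->next) {
		if (inode->key == key) {
			inode->i_count++;	/* found: take a reference */
			return inode;
		}
	}

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	inode->key = key;
	inode->i_count = 1;
	inode->next = *head;	/* insert at bucket head */
	*head = inode;
	return inode;
}
```

Decoding the same lower non-dir handle twice therefore yields the same overlay inode with its reference count bumped, rather than two unrelated inodes.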

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c    | 53 +++++++++++++++++++++++++++++++++++++++---------
 fs/overlayfs/namei.c     |  8 ++++----
 fs/overlayfs/overlayfs.h |  3 +++
 3 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 0b4ad8693b29..6b359f968c01 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -174,15 +174,16 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 				       struct dentry *upper,
 				       struct ovl_path *lowerpath)
 {
-	struct inode *inode;
+	struct dentry *lower = lowerpath ? lowerpath->dentry : NULL;
 	struct dentry *dentry;
+	struct inode *inode;
 	struct ovl_entry *oe;
 
-	/* TODO: obtain non pure-upper */
-	if (lowerpath)
+	/* TODO: obtain an indexed non-dir upper with origin */
+	if (lower && (upper || d_is_dir(lower)))
 		return ERR_PTR(-EIO);
 
-	inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
+	inode = ovl_get_inode(sb, dget(upper), lower, NULL, !!lower);
 	if (IS_ERR(inode)) {
 		dput(upper);
 		return ERR_CAST(inode);
@@ -192,14 +193,19 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 	if (IS_ERR(dentry) || dentry->d_fsdata)
 		return dentry;
 
-	oe = ovl_alloc_entry(0);
+	oe = ovl_alloc_entry(!!lower);
 	if (!oe) {
 		dput(dentry);
 		return ERR_PTR(-ENOMEM);
 	}
 
 	dentry->d_fsdata = oe;
-	ovl_dentry_set_upper_alias(dentry);
+	if (upper)
+		ovl_dentry_set_upper_alias(dentry);
+	if (lower) {
+		oe->lowerstack->dentry = dget(lower);
+		oe->lowerstack->layer = lowerpath->layer;
+	}
 
 	return dentry;
 }
@@ -321,7 +327,14 @@ static struct dentry *ovl_get_dentry(struct super_block *sb,
 				     struct dentry *upper,
 				     struct ovl_path *lowerpath)
 {
-	/* TODO: get non-upper dentry */
+	/*
+	 * Obtain a disconnected overlay dentry from a disconnected non-dir
+	 * real lower dentry.
+	 */
+	if (!upper && !d_is_dir(lowerpath->dentry))
+		return ovl_obtain_alias(sb, NULL, lowerpath);
+
+	/* TODO: lookup connected dir from real lower dir */
 	if (!upper)
 		return ERR_PTR(-EACCES);
 
@@ -363,6 +376,26 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
 	return dentry;
 }
 
+static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
+					struct ovl_fh *fh)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct ovl_path origin = { };
+	struct ovl_path *stack = &origin;
+	struct dentry *dentry = NULL;
+	int err;
+
+	err = ovl_check_origin_fh(fh, NULL, ofs->lower_layers, ofs->numlower,
+				  &stack);
+	if (err)
+		return ERR_PTR(err);
+
+	dentry = ovl_get_dentry(sb, NULL, &origin);
+	dput(origin.dentry);
+
+	return dentry;
+}
+
 static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
 				       int fh_len, int fh_type)
 {
@@ -380,10 +413,10 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
 	if (err)
 		goto out_err;
 
-	/* TODO: decode non-upper */
 	flags = fh->flags;
-	if (flags & OVL_FH_FLAG_PATH_UPPER)
-		dentry = ovl_upper_fh_to_d(sb, fh);
+	dentry = (flags & OVL_FH_FLAG_PATH_UPPER) ?
+		 ovl_upper_fh_to_d(sb, fh) :
+		 ovl_lower_fh_to_d(sb, fh);
 	err = PTR_ERR(dentry);
 	if (IS_ERR(dentry) && err != -ESTALE)
 		goto out_err;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 87d39384dc55..638ff196da93 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -310,9 +310,9 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 }
 
 
-static int ovl_check_origin_fh(struct ovl_fh *fh, struct dentry *upperdentry,
-			       struct ovl_layer *layers, unsigned int numlayers,
-			       struct ovl_path **stackp)
+int ovl_check_origin_fh(struct ovl_fh *fh, struct dentry *upperdentry,
+			struct ovl_layer *layers, unsigned int numlayers,
+			struct ovl_path **stackp)
 {
 	struct dentry *origin = NULL;
 	int i;
@@ -328,7 +328,7 @@ static int ovl_check_origin_fh(struct ovl_fh *fh, struct dentry *upperdentry,
 	else if (IS_ERR(origin))
 		return PTR_ERR(origin);
 
-	if (!ovl_is_whiteout(upperdentry) &&
+	if (upperdentry && !ovl_is_whiteout(upperdentry) &&
 	    ((d_inode(origin)->i_mode ^ d_inode(upperdentry)->i_mode) & S_IFMT))
 		goto invalid;
 
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c4f8e98e209e..2ddd74043b5f 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -260,6 +260,9 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
 /* namei.c */
 int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
 struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt);
+int ovl_check_origin_fh(struct ovl_fh *fh, struct dentry *upperdentry,
+			struct ovl_layer *layers, unsigned int numlayers,
+			struct ovl_path **stackp);
 int ovl_verify_origin(struct dentry *dentry, struct dentry *origin,
 		      bool is_upper, bool set);
 int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index);
-- 
2.7.4

* [PATCH v2 09/17] ovl: decode indexed non-dir file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (7 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 08/17] ovl: decode lower non-dir " Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-18 13:11   ` Miklos Szeredi
  2018-01-04 17:20 ` [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files Amir Goldstein
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Decoding an indexed non-dir file handle is similar to decoding a lower
non-dir file handle, but additionally, we look up the file handle by name
in the index dir to find the real upper inode.
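Looking up the file handle by name works because an index entry is named after the origin file handle itself, written out in hex (this is what `ovl_get_index_name()` produces). A userspace sketch of that naming, using lowercase hex digits as the kernel's `bin2hex()` does, follows; `model_index_name()` is a hypothetical helper and the sample handle bytes are made up.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Write the lowercase hex form of @fh (length @len) to @out, NUL-terminated.
 * Returns 0, or -EOVERFLOW if @out cannot hold 2 * len + 1 bytes.
 */
int model_index_name(const unsigned char *fh, size_t len, char *out,
		     size_t outlen)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	if (outlen < 2 * len + 1)
		return -EOVERFLOW;

	for (i = 0; i < len; i++) {
		out[2 * i] = hex[fh[i] >> 4];		/* high nibble */
		out[2 * i + 1] = hex[fh[i] & 0x0f];	/* low nibble */
	}
	out[2 * len] = '\0';
	return 0;
}
```

In `ovl_lower_fh_to_d()` the index lookup happens first; if both an index entry and an origin are found, `ovl_verify_origin()` checks that they agree before an overlay dentry is obtained.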

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c | 72 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 6b359f968c01..602bada474ba 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -168,22 +168,24 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
 }
 
 /*
- * Find or instantiate an overlay dentry from real dentries.
+ * Find or instantiate an overlay dentry from real dentries and index.
  */
 static struct dentry *ovl_obtain_alias(struct super_block *sb,
-				       struct dentry *upper,
-				       struct ovl_path *lowerpath)
+				       struct dentry *upper_alias,
+				       struct ovl_path *lowerpath,
+				       struct dentry *index)
 {
 	struct dentry *lower = lowerpath ? lowerpath->dentry : NULL;
+	struct dentry *upper = upper_alias ?: index;
 	struct dentry *dentry;
 	struct inode *inode;
 	struct ovl_entry *oe;
 
-	/* TODO: obtain an indexed non-dir upper with origin */
-	if (lower && (upper || d_is_dir(lower)))
+	/* We get overlay directory dentries with ovl_lookup_real() */
+	if (d_is_dir(upper ?: lower))
 		return ERR_PTR(-EIO);
 
-	inode = ovl_get_inode(sb, dget(upper), lower, NULL, !!lower);
+	inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower);
 	if (IS_ERR(inode)) {
 		dput(upper);
 		return ERR_CAST(inode);
@@ -200,13 +202,16 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 	}
 
 	dentry->d_fsdata = oe;
-	if (upper)
+	if (upper_alias)
 		ovl_dentry_set_upper_alias(dentry);
 	if (lower) {
 		oe->lowerstack->dentry = dget(lower);
 		oe->lowerstack->layer = lowerpath->layer;
 	}
 
+	if (index)
+		ovl_set_flag(OVL_INDEX, inode);
+
 	return dentry;
 }
 
@@ -321,30 +326,26 @@ static struct dentry *ovl_lookup_real(struct super_block *sb,
 }
 
 /*
- * Get an overlay dentry from upper/lower real dentries.
+ * Get an overlay dentry from upper/lower real dentries and index.
  */
 static struct dentry *ovl_get_dentry(struct super_block *sb,
 				     struct dentry *upper,
-				     struct ovl_path *lowerpath)
+				     struct ovl_path *lowerpath,
+				     struct dentry *index)
 {
+	struct dentry *real = upper ?: (index ?: lowerpath->dentry);
+
 	/*
-	 * Obtain a disconnected overlay dentry from a disconnected non-dir
-	 * real lower dentry.
+	 * Obtain a disconnected overlay dentry from a non-dir real dentry
+	 * and index.
 	 */
-	if (!upper && !d_is_dir(lowerpath->dentry))
-		return ovl_obtain_alias(sb, NULL, lowerpath);
+	if (!d_is_dir(real))
+		return ovl_obtain_alias(sb, upper, lowerpath, index);
 
 	/* TODO: lookup connected dir from real lower dir */
 	if (!upper)
 		return ERR_PTR(-EACCES);
 
-	/*
-	 * Obtain a disconnected overlay dentry from a non-dir real upper
-	 * dentry.
-	 */
-	if (!d_is_dir(upper))
-		return ovl_obtain_alias(sb, upper, NULL);
-
 	/* Removed empty directory? */
 	if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
 		return ERR_PTR(-ENOENT);
@@ -370,7 +371,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
 	if (IS_ERR_OR_NULL(upper))
 		return upper;
 
-	dentry = ovl_get_dentry(sb, upper, NULL);
+	dentry = ovl_get_dentry(sb, upper, NULL, NULL);
 	dput(upper);
 
 	return dentry;
@@ -383,17 +384,38 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
 	struct ovl_path origin = { };
 	struct ovl_path *stack = &origin;
 	struct dentry *dentry = NULL;
+	struct dentry *index = NULL;
 	int err;
 
+	/* First lookup indexed upper by fh */
+	index = ovl_get_index_fh(ofs, fh);
+	err = PTR_ERR(index);
+	if (IS_ERR(index))
+		return ERR_PTR(err);
+
+	/* Then lookup origin by fh */
 	err = ovl_check_origin_fh(fh, NULL, ofs->lower_layers, ofs->numlower,
 				  &stack);
-	if (err)
-		return ERR_PTR(err);
+	if (err) {
+		goto out_err;
+	} else if (!index && !origin.dentry) {
+		return NULL;
+	} else if (index && origin.dentry) {
+		err = ovl_verify_origin(index, origin.dentry, false, false);
+		if (err)
+			goto out_err;
+	}
 
-	dentry = ovl_get_dentry(sb, NULL, &origin);
-	dput(origin.dentry);
+	dentry = ovl_get_dentry(sb, NULL, &origin, index);
 
+out:
+	dput(origin.dentry);
+	dput(index);
 	return dentry;
+
+out_err:
+	dentry = ERR_PTR(err);
+	goto out;
 }
 
 static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
-- 
2.7.4

* [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (8 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 09/17] ovl: decode indexed " Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-16  9:16   ` Miklos Szeredi
  2018-01-18 14:18   ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 11/17] ovl: decode indexed dir file handles Amir Goldstein
                   ` (6 subsequent siblings)
  16 siblings, 2 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Look up the overlay inode in the inode cache by origin inode, so we can
decode a file handle of an open file even if the index has a whiteout
index entry marking that this overlay inode was unlinked.
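The "still open?" test in this patch hinges on a reference count: `ovl_lookup_inode()` takes a reference via `ilookup5()`, so an `i_count` of exactly 1 afterwards means no one else (such as an open file) holds the inode. A userspace sketch of that decision follows; `struct model_cached_inode` and `model_decode_deleted()` are hypothetical, and the lookup reference is simulated with a plain counter.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cached overlay inode; i_count counts existing references. */
struct model_cached_inode {
	int i_count;
};

/*
 * Decide whether a handle whose index entry is a whiteout can still be
 * decoded: only if the overlay inode is cached and held by someone else.
 */
int model_decode_deleted(struct model_cached_inode *cached)
{
	int ret;

	if (!cached)
		return -ESTALE;		/* not in inode cache */

	cached->i_count++;		/* like ilookup5() taking a reference */
	ret = (cached->i_count == 1) ? -ESTALE : 0;
	cached->i_count--;		/* like iput() dropping it */
	return ret;
}
```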

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
 fs/overlayfs/inode.c     | 16 ++++++++++++++++
 fs/overlayfs/overlayfs.h |  1 +
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 602bada474ba..6ecb54d4b52c 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
 	struct ovl_path *stack = &origin;
 	struct dentry *dentry = NULL;
 	struct dentry *index = NULL;
+	struct inode *inode = NULL;
+	bool is_deleted = false;
 	int err;
 
 	/* First lookup indexed upper by fh */
 	index = ovl_get_index_fh(ofs, fh);
 	err = PTR_ERR(index);
-	if (IS_ERR(index))
-		return ERR_PTR(err);
+	if (IS_ERR(index)) {
+		if (err != -ESTALE)
+			return ERR_PTR(err);
+
+		/* Found a whiteout index - treat as deleted inode */
+		is_deleted = true;
+		index = NULL;
+	}
 
 	/* Then lookup origin by fh */
 	err = ovl_check_origin_fh(fh, NULL, ofs->lower_layers, ofs->numlower,
@@ -404,6 +412,15 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
 		err = ovl_verify_origin(index, origin.dentry, false, false);
 		if (err)
 			goto out_err;
+	} else if (is_deleted && origin.dentry && !d_is_dir(origin.dentry)) {
+		/* Lookup deleted overlay inode by origin inode */
+		inode = ovl_lookup_inode(sb, origin.dentry);
+		err = -ESTALE;
+		if (!inode || atomic_read(&inode->i_count) == 1)
+			goto out_err;
+
+		/* Deleted but still open? */
+		index = dget(ovl_i_dentry_upper(inode));
 	}
 
 	dentry = ovl_get_dentry(sb, NULL, &origin, index);
@@ -411,6 +428,7 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
 out:
 	dput(origin.dentry);
 	dput(index);
+	iput(inode);
 	return dentry;
 
 out_err:
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index bb742d195f12..a25908ba3512 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -637,6 +637,22 @@ static bool ovl_verify_inode(struct inode *inode, struct dentry *lowerdentry,
 	return true;
 }
 
+struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *origin)
+{
+	struct inode *inode, *key = d_inode(origin);
+
+	inode = ilookup5(sb, (unsigned long) key, ovl_inode_test, key);
+	if (!inode)
+		return NULL;
+
+	if (!ovl_verify_inode(inode, origin, NULL)) {
+		iput(inode);
+		return ERR_PTR(-ESTALE);
+	}
+
+	return inode;
+}
+
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			    struct dentry *lowerdentry, struct dentry *index,
 			    unsigned int numlower)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 2ddd74043b5f..8fa8253af7cb 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -305,6 +305,7 @@ int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
+struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *origin);
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			    struct dentry *lowerdentry, struct dentry *index,
 			    unsigned int numlower);
-- 
2.7.4

* [PATCH v2 11/17] ovl: decode indexed dir file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (9 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 12/17] ovl: decode pure lower " Amir Goldstein
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Decoding an indexed dir file handle is done by looking up the file handle
by name in the index dir and then decoding the upper dir from the index
origin file handle. The decoded upper path is used to look up an overlay
dentry with the same path.
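Using the decoded upper path means turning a real dentry back into a path from the layer root. The sketch below models that with a parent-pointer array and hypothetical names (`struct model_dentry`, `model_build_path()`); it is a simplification, since in the kernel the walk is done on dentries inside `ovl_lookup_real()`, one component at a time, rather than by building a path string.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical real dentry: a name plus the index of its parent. */
struct model_dentry {
	const char *name;
	int parent;		/* the layer root is its own parent */
};

/*
 * Build "/a/b/c" for node @d by walking parents up to the layer root.
 * Returns 0, or -ENAMETOOLONG if @out (size @outlen) is too small.
 */
int model_build_path(const struct model_dentry *tree, int d, char *out,
		     size_t outlen)
{
	size_t pos = outlen;

	if (outlen == 0)
		return -ENAMETOOLONG;
	out[--pos] = '\0';

	/* Fill @out from the end, prepending "/name" for each ancestor. */
	while (tree[d].parent != d) {
		size_t n = strlen(tree[d].name);

		if (pos < n + 1)
			return -ENAMETOOLONG;
		pos -= n;
		memcpy(out + pos, tree[d].name, n);
		out[--pos] = '/';
		d = tree[d].parent;
	}

	if (pos == outlen - 1) {	/* @d was the root itself */
		if (pos == 0)
			return -ENAMETOOLONG;
		out[--pos] = '/';
	}
	memmove(out, out + pos, outlen - pos);	/* shift to start of buffer */
	return 0;
}
```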

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c    | 13 +++++++++++++
 fs/overlayfs/namei.c     |  2 +-
 fs/overlayfs/overlayfs.h |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 6ecb54d4b52c..6141682301d6 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -401,6 +401,19 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
 		index = NULL;
 	}
 
+	/* Then try to get upper dir by index */
+	if (index && d_is_dir(index)) {
+		struct dentry *upper = ovl_index_upper(ofs, index);
+
+		err = PTR_ERR(upper);
+		if (IS_ERR_OR_NULL(upper))
+			goto out_err;
+
+		dentry = ovl_get_dentry(sb, upper, NULL, NULL);
+		dput(upper);
+		goto out;
+	}
+
 	/* Then lookup origin by fh */
 	err = ovl_check_origin_fh(fh, NULL, ofs->lower_layers, ofs->numlower,
 				  &stack);
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 638ff196da93..13869108dc32 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -436,7 +436,7 @@ int ovl_verify_origin(struct dentry *dentry, struct dentry *origin,
 }
 
 /* Get upper dentry from index */
-static struct dentry *ovl_index_upper(struct ovl_fs *ofs, struct dentry *index)
+struct dentry *ovl_index_upper(struct ovl_fs *ofs, struct dentry *index)
 {
 	struct ovl_fh *fh;
 	struct ovl_layer layer = { .mnt = ofs->upper_mnt };
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 8fa8253af7cb..7310e0eca383 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -265,6 +265,7 @@ int ovl_check_origin_fh(struct ovl_fh *fh, struct dentry *upperdentry,
 			struct ovl_path **stackp);
 int ovl_verify_origin(struct dentry *dentry, struct dentry *origin,
 		      bool is_upper, bool set);
+struct dentry *ovl_index_upper(struct ovl_fs *ofs, struct dentry *index);
 int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index);
 int ovl_get_index_name(struct dentry *origin, struct qstr *name);
 struct dentry *ovl_get_index_fh(struct ovl_fs *ofs, struct ovl_fh *fh);
-- 
2.7.4

* [PATCH v2 12/17] ovl: decode pure lower dir file handles
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (10 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 11/17] ovl: decode indexed dir file handles Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 13/17] ovl: hash directory inodes for NFS export Amir Goldstein
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Similar to decoding a pure upper dir file handle, decoding a pure lower
dir file handle is implemented by looking up an overlay dentry with the
same path as the pure lower path and verifying that the overlay dentry's
real lower matches the decoded real lower file handle.

Unlike the case of an upper dir file handle, the lookup of the overlay path
by the lower real path can fail or find a mismatched overlay dentry if any
of the lower parents have been copied up and renamed. To address this case,
we will need to check if any of the lower parents are indexed.
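The failure mode described above can be shown with a small userspace model. Here the overlay namespace is a hypothetical table of `{ path, real lower id }` entries, and `model_lookup_lower_dir()` mimics the check in `ovl_lookup_real_one()`: a missing name yields `-ENOENT` and a name whose real lower dentry differs yields `-ESTALE`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the overlay: each path maps to a real lower id. */
struct model_ovl_entry {
	const char *path;
	unsigned long lower_id;	/* 0 if the entry has no lower dentry */
};

/*
 * Look up @path in the overlay and verify its real lower dentry is
 * @lower_id. Returns 0, -ENOENT or -ESTALE.
 */
int model_lookup_lower_dir(const struct model_ovl_entry *ents, size_t n,
			   const char *path, unsigned long lower_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(ents[i].path, path) != 0)
			continue;
		return ents[i].lower_id == lower_id ? 0 : -ESTALE;
	}
	return -ENOENT;
}
```

For example, if a lower dir `/a/b` is looked up after its parent `/a` was copied up and renamed to `/x`, the path lookup finds nothing (`-ENOENT`); if a new `/a/b` was then created, the lookup finds it but its real lower does not match (`-ESTALE`).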

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 6141682301d6..ec4b9f29d40d 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -215,6 +215,11 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 	return dentry;
 }
 
+static struct dentry *ovl_dentry_real_at(struct dentry *dentry, bool is_upper)
+{
+	return is_upper ? ovl_dentry_upper(dentry) : ovl_dentry_lower(dentry);
+}
+
 /*
  * Lookup a child overlay dentry whose real dentry is @real.
  * If @is_upper is true then we lookup a child overlay dentry with the same
@@ -228,8 +233,6 @@ static struct dentry *ovl_lookup_real_one(struct dentry *parent,
 	int err;
 
 	/* TODO: use index when looking up by lower real dentry */
-	if (!is_upper)
-		return ERR_PTR(-EACCES);
 
 	/* Lookup overlay dentry by real name */
 	this = lookup_one_len_unlocked(name->name, parent, name->len);
@@ -240,7 +243,7 @@ static struct dentry *ovl_lookup_real_one(struct dentry *parent,
 		dput(this);
 		err = -ENOENT;
 		goto fail;
-	} else if (ovl_dentry_upper(this) != real) {
+	} else if (ovl_dentry_real_at(this, is_upper) != real) {
 		dput(this);
 		err = -ESTALE;
 		goto fail;
@@ -265,15 +268,12 @@ static struct dentry *ovl_lookup_real(struct super_block *sb,
 	struct dentry *connected;
 	int err = 0;
 
-	/* TODO: use index when looking up by lower real dentry */
-	if (!is_upper)
-		return ERR_PTR(-EACCES);
-
 	connected = dget(sb->s_root);
 	while (!err) {
 		struct dentry *next, *this;
 		struct dentry *parent = NULL;
-		struct dentry *real_connected = ovl_dentry_upper(connected);
+		struct dentry *real_connected = ovl_dentry_real_at(connected,
+								   is_upper);
 
 		if (real_connected == real)
 			break;
@@ -342,19 +342,15 @@ static struct dentry *ovl_get_dentry(struct super_block *sb,
 	if (!d_is_dir(real))
 		return ovl_obtain_alias(sb, upper, lowerpath, index);
 
-	/* TODO: lookup connected dir from real lower dir */
-	if (!upper)
-		return ERR_PTR(-EACCES);
-
 	/* Removed empty directory? */
-	if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
+	if ((real->d_flags & DCACHE_DISCONNECTED) || d_unhashed(real))
 		return ERR_PTR(-ENOENT);
 
 	/*
-	 * If real upper dentry is connected and hashed, get a connected
-	 * overlay dentry with the same path as the real upper dentry.
+	 * If real dentry is connected and hashed, get a connected overlay
+	 * dentry whose real dentry is @real.
 	 */
-	return ovl_lookup_real(sb, upper, true);
+	return ovl_lookup_real(sb, real, !!upper);
 }
 
 static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
-- 
2.7.4

* [PATCH v2 13/17] ovl: hash directory inodes for NFS export
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (11 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 12/17] ovl: decode pure lower " Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 14/17] ovl: lookup connected ancestor of dir in inode cache Amir Goldstein
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

If NFS export is enabled, hash indexed directory inodes by origin inode,
so we can find them in the inode cache using the decoded origin inode
before looking up the origin file handle in the index.

Non-indexed and pure upper dirs are hashed by upper inode, because those
are encoded as upper file handles.
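The hash-key rule after this patch, as read from the `ovl_get_inode()` hunk, can be summarized as a small pure function. This is a userspace model with hypothetical names: real inodes are stood in for by nonzero numbers, 0 means "absent" or "not hashed", and `nfs_export` stands in for the `sb->s_export_op` test.

```c
#include <stdbool.h>

/*
 * Model of the hash key choice in ovl_get_inode() after this patch.
 * @upper/@lower are real inode ids (0 = absent); @has_index says an index
 * entry exists; @indexdir says the index feature is enabled.
 * Returns the id used as hash key, or 0 if the inode is not hashed.
 */
unsigned long model_hash_key(unsigned long upper, unsigned long lower,
			     bool has_index, bool indexdir, bool is_dir,
			     bool nfs_export)
{
	/* Already indexed or could be indexed on copy up? */
	bool indexed = has_index || (indexdir && !upper);
	unsigned long origin = indexed ? lower : upper;

	/* Directories are hashed only when NFS export is supported. */
	if (origin && (!is_dir || nfs_export))
		return origin;
	return 0;
}
```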

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c | 11 +++++++----
 fs/overlayfs/super.c |  3 +++
 fs/overlayfs/util.c  |  4 +++-
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index a25908ba3512..8db3f466df60 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -563,7 +563,9 @@ unsigned int ovl_get_nlink(struct dentry *lowerdentry,
 	char buf[13];
 	int err;
 
-	if (!lowerdentry || !upperdentry || d_inode(lowerdentry)->i_nlink == 1)
+	if (!lowerdentry || !upperdentry ||
+	    d_inode(lowerdentry)->i_nlink == 1 ||
+	    S_ISDIR(d_inode(upperdentry)->i_mode))
 		return fallback;
 
 	err = vfs_getxattr(upperdentry, OVL_XATTR_NLINK, &buf, sizeof(buf) - 1);
@@ -661,6 +663,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	struct inode *inode;
 	/* Already indexed or could be indexed on copy up? */
 	bool indexed = (index || (ovl_indexdir(sb) && !upperdentry));
+	struct dentry *origin = indexed ? lowerdentry : upperdentry;
 
 	if (WARN_ON(upperdentry && indexed && !lowerdentry))
 		return ERR_PTR(-EIO);
@@ -673,10 +676,10 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 	 * not use lower as hash key in that case.
 	 * Hash inodes that are or could be indexed by origin inode and
 	 * non-indexed upper inodes that could be hard linked by upper inode.
+	 * Hash directory inodes only if NFS export is supported.
 	 */
-	if (!S_ISDIR(realinode->i_mode) && (upperdentry || indexed)) {
-		struct inode *key = d_inode(indexed ? lowerdentry :
-						      upperdentry);
+	if (origin && (!S_ISDIR(realinode->i_mode) || sb->s_export_op)) {
+		struct inode *key = d_inode(origin);
 		unsigned int nlink;
 
 		inode = iget5_locked(sb, (unsigned long) key,
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index f152b817e4d0..1bc37bc23e89 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1341,6 +1341,9 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	/* Root is always merge -> can have whiteouts */
 	ovl_set_flag(OVL_WHITEOUTS, d_inode(root_dentry));
+	/* Hash root directory inode by upper dir inode for NFS export */
+	if (sb->s_export_op)
+		ovl_inode_update(d_inode(root_dentry), upperpath.dentry);
 	ovl_inode_init(d_inode(root_dentry), upperpath.dentry,
 		       ovl_dentry_lower(root_dentry));
 
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 1b0dc903cf6d..92a2a70db67c 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -308,7 +308,9 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
 	 */
 	smp_wmb();
 	OVL_I(inode)->__upperdentry = upperdentry;
-	if (!S_ISDIR(upperinode->i_mode) && inode_unhashed(inode)) {
+	/* Hash directory inodes only if NFS export is supported */
+	if ((!S_ISDIR(upperinode->i_mode) || inode->i_sb->s_export_op) &&
+	    inode_unhashed(inode)) {
 		inode->i_private = upperinode;
 		__insert_inode_hash(inode, (unsigned long) upperinode);
 	}
-- 
2.7.4

* [PATCH v2 14/17] ovl: lookup connected ancestor of dir in inode cache
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (12 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 13/17] ovl: hash directory inodes for NFS export Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 15/17] ovl: lookup indexed ancestor of lower dir Amir Goldstein
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Decoding a dir file handle requires walking backward up to the layer root
and, for a lower dir, also checking the index to see if any of the parents
have been copied up.

Look up the overlay ancestor dentry in the inode/dentry cache by the decoded
real parents, to avoid looking up all the way back to the layer root.
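The ancestor walk can be modeled with a parent-pointer array. The names below are hypothetical, and the `cached` flag stands in for `ovl_lookup_real_inode()` finding an alias. As in `ovl_lookup_real_ancestor()`, the walk starts at the real dentry itself and stops at the first cached node, at the layer root, or at the fs root (`-EXDEV`, meaning the dentry was moved out of the layer).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical real dentry: parent index and cached-overlay-alias flag. */
struct model_node {
	int parent;		/* the fs root is its own parent */
	bool cached;		/* overlay dentry found in inode/dentry cache */
};

/*
 * Walk up from @real and return the first node with a cached overlay
 * dentry, @layer_root if the walk reaches the layer root first, or
 * -EXDEV if it hits the fs root without passing through the layer root.
 */
int model_lookup_ancestor(const struct model_node *nodes, int real,
			  int layer_root)
{
	int next = real;

	if (real == layer_root)
		return layer_root;

	for (;;) {
		int parent = nodes[next].parent;

		if (nodes[next].cached)
			return next;
		if (parent == layer_root)
			return layer_root;
		if (parent == next)
			return -EXDEV;	/* moved out of the layer */
		next = parent;
	}
}
```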

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c    | 88 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/overlayfs/inode.c     |  8 +++--
 fs/overlayfs/overlayfs.h |  3 +-
 3 files changed, 93 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index ec4b9f29d40d..01c4e3f733c1 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -258,6 +258,87 @@ static struct dentry *ovl_lookup_real_one(struct dentry *parent,
 }
 
 /*
+ * Lookup an indexed or hashed overlay dentry by real inode.
+ */
+static struct dentry *ovl_lookup_real_inode(struct super_block *sb,
+					    struct dentry *real, bool is_upper)
+{
+	struct dentry *this = NULL;
+	struct inode *inode;
+
+	inode = ovl_lookup_inode(sb, real, is_upper);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+	if (inode) {
+		this = d_find_any_alias(inode);
+		iput(inode);
+	}
+
+	/* TODO: use index when looking up by origin inode */
+	if (!this)
+		return NULL;
+
+	if (WARN_ON(ovl_dentry_real_at(this, is_upper) != real)) {
+		dput(this);
+		this = ERR_PTR(-EIO);
+	}
+
+	return this;
+}
+
+/*
+ * Lookup an indexed or hashed overlay dentry, whose real dentry is an
+ * ancestor of @real.
+ */
+static struct dentry *ovl_lookup_real_ancestor(struct super_block *sb,
+					       struct dentry *real,
+					       bool is_upper)
+{
+	struct dentry *real_root = ovl_dentry_real_at(sb->s_root, is_upper);
+	struct dentry *next, *parent = NULL;
+	struct dentry *ancestor;
+
+	if (real_root == real)
+		return dget(sb->s_root);
+
+	/* Find the topmost indexed or hashed ancestor */
+	next = dget(real);
+	for (;;) {
+		parent = dget_parent(next);
+
+		/*
+		 * Lookup a matching overlay dentry in inode/dentry
+		 * cache or in index by real inode.
+		 */
+		ancestor = ovl_lookup_real_inode(sb, next, is_upper);
+		if (ancestor)
+			break;
+
+		if (real_root == parent) {
+			ancestor = dget(sb->s_root);
+			break;
+		}
+
+		/*
+		 * If @real has been moved out of the layer root directory,
+		 * we will eventually hit the real fs root.
+		 */
+		if (parent == next) {
+			ancestor = ERR_PTR(-EXDEV);
+			break;
+		}
+
+		dput(next);
+		next = parent;
+	}
+
+	dput(parent);
+	dput(next);
+
+	return ancestor;
+}
+
+/*
  * Lookup an overlay dentry whose real dentry is @real.
  * If @is_upper is true then we lookup an overlay dentry with the same path
  * as the real dentry. Otherwise, we need to consult index for lookup.
@@ -268,7 +349,10 @@ static struct dentry *ovl_lookup_real(struct super_block *sb,
 	struct dentry *connected;
 	int err = 0;
 
-	connected = dget(sb->s_root);
+	connected = ovl_lookup_real_ancestor(sb, real, is_upper);
+	if (IS_ERR_OR_NULL(connected))
+		return connected;
+
 	while (!err) {
 		struct dentry *next, *this;
 		struct dentry *parent = NULL;
@@ -423,7 +507,7 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
 			goto out_err;
 	} else if (is_deleted && origin.dentry && !d_is_dir(origin.dentry)) {
 		/* Lookup deleted overlay inode by origin inode */
-		inode = ovl_lookup_inode(sb, origin.dentry);
+		inode = ovl_lookup_inode(sb, origin.dentry, false);
 		err = -ESTALE;
 		if (!inode || atomic_read(&inode->i_count) == 1)
 			goto out_err;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 8db3f466df60..038a62857580 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -639,15 +639,17 @@ static bool ovl_verify_inode(struct inode *inode, struct dentry *lowerdentry,
 	return true;
 }
 
-struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *origin)
+struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real,
+			       bool is_upper)
 {
-	struct inode *inode, *key = d_inode(origin);
+	struct inode *inode, *key = d_inode(real);
 
 	inode = ilookup5(sb, (unsigned long) key, ovl_inode_test, key);
 	if (!inode)
 		return NULL;
 
-	if (!ovl_verify_inode(inode, origin, NULL)) {
+	if (!ovl_verify_inode(inode, is_upper ? NULL : real,
+			      is_upper ? real : NULL)) {
 		iput(inode);
 		return ERR_PTR(-ESTALE);
 	}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 7310e0eca383..d06299b27ec3 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -306,7 +306,8 @@ int ovl_update_time(struct inode *inode, struct timespec *ts, int flags);
 bool ovl_is_private_xattr(const char *name);
 
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
-struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *origin);
+struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real,
+			       bool is_upper);
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
 			    struct dentry *lowerdentry, struct dentry *index,
 			    unsigned int numlower);
-- 
2.7.4

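The parent walk that ovl_lookup_real_ancestor() performs can be modelled in user space. This is a hypothetical simplification, not the overlayfs code. `struct node` stands in for a real dentry; the filesystem root points to itself, like `d_parent`. The `cached` flag stands in for ovl_lookup_real_inode() finding an overlay dentry. Reaching the layer root stands in for returning `sb->s_root`. Reference counting (dget/dput) is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a real (underlying) dentry. */
struct node {
	const char *name;
	struct node *parent;	/* the fs root points to itself */
	bool cached;		/* "an overlay dentry exists for this real dir" */
};

/*
 * Walk up from @real until we reach a node with a known overlay dentry
 * (checking @real itself first) or the layer root.  Returns 0 and sets
 * *found on success, or -EXDEV if the walk reaches the fs root without
 * passing the layer root, i.e. @real was moved out of the layer.
 */
static int find_connected_ancestor(struct node *layer_root, struct node *real,
				   struct node **found)
{
	struct node *next = real;

	if (real == layer_root) {
		*found = layer_root;
		return 0;
	}

	for (;;) {
		struct node *parent = next->parent;

		if (next->cached) {
			*found = next;
			return 0;
		}
		if (parent == layer_root) {
			*found = layer_root;
			return 0;
		}
		if (parent == next)	/* hit the fs root */
			return -EXDEV;
		next = parent;
	}
}
```

The walk stops at the first hit, so the result is the nearest ancestor that already has an overlay dentry. The forward lookup in ovl_lookup_real() then only needs to connect the path components below it.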
* [PATCH v2 15/17] ovl: lookup indexed ancestor of lower dir
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (13 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 14/17] ovl: lookup connected ancestor of dir in inode cache Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 16/17] ovl: wire up NFS export support Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 17/17] nfsd: encode stat->mtime for getattr instead of inode->i_mtime Amir Goldstein
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

ovl_lookup_real(is_upper=false) walks back lower parents to find the
topmost indexed parent. If an indexed ancestor is found before reaching
lower layer root, ovl_lookup_real(is_upper=true) is called recursively
to walk back from indexed upper to the topmost connected/hashed upper
parent (or up to root).

ovl_lookup_real(is_upper=true) then walks forward to connect the topmost
upper overlay dir dentry and ovl_lookup_real(is_upper=false) continues to
walk forward to connect the decoded lower overlay dir dentry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c    | 39 ++++++++++++++++++++++++++++++++++++++-
 fs/overlayfs/namei.c     | 20 ++++++++++++++------
 fs/overlayfs/overlayfs.h |  2 ++
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 01c4e3f733c1..147d9061cc40 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -257,15 +257,24 @@ static struct dentry *ovl_lookup_real_one(struct dentry *parent,
 	return ERR_PTR(err);
 }
 
+static struct dentry *ovl_lookup_real(struct super_block *sb,
+				      struct dentry *real, bool is_upper);
+
 /*
  * Lookup an indexed or hashed overlay dentry by real inode.
  */
 static struct dentry *ovl_lookup_real_inode(struct super_block *sb,
 					    struct dentry *real, bool is_upper)
 {
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct dentry *index = NULL;
 	struct dentry *this = NULL;
 	struct inode *inode;
 
+	/*
+	 * Decoding upper dir from index is expensive, so first try to lookup
+	 * overlay dentry in inode/dcache.
+	 */
 	inode = ovl_lookup_inode(sb, real, is_upper);
 	if (IS_ERR(inode))
 		return ERR_CAST(inode);
@@ -274,7 +283,35 @@ static struct dentry *ovl_lookup_real_inode(struct super_block *sb,
 		iput(inode);
 	}
 
-	/* TODO: use index when looking up by origin inode */
+	/*
+	 * For decoded lower dir file handle, lookup index by origin to check
+	 * if lower dir was copied up and/or removed.
+	 */
+	if (!this && !is_upper && !WARN_ON(!d_is_dir(real))) {
+		index = ovl_lookup_index(ofs, NULL, real, false);
+		if (IS_ERR(index))
+			return index;
+	}
+
+	/* Get connected upper overlay dir from index */
+	if (index) {
+		struct dentry *upper = ovl_index_upper(ofs, index);
+
+		dput(index);
+		if (IS_ERR_OR_NULL(upper))
+			return upper;
+
+		/*
+		 * ovl_lookup_real(is_upper=false) may call recursively once to
+		 * ovl_lookup_real(is_upper=true). The first level call walks
+		 * back lower parents to the topmost indexed parent. The second
+		 * recursive call walks back from indexed upper to the topmost
+		 * connected/hashed upper parent (or up to root).
+		 */
+		this = ovl_lookup_real(sb, upper, true);
+		dput(upper);
+	}
+
 	if (!this)
 		return NULL;
 
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 13869108dc32..f728942a5a1f 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -665,11 +665,9 @@ struct dentry *ovl_get_index_fh(struct ovl_fs *ofs, struct ovl_fh *fh)
 	return ERR_PTR(err);
 }
 
-static struct dentry *ovl_lookup_index(struct dentry *dentry,
-				       struct dentry *upper,
-				       struct dentry *origin)
+struct dentry *ovl_lookup_index(struct ovl_fs *ofs, struct dentry *upper,
+				struct dentry *origin, bool verify)
 {
-	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 	struct dentry *index;
 	struct inode *inode;
 	struct qstr name;
@@ -697,6 +695,16 @@ static struct dentry *ovl_lookup_index(struct dentry *dentry,
 	inode = d_inode(index);
 	if (d_is_negative(index)) {
 		goto out_dput;
+	} else if (ovl_is_whiteout(index) && !verify) {
+		/*
+		 * When index lookup is called with !verify for decoding an
+		 * overlay file handle, a whiteout index implies that decode
+		 * should treat file handle as stale and no need to print a
+		 * warning about it.
+		 */
+		dput(index);
+		index = ERR_PTR(-ESTALE);
+		goto out;
 	} else if (ovl_dentry_weird(index) || ovl_is_whiteout(index) ||
 		   ((inode->i_mode ^ d_inode(origin)->i_mode) & S_IFMT)) {
 		/*
@@ -710,7 +718,7 @@ static struct dentry *ovl_lookup_index(struct dentry *dentry,
 				    index, d_inode(index)->i_mode & S_IFMT,
 				    d_inode(origin)->i_mode & S_IFMT);
 		goto fail;
-	} else if (is_dir) {
+	} else if (is_dir && verify) {
 		if (!upper) {
 			pr_warn_ratelimited("overlayfs: suspected uncovered redirected dir found (origin=%pd2, index=%pd2).\n",
 					    origin, index);
@@ -948,7 +956,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 
 	if (origin && ovl_indexdir(dentry->d_sb) &&
 	    (!d.is_dir || ovl_verify(dentry->d_sb))) {
-		index = ovl_lookup_index(dentry, upperdentry, origin);
+		index = ovl_lookup_index(ofs, upperdentry, origin, true);
 		if (IS_ERR(index)) {
 			err = PTR_ERR(index);
 			index = NULL;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index d06299b27ec3..661f33bd9793 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -269,6 +269,8 @@ struct dentry *ovl_index_upper(struct ovl_fs *ofs, struct dentry *index);
 int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index);
 int ovl_get_index_name(struct dentry *origin, struct qstr *name);
 struct dentry *ovl_get_index_fh(struct ovl_fs *ofs, struct ovl_fh *fh);
+struct dentry *ovl_lookup_index(struct ovl_fs *ofs, struct dentry *upper,
+				struct dentry *origin, bool verify);
 int ovl_path_next(int idx, struct dentry *dentry, struct path *path);
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			  unsigned int flags);
-- 
2.7.4

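The effect of the new @verify argument on ovl_lookup_index() can be summarised as a small decision function. This is a hypothetical model rather than the kernel code. `enum index_state` condenses what the index lookup found, and the directory checks that only run with verify=true are left out. VERDICT_CORRUPT stands for the branch that prints a warning and fails. The regular ovl_lookup() path always passes verify=true, while file handle decoding passes false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what an index lookup found for an origin. */
enum index_state {
	INDEX_NEGATIVE,		/* no index entry */
	INDEX_WHITEOUT,		/* origin was copied up and later removed */
	INDEX_TYPE_MISMATCH,	/* index and origin have different file types */
	INDEX_FOUND,		/* usable index entry */
};

enum index_verdict {
	VERDICT_NONE,		/* carry on without an index */
	VERDICT_FOUND,		/* use the index entry */
	VERDICT_STALE,		/* decode: quietly report -ESTALE */
	VERDICT_CORRUPT,	/* warn about an inconsistent index and fail */
};

static enum index_verdict index_verdict(enum index_state state, bool verify)
{
	switch (state) {
	case INDEX_NEGATIVE:
		return VERDICT_NONE;
	case INDEX_WHITEOUT:
		/*
		 * When decoding a file handle (!verify), a whiteout index means
		 * the object was deleted, so the handle is simply stale.  On
		 * regular lookup (verify), it is treated as an inconsistency.
		 */
		return verify ? VERDICT_CORRUPT : VERDICT_STALE;
	case INDEX_TYPE_MISMATCH:
		return VERDICT_CORRUPT;
	case INDEX_FOUND:
		break;
	}
	return VERDICT_FOUND;
}
```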
* [PATCH v2 16/17] ovl: wire up NFS export support
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (14 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 15/17] ovl: lookup indexed ancestor of lower dir Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  2018-01-04 17:20 ` [PATCH v2 17/17] nfsd: encode stat->mtime for getattr instead of inode->i_mtime Amir Goldstein
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

Enable NFS export support if the "verify" feature is enabled.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/Kconfig | 2 ++
 fs/overlayfs/super.c | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index 0e4764ed4e23..c63473677c95 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -67,6 +67,8 @@ config OVERLAY_FS_VERIFY
 
 	  The verify feature prevents multiple redirects to the same lower dir
 	  and prevents broken hardlinks from using the same inode number.
+	  The verify feature is required for exporting an overlay filesystem
+	  subtree as an NFS share.
 
 	  Note, that verify feature is not backward compatible.  That is,
 	  mounting an overlay with verification index entries on a kernel
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 1bc37bc23e89..28596d0a53d2 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1316,6 +1316,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		ofs->config.verify = false;
 	}
 
+	/*
+	 * NFS export requires that all layers support file handles and that
+	 * all files and dirs are indexed on copy up (verify=on). We already
+	 * check that all layers support file handles for enabling index.
+	 */
+	if (ofs->config.verify)
+		sb->s_export_op = &ovl_export_operations;
+
 	/* Never override disk quota limits or use reserved space */
 	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
 
-- 
2.7.4

* [PATCH v2 17/17] nfsd: encode stat->mtime for getattr instead of inode->i_mtime
  2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
                   ` (15 preceding siblings ...)
  2018-01-04 17:20 ` [PATCH v2 16/17] ovl: wire up NFS export support Amir Goldstein
@ 2018-01-04 17:20 ` Amir Goldstein
  16 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-04 17:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, J . Bruce Fields, linux-unionfs, linux-fsdevel

The values of stat->mtime and inode->i_mtime may differ for overlayfs,
and stat->mtime is the correct value to use when encoding getattr.
This is also consistent with the fact that the other attribute times are
encoded from stat values.

Both callers of lease_get_mtime() already have the value of stat->mtime,
so the only change needed is that lease_get_mtime() no longer overwrites
this value with inode->i_mtime when the inode does not have an
exclusive lease.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/locks.c       | 6 ++----
 fs/nfsd/nfsxdr.c | 1 +
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 21b4dfa289ee..d6ff4beb70ce 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1554,9 +1554,9 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 EXPORT_SYMBOL(__break_lease);
 
 /**
- *	lease_get_mtime - get the last modified time of an inode
+ *	lease_get_mtime - update modified time of an inode with exclusive lease
  *	@inode: the inode
- *      @time:  pointer to a timespec which will contain the last modified time
+ *      @time:  pointer to a timespec which contains the last modified time
  *
  * This is to force NFS clients to flush their caches for files with
  * exclusive leases.  The justification is that if someone has an
@@ -1580,8 +1580,6 @@ void lease_get_mtime(struct inode *inode, struct timespec *time)
 
 	if (has_lease)
 		*time = current_time(inode);
-	else
-		*time = inode->i_mtime;
 }
 
 EXPORT_SYMBOL(lease_get_mtime);
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index 644a0342f0e0..79b6064f8977 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -188,6 +188,7 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp,
 	*p++ = htonl((u32) stat->ino);
 	*p++ = htonl((u32) stat->atime.tv_sec);
 	*p++ = htonl(stat->atime.tv_nsec ? stat->atime.tv_nsec / 1000 : 0);
+	time = stat->mtime;
 	lease_get_mtime(d_inode(dentry), &time); 
 	*p++ = htonl((u32) time.tv_sec);
 	*p++ = htonl(time.tv_nsec ? time.tv_nsec / 1000 : 0); 
-- 
2.7.4

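The new calling convention for lease_get_mtime() can be illustrated outside the kernel. In this hypothetical sketch, `has_write_lease` stands in for the lease check in fs/locks.c and `now` stands in for current_time(inode). The caller seeds the timestamp with stat->mtime, as the nfsxdr.c hunk now does, and the helper overrides it only under an exclusive lease. The second helper mirrors the NFSv2 encoding in the hunk above, which sends seconds plus microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/*
 * Patched contract: *time holds stat->mtime on entry and is only
 * replaced when an exclusive lease forces clients to flush caches.
 */
static void lease_adjust_mtime(bool has_write_lease, struct timespec now,
			       struct timespec *time)
{
	if (has_write_lease)
		*time = now;
	/* else: keep the caller's stat->mtime (no fallback to i_mtime) */
}

/* Seconds/microseconds pair as encoded for NFSv2 getattr. */
static void encode_v2_time(const struct timespec *ts, uint32_t *sec,
			   uint32_t *usec)
{
	*sec = (uint32_t)ts->tv_sec;
	*usec = ts->tv_nsec ? (uint32_t)(ts->tv_nsec / 1000) : 0;
}
```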
* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-04 17:20 ` [PATCH v2 04/17] ovl: decode connected upper dir " Amir Goldstein
@ 2018-01-05 12:33   ` Amir Goldstein
  2018-01-05 15:18     ` J . Bruce Fields
  2018-01-15 11:41     ` Miklos Szeredi
  2018-01-15 11:33   ` Miklos Szeredi
  1 sibling, 2 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-05 12:33 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 7:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Until this change, we decoded upper file handles by instantiating an
> overlay dentry from the real upper dentry. This is sufficient to handle
> pure upper files, but insufficient to handle merge/impure dirs.
>
> To that end, if decoded real upper dir is connected and hashed, we
> lookup an overlay dentry with the same path as the real upper dir.
> If decoded real upper is non-dir, we instantiate a disconnected overlay
> dentry as before this change.
>
> Because ovl_fh_to_dentry() returns connected overlay dir dentries,
> exportfs never need to call get_parent() and get_name() to reconnect an
> upper overlay dir. Because connectable non-dir file handles are not
> supported, exportfs will not be able to use fh_to_parent() and get_name()
> methods to reconnect a disconnected non-dir to its parent. Therefore, the
> methods get_parent() and get_name() are implemented just to print out a
> sanity warning and the method fh_to_parent() is implemented to warn the
> user that using the 'subtree_check' exportfs option is not supported.
>

Reviewers who get this far should have their eyebrows slightly raised
after reading this commit message and should be asking themselves:

"Why not return a disconnected overlay dentry like any other fs and implement
ovl_get_parent()/ovl_get_name() by looking at parent/name of upper dir?"

I have had this debate with myself for a while and experimented a bit with
both approaches, and in the end I liked the "return connected dentry" result
better. I did not want to write this entire story in the commit message
because, in the end, there is nothing incorrect about either implementation
choice; there are only pros and cons to each.

At the moment, the only argument I can think of against the chosen approach
is that it adds ~100 lines of code in the ovl_lookup_real() and
ovl_lookup_real_one() helpers that could have been avoided by using the
common reconnect_path() code in fs/exportfs/expfs.c.

The arguments against the disconnected dir approach are:
- Obtaining a disconnected overlay dir dentry would require a delicate
  re-factoring of ovl_lookup() to get a dentry with overlay parent info.
  I personally preferred to avoid doing that re-factoring unless it was
  proven worthy.
- Going down the path of disconnected dirs would mean that the (non-trivial)
  code path of d_splice_alias() could be traveled, and that meant writing
  more tests and introducing race cases that are very hard to hit on
  purpose. Taking the path of connecting the overlay dentry by forward
  lookup is therefore the safe and boring way to avoid surprises.
- In the current implementation, there is an anomaly in the multi lower
  layer setup. In that case, indexed upper dir inodes are hashed by the
  lower inode, but their file handles are encoded from the upper inode.
  Obtaining a disconnected dir from this type of upper file handle would
  have been a special case that would add more code and more complexity.
  With the forward lookup connect approach, the anomaly does not require
  changing the code; connecting the dentry is just less efficient in case
  there is an ancestor in the inode cache (we won't find it in the cache
  because we will be looking with the wrong inode), and that can be fixed
  later if we find that use case important enough.

There. Now you see why I did not want this story in commit message?

Amir.

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-05 12:33   ` Amir Goldstein
@ 2018-01-05 15:18     ` J . Bruce Fields
  2018-01-05 15:34       ` Amir Goldstein
  2018-01-15 11:41     ` Miklos Szeredi
  1 sibling, 1 reply; 68+ messages in thread
From: J . Bruce Fields @ 2018-01-05 15:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Jeff Layton, overlayfs, linux-fsdevel

On Fri, Jan 05, 2018 at 02:33:22PM +0200, Amir Goldstein wrote:
> There. Now you see why I did not want this story in commit message?

No.  I think it's interesting, and might be useful to have around if
someone needs to revisit this decision in the future.  So I'd rather
have it in the changelog or in code comments.  I've had to track down
old mailing list threads for this kind of information in the past and
found it sometimes time-consuming.

--b.

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-05 15:18     ` J . Bruce Fields
@ 2018-01-05 15:34       ` Amir Goldstein
  0 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-05 15:34 UTC (permalink / raw)
  To: J . Bruce Fields; +Cc: Miklos Szeredi, Jeff Layton, overlayfs, linux-fsdevel

On Fri, Jan 5, 2018 at 5:18 PM, J . Bruce Fields <bfields@fieldses.org> wrote:
> On Fri, Jan 05, 2018 at 02:33:22PM +0200, Amir Goldstein wrote:
>> There. Now you see why I did not want this story in commit message?
>
> No.  I think it's interesting, and might be useful to have around if
> someone needs to revisit this decision in the future.  So I'd rather
> have it in the changelog or in code comments.  I've had to track down
> old mailing list threads for this kind of information in the past and
> found it sometimes time-consuming.
>

Fair enough, but I'll wait to hear from Miklos first, because he may
have different arguments, or maybe he will call BS on some of my
arguments.

Thanks,
Amir.

* Re: [PATCH v2 01/17] ovl: document NFS export
  2018-01-04 17:20 ` [PATCH v2 01/17] ovl: document NFS export Amir Goldstein
@ 2018-01-11 16:06   ` Miklos Szeredi
  2018-01-11 16:26     ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-11 16:06 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  Documentation/filesystems/overlayfs.txt | 59 +++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
>
> diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
> index 00e0595f3d7e..9e21c14c914c 100644
> --- a/Documentation/filesystems/overlayfs.txt
> +++ b/Documentation/filesystems/overlayfs.txt
> @@ -315,6 +315,65 @@ origin file handle that was stored at copy_up time.  If a found lower
>  directory does not match the stored origin, that directory will not be
>  merged with the upper directory.
>
> +
> +NFS export
> +----------
> +
> +When the underlying filesystems supports NFS export and the "verify"
> +feature is enabled, an overlay filesystem may be exported to NFS.
> +
> +With the "verify" feature, on copy_up of any lower object, an index
> +entry is created under the index directory.  The index entry name is the
> +hexadecimal representation of the copy up origin file handle.  For a
> +non-directory object, the index entry is a hard link to the upper inode.
> +For a directory object, the index entry has an extended attribute
> +"trusted.overlay.origin" with an encoded file handle of the upper
> +directory inode.
> +
> +When encoding a file handle from an overlay filesystem object, the
> +following rules apply:
> +
> +1. For a non-upper object, encode a lower file handle from lower inode
> +2. For an indexed object, encode a lower file handle from copy_up origin
> +3. For a pure-upper object and for an existing non-indexed upper object,
> +   encode an upper file handle from upper inode
> +
> +Encoding of a non-upper directory object is not supported when overlay
> +filesystem has multiple lower layers.  In this case, the directory will
> +be copied up first, and then encoded as an upper file handle.

Why?

What's the difference from encoding the uppermost lower layer directory?

> +
> +The encoded overlay file handle includes:
> + - Header including path type information (e.g. lower/upper)
> + - UUID of the underlying filesystem
> + - Underlying filesystem encoding of underlying inode
> +
> +This encoding is identical to the encoding of copy_up origin stored in
> +"trusted.overlay.origin".
> +
> +When decoding an overlay file handle, the following steps are followed:
> +
> +1. Find underlying layer by UUID and path type information.
> +2. Decode the underlying filesystem file handle to underlying dentry.
> +3. For a lower file handle, lookup the handle in index directory by name.
> +4. If a whiteout is found in index, return ESTALE. This represents an
> +   overlay object that was deleted after its file handle was encoded.
> +5. For a non-directory, instantiate a disconnected overlay dentry from the
> +   decoded underlying dentry, the path type and index inode, if found.
> +6. For a directory, use the connected underlying decoded dentry, path type
> +   and index, to lookup a connected overlay dentry.
> +
> +The "verify" feature ensures, that a decoded overlay directory object will
> +be equivalent to the object that was used to encode the file handle.
> +

What's equivalent?  What are the guarantees needed by NFS server?  It
doesn't verify object version, so modification is OK.

Does swapping out lower dirs count as modification or does it count as
new object?

Thanks,
Miklos

* Re: [PATCH v2 01/17] ovl: document NFS export
  2018-01-11 16:06   ` Miklos Szeredi
@ 2018-01-11 16:26     ` Amir Goldstein
  2018-01-12 15:43       ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-11 16:26 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 11, 2018 at 6:06 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  Documentation/filesystems/overlayfs.txt | 59 +++++++++++++++++++++++++++++++++
>>  1 file changed, 59 insertions(+)
>>
>> diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
>> index 00e0595f3d7e..9e21c14c914c 100644
>> --- a/Documentation/filesystems/overlayfs.txt
>> +++ b/Documentation/filesystems/overlayfs.txt
>> @@ -315,6 +315,65 @@ origin file handle that was stored at copy_up time.  If a found lower
>>  directory does not match the stored origin, that directory will not be
>>  merged with the upper directory.
>>
>> +
>> +NFS export
>> +----------
>> +
>> +When the underlying filesystems supports NFS export and the "verify"
>> +feature is enabled, an overlay filesystem may be exported to NFS.
>> +
>> +With the "verify" feature, on copy_up of any lower object, an index
>> +entry is created under the index directory.  The index entry name is the
>> +hexadecimal representation of the copy up origin file handle.  For a
>> +non-directory object, the index entry is a hard link to the upper inode.
>> +For a directory object, the index entry has an extended attribute
>> +"trusted.overlay.origin" with an encoded file handle of the upper
>> +directory inode.
>> +
>> +When encoding a file handle from an overlay filesystem object, the
>> +following rules apply:
>> +
>> +1. For a non-upper object, encode a lower file handle from lower inode
>> +2. For an indexed object, encode a lower file handle from copy_up origin
>> +3. For a pure-upper object and for an existing non-indexed upper object,
>> +   encode an upper file handle from upper inode
>> +
>> +Encoding of a non-upper directory object is not supported when overlay
>> +filesystem has multiple lower layers.  In this case, the directory will
>> +be copied up first, and then encoded as an upper file handle.
>
> Why?
>
> What's the difference from encoding the uppermost lower layer directory?

Sigh... hard to document... here goes an attempt.
Let me know if it works:

When decoding an upper dir, the decoded upper path is the same path as
the overlay path, so we lookup same path in overlay.

When decoding a lower dir from layer 1, every ancestor is either still lower
(and therefore not renamed) or has been copied up and indexed by lower inode,
so we can use the index to learn the overlay path of every ancestor (or
whether it has been removed).

When decoding a lower dir from layer 2, there may be an ancestor in layer 2
that is covered by a whiteout in layer 1 and redirected from another
directory in layer 1. In that case, the index holds no information to
reconstruct the overlay path from the connected layer 2 directory; hence,
we cannot decode a connected overlay directory from a dir file handle
encoded from layer 2.

Copy up on encode mitigates this problem, because it hops over the
non-indexed redirects.

BTW, the same thing could happen with a dir file handle from layer 1 when
exporting an overlay that has existing non-indexed merge dirs.

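The encoding rules from the documentation patch, together with the multi-lower-layer directory exception, can be condensed into one decision function. This is a hypothetical sketch: `struct ovl_obj` and `enum fh_choice` are illustrative names, not overlayfs types. Treating every directory in a multi-lower setup as an upper file handle, copying it up first if needed, is an assumption taken from the cover letter and from the "anomaly" described earlier in this thread, not from code shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of an overlay object at encode time. */
struct ovl_obj {
	bool is_dir;
	bool has_upper;		/* object has an upper inode */
	bool indexed;		/* upper was copied up with an index entry */
};

enum fh_choice {
	FH_LOWER_FROM_LOWER,	/* rule 1: lower fh from the lower inode */
	FH_LOWER_FROM_ORIGIN,	/* rule 2: lower fh from the copy_up origin */
	FH_UPPER,		/* rule 3: upper fh from the upper inode */
	FH_COPY_UP_THEN_UPPER,	/* multi-lower dir: copy up, then upper fh */
};

static enum fh_choice choose_fh(const struct ovl_obj *o, unsigned int numlower)
{
	/* With several lower layers, a dir is always encoded from upper. */
	if (o->is_dir && numlower > 1)
		return o->has_upper ? FH_UPPER : FH_COPY_UP_THEN_UPPER;
	if (!o->has_upper)
		return FH_LOWER_FROM_LOWER;
	if (o->indexed)
		return FH_LOWER_FROM_ORIGIN;
	return FH_UPPER;
}
```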
>
>> +
>> +The encoded overlay file handle includes:
>> + - Header including path type information (e.g. lower/upper)
>> + - UUID of the underlying filesystem
>> + - Underlying filesystem encoding of underlying inode
>> +
>> +This encoding is identical to the encoding of copy_up origin stored in
>> +"trusted.overlay.origin".
>> +
>> +When decoding an overlay file handle, the following steps are followed:
>> +
>> +1. Find underlying layer by UUID and path type information.
>> +2. Decode the underlying filesystem file handle to underlying dentry.
>> +3. For a lower file handle, lookup the handle in index directory by name.
>> +4. If a whiteout is found in index, return ESTALE. This represents an
>> +   overlay object that was deleted after its file handle was encoded.
>> +5. For a non-directory, instantiate a disconnected overlay dentry from the
>> +   decoded underlying dentry, the path type and index inode, if found.
>> +6. For a directory, use the connected underlying decoded dentry, path type
>> +   and index, to lookup a connected overlay dentry.
>> +
>> +The "verify" feature ensures, that a decoded overlay directory object will
>> +be equivalent to the object that was used to encode the file handle.
>> +
>
> What's equivalent?  What are the guarantees needed by NFS server?  It
> doesn't verify object version, so modification is OK.
>
> Does swapping out lower dirs count as modification or does it count as
> new object?
>

To be honest, I don't know what I was trying to say.
In the updated version of patches and documentation I just pushed to
https://github.com/amir73il/linux/commits/ovl-nfs-export
this obscure sentence is gone.

Is there anything else that needs clarification?

Thanks,
Amir.

* Re: [PATCH v2 01/17] ovl: document NFS export
  2018-01-11 16:26     ` Amir Goldstein
@ 2018-01-12 15:43       ` Miklos Szeredi
  2018-01-12 15:49         ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-12 15:43 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 11, 2018 at 5:26 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 11, 2018 at 6:06 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  Documentation/filesystems/overlayfs.txt | 59 +++++++++++++++++++++++++++++++++
>>>  1 file changed, 59 insertions(+)
>>>
>>> diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
>>> index 00e0595f3d7e..9e21c14c914c 100644
>>> --- a/Documentation/filesystems/overlayfs.txt
>>> +++ b/Documentation/filesystems/overlayfs.txt
>>> @@ -315,6 +315,65 @@ origin file handle that was stored at copy_up time.  If a found lower
>>>  directory does not match the stored origin, that directory will not be
>>>  merged with the upper directory.
>>>
>>> +
>>> +NFS export
>>> +----------
>>> +
>>> +When the underlying filesystems supports NFS export and the "verify"
>>> +feature is enabled, an overlay filesystem may be exported to NFS.
>>> +
>>> +With the "verify" feature, on copy_up of any lower object, an index
>>> +entry is created under the index directory.  The index entry name is the
>>> +hexadecimal representation of the copy up origin file handle.  For a
>>> +non-directory object, the index entry is a hard link to the upper inode.
>>> +For a directory object, the index entry has an extended attribute
>>> +"trusted.overlay.origin" with an encoded file handle of the upper
>>> +directory inode.
>>> +
>>> +When encoding a file handle from an overlay filesystem object, the
>>> +following rules apply:
>>> +
>>> +1. For a non-upper object, encode a lower file handle from lower inode
>>> +2. For an indexed object, encode a lower file handle from copy_up origin
>>> +3. For a pure-upper object and for an existing non-indexed upper object,
>>> +   encode an upper file handle from upper inode
>>> +
>>> +Encoding of a non-upper directory object is not supported when overlay
>>> +filesystem has multiple lower layers.  In this case, the directory will
>>> +be copied up first, and then encoded as an upper file handle.
>>
>> Why?
>>
>> What's the difference from encoding the uppermost lower layer directory?
>
> Sigh... hard to document... here goes an attempt.
> Let me know if it works:
>
> When decoding an upper dir, the decoded upper path is the same path as
> the overlay path, so we look up the same path in the overlay.
>
> When decoding a lower dir from layer 1, every ancestor is either still lower
> (and therefore not renamed) or has been copied up and indexed by lower inode,
> so we can use the index to find the path of every ancestor in the overlay (or
> to learn that it has been removed).
>
> When decoding a lower dir from layer 2, there may be an ancestor in layer 2
> covered by a whiteout in layer 1 and redirected from another directory in
> layer 1. In that case, we have no information in the index to reconstruct the
> overlay path from the connected layer 2 directory; hence, we cannot decode a
> connected overlay directory from a dir file handle encoded from layer 2.
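The index lookups mentioned above search for an entry whose name, according to the documentation patch, is the hexadecimal representation of the copy-up origin file handle. Here is a minimal sketch that builds such a name from raw handle bytes. `fh_to_index_name()` is a hypothetical helper. Lowercase digits and the caller-sized buffer are assumptions, not the overlayfs on-disk format:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the hex representation of a raw file handle
 * into @out, two lowercase hex digits per byte, NUL-terminated.
 * @out must hold at least 2 * @len + 1 bytes.
 */
static void fh_to_index_name(const unsigned char *fh, size_t len, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(fh != NULL || len == 0);
	assert(out != NULL);
	for (i = 0; i < len; i++) {
		out[2 * i] = digits[fh[i] >> 4];       /* high nibble */
		out[2 * i + 1] = digits[fh[i] & 0x0f]; /* low nibble */
	}
	out[2 * len] = '\0';
}
```

For example, the handle bytes 00 fb 1a 07 map to the name "00fb1a07". Decoding would then look up that name in the index directory.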

Now I understand: we are missing the back pointer from layer2 to
layer1 that the index provides us when going from lower to upper.

However, this is only needed if we end up below a redirecting layer.
So we could limit copy-up to these cases.  It doesn't seem hard to
keep track of the highest layer that had a redirect in each overlay
dentry and, when ending up on a layer below that, mark the overlay
dentry COPY_UP_FOR_ENCODE.  This information is constant, since lower
layers are immutable, so no worries there.  This can be postponed to a
later version, but the takeaway is that we need to mark the fh to
indicate whether it's a merge upper or not.

Thanks,
Miklos

* Re: [PATCH v2 01/17] ovl: document NFS export
  2018-01-12 15:43       ` Miklos Szeredi
@ 2018-01-12 15:49         ` Miklos Szeredi
  2018-01-12 18:50           ` Amir Goldstein
  2018-01-13  8:54           ` Amir Goldstein
  0 siblings, 2 replies; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-12 15:49 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Fri, Jan 12, 2018 at 4:43 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 11, 2018 at 5:26 PM, Amir Goldstein <amir73il@gmail.com> wrote:

>> When decoding an upper dir, the decoded upper path is the same path as
>> the overlay path, so we look up the same path in the overlay.
>>
>> When decoding a lower dir from layer 1, every ancestor is either still lower
>> (and therefore not renamed) or has been copied up and indexed by lower inode,
>> so we can use the index to find the path of every ancestor in the overlay (or
>> to learn that it has been removed).
>>
>> When decoding a lower dir from layer 2, there may be an ancestor in layer 2
>> covered by a whiteout in layer 1 and redirected from another directory in
>> layer 1. In that case, we have no information in the index to reconstruct the
>> overlay path from the connected layer 2 directory; hence, we cannot decode a
>> connected overlay directory from a dir file handle encoded from layer 2.
>
> Now I understand: we are missing the back pointer from layer2 to
> layer1 that the index provides us when going from lower to upper.
>
> However, this is only needed if we end up below a redirecting layer.
> So we could limit copy-up to these cases.  It doesn't seem hard to
> keep track of the highest layer that had a redirect in each overlay
> dentry and, when ending up on a layer below that, mark the overlay
> dentry COPY_UP_FOR_ENCODE.  This information is constant, since lower
> layers are immutable, so no worries there.  This can be postponed to a
> later version, but the takeaway is that we need to mark the fh to
> indicate whether it's a merge upper or not.

And BTW, we need to copy up only the directory that has the redirect,
since that's where we are missing the mapping in the lower layers.
Below that in the tree, we are fine, until we come across another
redirect, and so on...

Thanks,
Miklos

* Re: [PATCH v2 01/17] ovl: document NFS export
  2018-01-12 15:49         ` Miklos Szeredi
@ 2018-01-12 18:50           ` Amir Goldstein
  2018-01-13  8:54           ` Amir Goldstein
  1 sibling, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-12 18:50 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Fri, Jan 12, 2018 at 5:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Jan 12, 2018 at 4:43 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jan 11, 2018 at 5:26 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>>> When decoding an upper dir, the decoded upper path is the same path as
>>> the overlay path, so we look up the same path in the overlay.
>>>
>>> When decoding a lower dir from layer 1, every ancestor is either still lower
>>> (and therefore not renamed) or has been copied up and indexed by lower inode,
>>> so we can use the index to find the path of every ancestor in the overlay (or
>>> to learn that it has been removed).
>>>
>>> When decoding a lower dir from layer 2, there may be an ancestor in layer 2
>>> covered by a whiteout in layer 1 and redirected from another directory in
>>> layer 1. In that case, we have no information in the index to reconstruct the
>>> overlay path from the connected layer 2 directory; hence, we cannot decode a
>>> connected overlay directory from a dir file handle encoded from layer 2.
>>
>> Now I understand: we are missing the back pointer from layer2 to
>> layer1 that the index provides us when going from lower to upper.
>>
>> However, this is only needed if we end up below a redirecting layer.
>> So we could limit copy-up to these cases.  It doesn't seem hard to
>> keep track of the highest layer that had a redirect in each overlay
>> dentry and, when ending up on a layer below that, mark the overlay
>> dentry COPY_UP_FOR_ENCODE.  This information is constant, since lower
>> layers are immutable, so no worries there.

Right.

>> This can be postponed to a
>> later version, but the takeaway is that we need to mark the fh to
>> indicate whether it's a merge upper or not.
>

This I did not get.
The fh is marked upper or not.
If it is upper, we get the real upper path and lookup that path in overlay.
Whether upper is merge or not, overlay lookup will find out.

What am I missing?


> And BTW, we need to copy up only the directory that has the redirect,
> since that's where we are missing the mapping in the lower layers.
> Below that in the tree, we are fine, until we come across another
> redirect, and so on...
>

So actually, I can use the OVL_RENAMED flag from patch 8/23
and implement ovl_copy_up_renamed_parent() on encode.
This will also cover the case of a dir in layer 1 that has a
non-indexed redirected upper parent.

Thanks,
Amir.

* Re: [PATCH v2 01/17] ovl: document NFS export
  2018-01-12 15:49         ` Miklos Szeredi
  2018-01-12 18:50           ` Amir Goldstein
@ 2018-01-13  8:54           ` Amir Goldstein
  1 sibling, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-13  8:54 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Fri, Jan 12, 2018 at 5:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Jan 12, 2018 at 4:43 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jan 11, 2018 at 5:26 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>>> When decoding an upper dir, the decoded upper path is the same path as
>>> the overlay path, so we look up the same path in the overlay.
>>>
>>> When decoding a lower dir from layer 1, every ancestor is either still lower
>>> (and therefore not renamed) or has been copied up and indexed by lower inode,
>>> so we can use the index to find the path of every ancestor in the overlay (or
>>> to learn that it has been removed).
>>>
>>> When decoding a lower dir from layer 2, there may be an ancestor in layer 2
>>> covered by a whiteout in layer 1 and redirected from another directory in
>>> layer 1. In that case, we have no information in the index to reconstruct the
>>> overlay path from the connected layer 2 directory; hence, we cannot decode a
>>> connected overlay directory from a dir file handle encoded from layer 2.
>>
>> Now I understand: we are missing the back pointer from layer2 to
>> layer1 that the index provides us when going from lower to upper.
>>
>> However, this is only needed if we end up below a redirecting layer.
>> So we could limit copy-up to these cases.  It doesn't seem hard to
>> keep track of the highest layer that had a redirect in each overlay
>> dentry and, when ending up on a layer below that, mark the overlay
>> dentry COPY_UP_FOR_ENCODE.  This information is constant, since lower
>> layers are immutable, so no worries there.  This can be postponed to a
>> later version, but the takeaway is that we need to mark the fh to
>> indicate whether it's a merge upper or not.
>

I did not understand what you mean by marking the fh as merge upper or not.
In any case, the idea is to mark dentries like these as ENCODE_UPPER;
then not only will they be copied up on encode, they will also *always*
be encoded as upper for consistency.

> And BTW, we need to copy up only the directory that has the redirect,
> since that's where we are missing the mapping in the lower layers.
> Below that in the tree, we are fine, until we come across another
> redirect, and so on...
>

I think that is a somewhat simplified description of the situation.
Things can be more complicated, for example:

layer1: /a/b (/a has redirect to 'A')
layer2: /A/b/c/d
layer3: /A/b/c/d/e/f

When decoding the lower path /A/b/c/d, copying up /a will index it by
the layer1 dir /a, which doesn't help with the backward redirect from
layer2 dir /A.

Copying up layer1 /a/b doesn't help either.

We must find the ancestor of /A/b/c/d that is an 'uppermost lower',
which is /A/b/c, and copy up/index that ancestor.

So I *think* we need to store per dentry at lookup time:
- reconnect_layer_idx:
  The highest layer with a non-indexed redirect among all ancestors
  (this can be the upper layer in case of a non-indexed upper merge dir).
  If we encode a file handle from a dir in reconnect_layer or above,
  we can decode it and use the decoded path to reconnect the overlay dentry.
- OVL_ENCODE_UPPER:
  This is determined by a combination of reconnect_layer_idx, the
  uppermost lower layer of self and the uppermost lower layer of the parent.
  I *think* the condition is:
    lowerpath[0]->layer->idx > parent->lowerpath[0]->layer->idx &&
    lowerpath[0]->layer->idx > reconnect_layer_idx

In the example above, dentries /a/b/c and /a/b/c/d/e are marked
OVL_ENCODE_UPPER. /a/b/c should be copied up when encoding
/a/b/c/d, and /a/b/c/d/e should be copied up when encoding /a/b/c/d/e/f.
In practice, I assume nfsd always encodes /a/b/c on lookup of
/a/b/c/d before encoding /a/b/c/d, but to be on the safe side, we
need to take care of copying up an OVL_ENCODE_UPPER ancestor.
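A minimal sketch of the proposed OVL_ENCODE_UPPER condition, checked against the layer1/layer2/layer3 example above. `struct toy_dentry`, its fields, `encode_upper()` and `NO_REDIRECT` are hypothetical. Layer index 0 is assumed to be the upper layer, and larger indexes are lower layers. `NO_REDIRECT` stands for "no non-indexed redirect among ancestors":

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* No non-indexed redirect among ancestors: never forces upper encoding. */
#define NO_REDIRECT INT_MAX

/* Hypothetical per-dentry state computed at lookup time. */
struct toy_dentry {
	const struct toy_dentry *parent; /* NULL for the overlay root */
	int lower_idx;           /* layer of lowerpath[0]; larger = lower */
	int reconnect_layer_idx; /* highest non-indexed redirect layer above */
};

static bool encode_upper(const struct toy_dentry *d)
{
	assert(d != NULL);
	if (!d->parent)
		return false; /* the root has no parent to compare against */
	return d->lower_idx > d->parent->lower_idx &&
	       d->lower_idx > d->reconnect_layer_idx;
}
```

In the example, /, /a and /a/b have their uppermost lower in layer1, /a/b/c and /a/b/c/d in layer2, and /a/b/c/d/e and /a/b/c/d/e/f in layer3. Every descendant of /a sees a reconnect layer of 1. With those values, only /a/b/c and /a/b/c/d/e satisfy the condition, which matches the analysis above.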


This is my re-take on Documentation of this wrinkle:

When an overlay filesystem has multiple lower layers, a middle layer
directory may have a "redirect" to a lower directory.  Because middle layer
"redirects" are not indexed, a lower file handle that was encoded from the
"redirect" origin directory cannot be used to find the middle or upper
layer directory.  Similarly, a lower file handle that was encoded from a
descendant of the "redirect" origin directory cannot be used to
reconstruct a connected overlay path.  To mitigate the cases of
directories that cannot be decoded from a lower file handle, these
directories are copied up on encode and encoded as an upper file handle.

Let me know what you think.

Thanks,
Amir.

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-04 17:20 ` [PATCH v2 04/17] ovl: decode connected upper dir " Amir Goldstein
  2018-01-05 12:33   ` Amir Goldstein
@ 2018-01-15 11:33   ` Miklos Szeredi
  2018-01-15 12:20     ` Amir Goldstein
  1 sibling, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-15 11:33 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Until this change, we decoded upper file handles by instantiating an
> overlay dentry from the real upper dentry. This is sufficient to handle
> pure upper files, but insufficient to handle merge/impure dirs.
>
> To that end, if decoded real upper dir is connected and hashed, we
> lookup an overlay dentry with the same path as the real upper dir.
> If decoded real upper is non-dir, we instantiate a disconnected overlay
> dentry as before this change.
>
> Because ovl_fh_to_dentry() returns connected overlay dir dentries,
> exportfs never needs to call get_parent() and get_name() to reconnect an
> upper overlay dir. Because connectable non-dir file handles are not
> supported, exportfs will not be able to use fh_to_parent() and get_name()
> methods to reconnect a disconnected non-dir to its parent. Therefore, the
> methods get_parent() and get_name() are implemented just to print out a
> sanity warning and the method fh_to_parent() is implemented to warn the
> user that using the 'subtree_check' exportfs option is not supported.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 171 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 5c72784a0b4d..48ae02f3acb8 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -130,6 +130,145 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>         return dentry;
>  }
>
> +/*
> + * Lookup a child overlay dentry whose real dentry is @real.
> + * If @is_upper is true then we lookup a child overlay dentry with the same
> + * name as the real dentry. Otherwise, we need to consult index for lookup.
> + */
> +static struct dentry *ovl_lookup_real_one(struct dentry *parent,
> +                                         struct dentry *real, bool is_upper)
> +{
> +       struct dentry *this;
> +       struct qstr *name = &real->d_name;
> +       int err;
> +
> +       /* TODO: use index when looking up by lower real dentry */
> +       if (!is_upper)
> +               return ERR_PTR(-EACCES);
> +
> +       /* Lookup overlay dentry by real name */
> +       this = lookup_one_len_unlocked(name->name, parent, name->len);
> +       err = PTR_ERR(this);
> +       if (IS_ERR(this)) {
> +               goto fail;
> +       } else if (!this || !this->d_inode) {
> +               dput(this);
> +               err = -ENOENT;
> +               goto fail;
> +       } else if (ovl_dentry_upper(this) != real) {
> +               dput(this);
> +               err = -ESTALE;
> +               goto fail;
> +       }
> +
> +       return this;
> +
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, is_upper=%d, parent=%pd2, err=%i)\n",
> +                           real, is_upper, parent, err);
> +       return ERR_PTR(err);
> +}
> +
> +/*
> + * Lookup an overlay dentry whose real dentry is @real.
> + * If @is_upper is true then we lookup an overlay dentry with the same path
> + * as the real dentry. Otherwise, we need to consult index for lookup.
> + */
> +static struct dentry *ovl_lookup_real(struct super_block *sb,
> +                                     struct dentry *real, bool is_upper)
> +{
> +       struct dentry *connected;
> +       int err = 0;
> +
> +       /* TODO: use index when looking up by lower real dentry */
> +       if (!is_upper)
> +               return ERR_PTR(-EACCES);
> +
> +       connected = dget(sb->s_root);
> +       while (!err) {
> +               struct dentry *next, *this;
> +               struct dentry *parent = NULL;
> +               struct dentry *real_connected = ovl_dentry_upper(connected);
> +
> +               if (real_connected == real)
> +                       break;
> +
> +               next = dget(real);
> +               /* find the topmost dentry not yet connected */
> +               for (;;) {
> +                       parent = dget_parent(next);
> +
> +                       if (real_connected == parent)
> +                               break;
> +
> +                       /*
> +                        * If real file has been moved out of the layer root
> +                        * directory, we will eventually hit the real fs root.
> +                        */
> +                       if (parent == next) {
> +                               err = -EXDEV;
> +                               break;
> +                       }

This seems to assume no cross directory renames of directories in the
ancestry of "real", but AFAICS nothing prevents that.

Also, why not use the inode cache to find already connected dirs?
That seems more efficient than always going up to the root and going
down from there.

So, a working algorithm would be to go up to the first connected
parent or root, lock the parent, look up the name and restart.  It is
not guaranteed to finish, since it is not protected against always
racing with renames.  Can we take s_vfs_rename_sem on ovl to prevent that?
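Here is a toy, single-threaded sketch of the suggested algorithm. It walks up from the real dentry to the first ancestor that already has an overlay dentry, finding it through a cache instead of starting at the root each time. It then looks up the one child below that ancestor and restarts. `struct toy_node`, its `connected` flag and both functions are hypothetical stand-ins. Locking and the rename race are deliberately not modelled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical real (upper) dentry; the layer root is its own parent. */
struct toy_node {
	struct toy_node *parent;
	bool connected; /* an overlay dentry already exists for it */
};

/* Stand-in for "lock parent, look up child's name in the overlay". */
static void toy_lookup_one(struct toy_node *child)
{
	child->connected = true;
}

/* Connect @real one level per pass; returns the number of lookups. */
static int toy_reconnect(struct toy_node *real)
{
	int lookups = 0;

	while (!real->connected) {
		struct toy_node *child = real;

		/* Go up to the first ancestor that is already connected. */
		while (!child->parent->connected) {
			/* Unconnected layer root: the patch uses -EXDEV in a similar case. */
			assert(child->parent != child);
			child = child->parent;
		}
		toy_lookup_one(child);
		lookups++;
	}
	return lookups;
}
```

A second call on the same dentry does no work. A sibling under an already connected parent needs only one lookup, which is where the saving over restarting from the root comes from.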

> +
> +                       dput(next);
> +                       next = parent;
> +               }
> +
> +               if (!err) {
> +                       this = ovl_lookup_real_one(connected, next, is_upper);
> +                       if (!IS_ERR(this)) {
> +                               dput(connected);
> +                               connected = this;
> +                       } else {
> +                               err = PTR_ERR(this);
> +                       }
> +               }
> +
> +               dput(parent);
> +               dput(next);
> +       }
> +
> +       if (err)
> +               goto fail;
> +
> +       return connected;
> +
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to lookup by real (%pd2, is_upper=%d, connected=%pd2, err=%i)\n",
> +                           real, is_upper, connected, err);
> +       dput(connected);
> +       return ERR_PTR(err);
> +}
> +
> +/*
> + * Get an overlay dentry from upper/lower real dentries.
> + */
> +static struct dentry *ovl_get_dentry(struct super_block *sb,
> +                                    struct dentry *upper,
> +                                    struct ovl_path *lowerpath)
> +{
> +       /* TODO: get non-upper dentry */
> +       if (!upper)
> +               return ERR_PTR(-EACCES);
> +
> +       /*
> +        * Obtain a disconnected overlay dentry from a non-dir real upper
> +        * dentry.
> +        */
> +       if (!d_is_dir(upper))
> +               return ovl_obtain_alias(sb, upper, NULL);
> +
> +       /* Removed empty directory? */
> +       if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
> +               return ERR_PTR(-ENOENT);
> +
> +       /*
> +        * If real upper dentry is connected and hashed, get a connected
> +        * overlay dentry with the same path as the real upper dentry.
> +        */
> +       return ovl_lookup_real(sb, upper, true);
> +}
> +
>  static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
>                                         struct ovl_fh *fh)
>  {
> @@ -144,7 +283,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
>         if (IS_ERR_OR_NULL(upper))
>                 return upper;
>
> -       dentry = ovl_obtain_alias(sb, upper, NULL);
> +       dentry = ovl_get_dentry(sb, upper, NULL);
>         dput(upper);
>
>         return dentry;
> @@ -183,7 +322,38 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
>         return ERR_PTR(err);
>  }
>
> +static struct dentry *ovl_fh_to_parent(struct super_block *sb, struct fid *fid,
> +                                      int fh_len, int fh_type)
> +{
> +       pr_warn_ratelimited("overlayfs: connectable file handles not supported; use 'no_subtree_check' exportfs option.\n");
> +       return ERR_PTR(-EACCES);
> +}
> +
> +static int ovl_get_name(struct dentry *parent, char *name,
> +                       struct dentry *child)
> +{
> +       /*
> +        * ovl_fh_to_dentry() returns connected dir overlay dentries and
> +        * ovl_fh_to_parent() is not implemented, so we should not get here.
> +        */
> +       WARN_ON_ONCE(1);
> +       return -EIO;
> +}
> +
> +static struct dentry *ovl_get_parent(struct dentry *dentry)
> +{
> +       /*
> +        * ovl_fh_to_dentry() returns connected dir overlay dentries, so we
> +        * should not get here.
> +        */
> +       WARN_ON_ONCE(1);
> +       return ERR_PTR(-EIO);
> +}
> +
>  const struct export_operations ovl_export_operations = {
>         .encode_fh      = ovl_encode_inode_fh,
>         .fh_to_dentry   = ovl_fh_to_dentry,
> +       .fh_to_parent   = ovl_fh_to_parent,
> +       .get_name       = ovl_get_name,
> +       .get_parent     = ovl_get_parent,
>  };
> --
> 2.7.4
>

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-05 12:33   ` Amir Goldstein
  2018-01-05 15:18     ` J . Bruce Fields
@ 2018-01-15 11:41     ` Miklos Szeredi
  1 sibling, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-15 11:41 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Fri, Jan 5, 2018 at 1:33 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 4, 2018 at 7:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Until this change, we decoded upper file handles by instantiating an
>> overlay dentry from the real upper dentry. This is sufficient to handle
>> pure upper files, but insufficient to handle merge/impure dirs.
>>
>> To that end, if decoded real upper dir is connected and hashed, we
>> lookup an overlay dentry with the same path as the real upper dir.
>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>> dentry as before this change.
>>
>> Because ovl_fh_to_dentry() returns connected overlay dir dentries,
>> exportfs never needs to call get_parent() and get_name() to reconnect an
>> upper overlay dir. Because connectable non-dir file handles are not
>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>> methods get_parent() and get_name() are implemented just to print out a
>> sanity warning and the method fh_to_parent() is implemented to warn the
>> user that using the 'subtree_check' exportfs option is not supported.
>>
>
> Reviewers who get this far should have their eyebrows slightly raised
> after reading this commit message and should be asking themselves:
>
> "Why not return a disconnected overlay dentry like any other fs and implement
> ovl_get_parent()/ovl_get_name() by looking at parent/name of upper dir?"
>
> I have had this debate with myself for a while and experimented a bit with
> both approaches, and in the end I liked the "return connected dentry" result
> better. I did not want to write this entire story in the commit message,
> because in the end there is nothing incorrect about either implementation;
> there are only pros and cons to each choice.
>
> At the moment, the only argument I can think of to counter the chosen approach
> is that it adds ~100 lines of code in the ovl_lookup_real() and
> ovl_lookup_real_one() helpers that could have been avoided by using the
> common reconnect_path() code in fs/exportfs/expfs.c.

And also not having to deal with rename races would be good.  And the
way to do it is the same way as ovl_get_redirect(), except now we are
walking the upper layer instead of the overlay layer.

Not sure which approach is better.
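For comparison, here is a toy sketch of building a path by walking parent pointers up to the root and filling the buffer from the end. It follows the spirit of the ovl_get_redirect() approach mentioned above, applied to upper-layer names. `struct toy_dent` and `toy_build_path()` are hypothetical. A real implementation must also guard against concurrent renames, which this sketch ignores:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dentry: the root is its own parent and has an empty name. */
struct toy_dent {
	const struct toy_dent *parent;
	const char *name;
};

/*
 * Build "/a/b/c" for @d by filling @buf backwards while walking up to the
 * root. Returns a pointer into @buf, or NULL if @buf is too small.
 */
static const char *toy_build_path(const struct toy_dent *d, char *buf,
				  size_t size)
{
	char *p;

	assert(buf != NULL && size > 0);
	p = buf + size - 1;
	*p = '\0';
	for (; d->parent != d; d = d->parent) {
		size_t len = strlen(d->name);

		/* Need room for the name plus its leading '/'. */
		if ((size_t)(p - buf) < len + 1)
			return NULL;
		p -= len;
		memcpy(p, d->name, len);
		*--p = '/';
	}
	if (*p == '\0') { /* @d was the root itself */
		if (p == buf)
			return NULL;
		*--p = '/';
	}
	return p;
}
```

The resulting path could then be looked up in the overlay. That avoids per-component reconnection, but it shifts the rename-race problem to the path walk.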

Thanks,
Miklos

* Re: [PATCH v2 05/17] ovl: encode non-indexed upper file handles
  2018-01-04 17:20 ` [PATCH v2 05/17] ovl: encode non-indexed upper " Amir Goldstein
@ 2018-01-15 11:58   ` Miklos Szeredi
  2018-01-15 12:07     ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-15 11:58 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> We only need to encode origin if there is a chance that the same object was
> encoded pre copy up and then we need to stay consistent with the same
> encoding also after copy up.
>
> In case a non-pure upper is not indexed, then it was copied up before NFS
> export support was enabled. In that case, we don't need to worry about
> staying consistent with pre copy up encoding and we encode an upper file
> handle.
>
> This mitigates the problem that with no index, we cannot find an upper
> inode from origin inode, so we cannot decode a non-indexed upper from
> origin file handle.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 48ae02f3acb8..919d43aaa387 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -19,6 +19,28 @@
>  #include <linux/ratelimit.h>
>  #include "overlayfs.h"
>
> +/*
> + * We only need to encode origin if there is a chance that the same object was
> + * encoded pre copy up and then we need to stay consistent with the same
> + * encoding also after copy up. If non-pure upper is not indexed, then it was
> + * copied up before NFS export was enabled. In that case we don't need to worry
> + * about staying consistent with pre copy up encoding and we encode an upper
> + * file handle.
> + */
> +static bool ovl_should_encode_origin(struct dentry *dentry)
> +{
> +       /* Root dentry was born upper */
> +       if (dentry == dentry->d_sb->s_root)
> +               return false;

Root can be lower, no (when there's no upper layer at all)?  The
comment is confusing at best.

> +
> +       /* Decoding a non-indexed upper from origin is not implemented */
> +       if (ovl_dentry_upper(dentry) &&
> +           !ovl_test_flag(OVL_INDEX, d_inode(dentry)))
> +               return false;
> +
> +       return true;
> +}
> +
>  int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
>  {
>         struct dentry *upper = ovl_dentry_upper(dentry);
> @@ -26,11 +48,7 @@ int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
>         struct ovl_fh *fh = NULL;
>         int err;
>
> -       /*
> -        * Overlay root dir inode is encoded as an upper file handle upper,
> -        * because root dir dentry is born upper and not indexed.
> -        */
> -       if (dentry == dentry->d_sb->s_root)
> +       if (!ovl_should_encode_origin(dentry))
>                 origin = NULL;
>
>         err = -EACCES;
> --
> 2.7.4
>

* Re: [PATCH v2 05/17] ovl: encode non-indexed upper file handles
  2018-01-15 11:58   ` Miklos Szeredi
@ 2018-01-15 12:07     ` Amir Goldstein
  0 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-15 12:07 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Mon, Jan 15, 2018 at 1:58 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> We only need to encode origin if there is a chance that the same object was
>> encoded pre copy up and then we need to stay consistent with the same
>> encoding also after copy up.
>>
>> In case a non-pure upper is not indexed, then it was copied up before NFS
>> export support was enabled. In that case, we don't need to worry about
>> staying consistent with pre copy up encoding and we encode an upper file
>> handle.
>>
>> This mitigates the problem that with no index, we cannot find an upper
>> inode from origin inode, so we cannot decode a non-indexed upper from
>> origin file handle.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/export.c | 28 +++++++++++++++++++++++-----
>>  1 file changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>> index 48ae02f3acb8..919d43aaa387 100644
>> --- a/fs/overlayfs/export.c
>> +++ b/fs/overlayfs/export.c
>> @@ -19,6 +19,28 @@
>>  #include <linux/ratelimit.h>
>>  #include "overlayfs.h"
>>
>> +/*
>> + * We only need to encode origin if there is a chance that the same object was
>> + * encoded pre copy up and then we need to stay consistent with the same
>> + * encoding also after copy up. If non-pure upper is not indexed, then it was
>> + * copied up before NFS export was enabled. In that case we don't need to worry
>> + * about staying consistent with pre copy up encoding and we encode an upper
>> + * file handle.
>> + */
>> +static bool ovl_should_encode_origin(struct dentry *dentry)
>> +{
>> +       /* Root dentry was born upper */
>> +       if (dentry == dentry->d_sb->s_root)
>> +               return false;
>
> Root can be lower, no (when there's no upper layer at all)?  The
> comment is confusing at best.
>

Probably more than "just confusing", as I did not test NFS export
with a no-upper overlayfs setup.
At best, encode will fail.

I will test this setup.

Thanks,
Amir.

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-15 11:33   ` Miklos Szeredi
@ 2018-01-15 12:20     ` Amir Goldstein
  2018-01-15 14:56       ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-15 12:20 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Mon, Jan 15, 2018 at 1:33 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Until this change, we decoded upper file handles by instantiating an
>> overlay dentry from the real upper dentry. This is sufficient to handle
>> pure upper files, but insufficient to handle merge/impure dirs.
>>
>> To that end, if decoded real upper dir is connected and hashed, we
>> lookup an overlay dentry with the same path as the real upper dir.
>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>> dentry as before this change.
>>
>> Because ovl_fh_to_dentry() returns connected overlay dir dentries,
>> exportfs never needs to call get_parent() and get_name() to reconnect an
>> upper overlay dir. Because connectable non-dir file handles are not
>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>> methods get_parent() and get_name() are implemented just to print out a
>> sanity warning and the method fh_to_parent() is implemented to warn the
>> user that using the 'subtree_check' exportfs option is not supported.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/export.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 171 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>> index 5c72784a0b4d..48ae02f3acb8 100644
>> --- a/fs/overlayfs/export.c
>> +++ b/fs/overlayfs/export.c
>> @@ -130,6 +130,145 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>         return dentry;
>>  }
>>
>> +/*
>> + * Lookup a child overlay dentry whose real dentry is @real.
>> + * If @is_upper is true then we lookup a child overlay dentry with the same
>> + * name as the real dentry. Otherwise, we need to consult index for lookup.
>> + */
>> +static struct dentry *ovl_lookup_real_one(struct dentry *parent,
>> +                                         struct dentry *real, bool is_upper)
>> +{
>> +       struct dentry *this;
>> +       struct qstr *name = &real->d_name;
>> +       int err;
>> +
>> +       /* TODO: use index when looking up by lower real dentry */
>> +       if (!is_upper)
>> +               return ERR_PTR(-EACCES);
>> +
>> +       /* Lookup overlay dentry by real name */
>> +       this = lookup_one_len_unlocked(name->name, parent, name->len);
>> +       err = PTR_ERR(this);
>> +       if (IS_ERR(this)) {
>> +               goto fail;
>> +       } else if (!this || !this->d_inode) {
>> +               dput(this);
>> +               err = -ENOENT;
>> +               goto fail;
>> +       } else if (ovl_dentry_upper(this) != real) {
>> +               dput(this);
>> +               err = -ESTALE;
>> +               goto fail;
>> +       }
>> +
>> +       return this;
>> +
>> +fail:
>> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, is_upper=%d, parent=%pd2, err=%i)\n",
>> +                           real, is_upper, parent, err);
>> +       return ERR_PTR(err);
>> +}
>> +
>> +/*
>> + * Lookup an overlay dentry whose real dentry is @real.
>> + * If @is_upper is true then we lookup an overlay dentry with the same path
>> + * as the real dentry. Otherwise, we need to consult index for lookup.
>> + */
>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>> +                                     struct dentry *real, bool is_upper)
>> +{
>> +       struct dentry *connected;
>> +       int err = 0;
>> +
>> +       /* TODO: use index when looking up by lower real dentry */
>> +       if (!is_upper)
>> +               return ERR_PTR(-EACCES);
>> +
>> +       connected = dget(sb->s_root);
>> +       while (!err) {
>> +               struct dentry *next, *this;
>> +               struct dentry *parent = NULL;
>> +               struct dentry *real_connected = ovl_dentry_upper(connected);
>> +
>> +               if (real_connected == real)
>> +                       break;
>> +
>> +               next = dget(real);
>> +               /* find the topmost dentry not yet connected */
>> +               for (;;) {
>> +                       parent = dget_parent(next);
>> +
>> +                       if (real_connected == parent)
>> +                               break;
>> +
>> +                       /*
>> +                        * If real file has been moved out of the layer root
>> +                        * directory, we will eventully hit the real fs root.
>> +                        */
>> +                       if (parent == next) {
>> +                               err = -EXDEV;
>> +                               break;
>> +                       }
>
> This seems to assume no cross directory renames of directories in the
> ancestry of "real", but AFAICS nothing prevents that.

Do you mean online modification of underlying fs? or rename in overlay?
For online modification of the underlying fs, I don't see a reason to make it work.
-ESTALE would be a perfectly valid result in that case.

>
> Also why not use the inode cache to find already connected dirs?
> Seems more efficient, than always going up to the root and going down
> from there.

See patch [14/17] ovl: lookup connected ancestor of dir in inode cache
Sorry for ordering patches like this, it was more convenient to implement
the cold cache algorithm and then add hot cache into the mix.

>
> So, a working algorithm would be going up to the first connected
> parent or root, lock parent, lookup name and restart.  Not guaranteed
> to finish, since not protected against always racing with renames.
> Can we take s_vfs_rename_sem on ovl to prevent that?
>

Sounds like a simple and good enough solution.
Do we really need the locking of parent and restart connect if
we take s_vfs_rename_sem around ovl_lookup_real()?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-15 12:20     ` Amir Goldstein
@ 2018-01-15 14:56       ` Miklos Szeredi
  2018-01-17 11:18         ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-15 14:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Mon, Jan 15, 2018 at 1:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Jan 15, 2018 at 1:33 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Until this change, we decoded upper file handles by instantiating an
>>> overlay dentry from the real upper dentry. This is sufficient to handle
>>> pure upper files, but insufficient to handle merge/impure dirs.
>>>
>>> To that end, if decoded real upper dir is connected and hashed, we
>>> lookup an overlay dentry with the same path as the real upper dir.
>>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>>> dentry as before this change.
>>>
>>> Because ovl_fh_to_dentry() returns connected overlay dir dentries,
>>> exportfs never need to call get_parent() and get_name() to reconnect an
>>> upper overlay dir. Because connectable non-dir file handles are not
>>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>>> methods get_parent() and get_name() are implemented just to print out a
>>> sanity warning and the method fh_to_parent() is implemented to warn the
>>> user that using the 'subtree_check' exportfs option is not supported.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/export.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 171 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>> index 5c72784a0b4d..48ae02f3acb8 100644
>>> --- a/fs/overlayfs/export.c
>>> +++ b/fs/overlayfs/export.c
>>> @@ -130,6 +130,145 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>>         return dentry;
>>>  }
>>>
>>> +/*
>>> + * Lookup a child overlay dentry whose real dentry is @real.
>>> + * If @is_upper is true then we lookup a child overlay dentry with the same
>>> + * name as the real dentry. Otherwise, we need to consult index for lookup.
>>> + */
>>> +static struct dentry *ovl_lookup_real_one(struct dentry *parent,
>>> +                                         struct dentry *real, bool is_upper)
>>> +{
>>> +       struct dentry *this;
>>> +       struct qstr *name = &real->d_name;
>>> +       int err;
>>> +
>>> +       /* TODO: use index when looking up by lower real dentry */
>>> +       if (!is_upper)
>>> +               return ERR_PTR(-EACCES);
>>> +
>>> +       /* Lookup overlay dentry by real name */
>>> +       this = lookup_one_len_unlocked(name->name, parent, name->len);
>>> +       err = PTR_ERR(this);
>>> +       if (IS_ERR(this)) {
>>> +               goto fail;
>>> +       } else if (!this || !this->d_inode) {
>>> +               dput(this);
>>> +               err = -ENOENT;
>>> +               goto fail;
>>> +       } else if (ovl_dentry_upper(this) != real) {
>>> +               dput(this);
>>> +               err = -ESTALE;
>>> +               goto fail;
>>> +       }
>>> +
>>> +       return this;
>>> +
>>> +fail:
>>> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, is_upper=%d, parent=%pd2, err=%i)\n",
>>> +                           real, is_upper, parent, err);
>>> +       return ERR_PTR(err);
>>> +}
>>> +
>>> +/*
>>> + * Lookup an overlay dentry whose real dentry is @real.
>>> + * If @is_upper is true then we lookup an overlay dentry with the same path
>>> + * as the real dentry. Otherwise, we need to consult index for lookup.
>>> + */
>>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>>> +                                     struct dentry *real, bool is_upper)
>>> +{
>>> +       struct dentry *connected;
>>> +       int err = 0;
>>> +
>>> +       /* TODO: use index when looking up by lower real dentry */
>>> +       if (!is_upper)
>>> +               return ERR_PTR(-EACCES);
>>> +
>>> +       connected = dget(sb->s_root);
>>> +       while (!err) {
>>> +               struct dentry *next, *this;
>>> +               struct dentry *parent = NULL;
>>> +               struct dentry *real_connected = ovl_dentry_upper(connected);
>>> +
>>> +               if (real_connected == real)
>>> +                       break;
>>> +
>>> +               next = dget(real);
>>> +               /* find the topmost dentry not yet connected */
>>> +               for (;;) {
>>> +                       parent = dget_parent(next);
>>> +
>>> +                       if (real_connected == parent)
>>> +                               break;
>>> +
>>> +                       /*
>>> +                        * If real file has been moved out of the layer root
>>> +                        * directory, we will eventully hit the real fs root.
>>> +                        */
>>> +                       if (parent == next) {
>>> +                               err = -EXDEV;
>>> +                               break;
>>> +                       }
>>
>> This seems to assume no cross directory renames of directories in the
>> ancestry of "real", but AFAICS nothing prevents that.
>
> Do you mean online modification of underlying fs? or rename in overlay?

Rename in overlay.

> For online modification fo underlying fs, I don't a reason to make it work.
> -ESTALE would be a perfectly valid result in that case.

Sure.

>>
>> Also why not use the inode cache to find already connected dirs?
>> Seems more efficient, than always going up to the root and going down
>> from there.
>
> See patch [14/17] ovl: lookup connected ancestor of dir in inode cache
> Sorry for ordering patches like this, it was more convenient to implement
> the cold cache algorithm and then add hot cache into the mix.

Okay.

>>
>> So, a working algorithm would be going up to the first connected
>> parent or root, lock parent, lookup name and restart.  Not guaranteed
>> to finish, since not protected against always racing with renames.
>> Can we take s_vfs_rename_sem on ovl to prevent that?
>>
>
> Sounds like a simple and good enough solution.
> Do we really need the locking of parent and restart connect if
> we take s_vfs_rename_sem around ovl_lookup_real()?

No, but s_vfs_rename_sem is a really heavyweight solution, we should
do better than that for decoding a file handle.

And we probably don't need anything else, since rename on ancestor
means renamed dir is connected, and hopefully not evicted from the
cache until we repeat the walk up.

So need to lock parent, lookup ovl dentry, verify we got the same
upper, if not retry icache lookup.

Not sure we need to worry about that "hopefully".  Hopefully not.

Thanks,
Miklos
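A rough single-threaded userspace model of the walk described above may make the retry condition concrete. Walk up from the real dir to the topmost ancestor that is not yet connected. Look that ancestor up by name in the connected overlay dir. Check that the result is still backed by that ancestor, and restart from the root on a mismatch. Everything here is hypothetical: the node layout, add_child(), lookup_child() and lookup_real(), and the bounded retry count. The kernel code would presumably use dget_parent(), lookup_one_len() with the overlay parent locked, and ovl_dentry_upper(). The model also has no refcounting or real concurrency.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4
#define MAX_RETRIES  3

/*
 * Hypothetical toy dentry, used for both the real (upper) tree and the
 * overlay tree. For overlay nodes, @upper points at the real node below.
 * A root node is its own parent.
 */
struct node {
	const char *name;
	struct node *parent;
	struct node *children[MAX_CHILDREN];
	int nchildren;
	struct node *upper;
};

static void add_child(struct node *parent, struct node *child, const char *name)
{
	assert(parent->nchildren < MAX_CHILDREN);
	child->name = name;
	child->parent = parent;
	parent->children[parent->nchildren++] = child;
}

/* Stand-in for a by-name lookup in the (locked) overlay parent */
static struct node *lookup_child(struct node *dir, const char *name)
{
	for (int i = 0; i < dir->nchildren; i++)
		if (strcmp(dir->children[i]->name, name) == 0)
			return dir->children[i];
	return NULL;
}

/* Return the overlay node whose upper is @real, or NULL on failure */
static struct node *lookup_real(struct node *ovl_root, struct node *real)
{
	for (int retry = 0; retry < MAX_RETRIES; retry++) {
		struct node *connected = ovl_root;

		for (;;) {
			struct node *next = real, *child;

			if (connected->upper == real)
				return connected;

			/* find the topmost real ancestor not yet connected */
			while (next->parent != connected->upper) {
				if (next->parent == next)
					return NULL;	/* left the layer: -EXDEV */
				next = next->parent;
			}

			child = lookup_child(connected, next->name);
			if (!child)
				return NULL;		/* -ENOENT */
			if (child->upper != next)
				break;			/* raced with rename: restart */
			connected = child;
		}
	}
	return NULL;				/* kept racing: -ESTALE */
}
```

Restarting from the root keeps the model simple. With the icache lookup from patch 14/17, a retry would presumably restart from the nearest cached connected ancestor instead.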


* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-04 17:20 ` [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files Amir Goldstein
@ 2018-01-16  9:16   ` Miklos Szeredi
  2018-01-16  9:37     ` Amir Goldstein
  2018-01-18 14:18   ` Amir Goldstein
  1 sibling, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-16  9:16 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Lookup overlay inode in cache by origin inode, so we can decode a file
> handle of an open file even if the index has a whiteout index entry to
> mark this overlay inode was unlinked.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>  fs/overlayfs/overlayfs.h |  1 +
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 602bada474ba..6ecb54d4b52c 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>         struct ovl_path *stack = &origin;
>         struct dentry *dentry = NULL;
>         struct dentry *index = NULL;
> +       struct inode *inode = NULL;
> +       bool is_deleted = false;
>         int err;
>
>         /* First lookup indexed upper by fh */

Why not first look up origin, then look up ovl inode by origin?  It
seems a faster path than going through the index first.  Obviously if
icache lookup fails then we need to look up index, but the common case
will be the cached one, so that should be the fast one, no?

Thanks,
Miklos


* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-16  9:16   ` Miklos Szeredi
@ 2018-01-16  9:37     ` Amir Goldstein
  2018-01-16 10:10       ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-16  9:37 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Tue, Jan 16, 2018 at 11:16 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Lookup overlay inode in cache by origin inode, so we can decode a file
>> handle of an open file even if the index has a whiteout index entry to
>> mark this overlay inode was unlinked.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>>  fs/overlayfs/overlayfs.h |  1 +
>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>> index 602bada474ba..6ecb54d4b52c 100644
>> --- a/fs/overlayfs/export.c
>> +++ b/fs/overlayfs/export.c
>> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>>         struct ovl_path *stack = &origin;
>>         struct dentry *dentry = NULL;
>>         struct dentry *index = NULL;
>> +       struct inode *inode = NULL;
>> +       bool is_deleted = false;
>>         int err;
>>
>>         /* First lookup indexed upper by fh */
>
> Why not first look up origin, then look up ovl inode by origin?  It
> seems a faster path than going through the index first.  Obviously if
> icache lookup fails then we need to look up index, but the common case
> will the cached one, so that should be the fast one, no?
>

Not really, because we do not know if the file handle is a dir or a non-dir.
If the file handle is a dir, then decoding it is expensive, and looking up
the index first can reduce the worst case from two file handle decodes to
just one:

For lower dir:
- one index lookup fails
- one lower dir decode
- one icache lookup
- maybe one ovl_lookup_real(is_upper=false)

For copied up indexed dir:
- one index lookup success
- one upper dir decode
- one ovl_lookup_real(is_upper=true)

That method avoids the origin dir decode for upper indexed
dir at the cost of not looking for the decoded dir in icache.

How about this as an idea: hash overlay inodes for NFS export
by origin fh instead of by origin inode pointer.

We can also avoid the "lookup index first" for non-dir
if we set a flag OVL_FH_FLAG_CONNECTABLE on exported
dir file handle, but my thinking was trying to keep the first version
simple with as few special cases as possible.

Thanks,
Amir.
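To illustrate the "hash by origin fh" idea: if the cache key is the origin file handle bytes themselves, a cached hit needs only a hash and a byte compare, with no decode of the underlying handle. This is a toy sketch. The private bucket table, FNV-1a, the sizes and all names are assumptions for illustration. In the kernel the hash and compare would presumably be supplied to ilookup5()/iget5_locked() as the hashval and test callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64
#define MAX_FH   32

/* Hypothetical cache entry: an overlay inode keyed by origin fh bytes */
struct ovl_entry {
	unsigned char fh[MAX_FH];
	size_t fh_len;
	int ino;			/* stand-in for the overlay inode */
	struct ovl_entry *next;		/* bucket chain */
};

static struct ovl_entry *buckets[NBUCKETS];

/* 32-bit FNV-1a over the raw file handle bytes */
static uint32_t fh_hash(const unsigned char *fh, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= fh[i];
		h *= 16777619u;
	}
	return h;
}

static void cache_insert(struct ovl_entry *e)
{
	uint32_t b;

	assert(e->fh_len <= MAX_FH);
	b = fh_hash(e->fh, e->fh_len) % NBUCKETS;
	e->next = buckets[b];
	buckets[b] = e;
}

/* Hash to a bucket, then compare whole handles; @fh is never decoded */
static struct ovl_entry *cache_lookup(const unsigned char *fh, size_t len)
{
	uint32_t b = fh_hash(fh, len) % NBUCKETS;

	for (struct ovl_entry *e = buckets[b]; e; e = e->next)
		if (e->fh_len == len && memcmp(e->fh, fh, len) == 0)
			return e;
	return NULL;
}
```

Comparing the full handle after the bucket match is what makes collisions harmless. The hash only narrows the search.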


* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-16  9:37     ` Amir Goldstein
@ 2018-01-16 10:10       ` Miklos Szeredi
  2018-01-16 10:40         ` Amir Goldstein
  2018-01-17 21:05         ` Amir Goldstein
  0 siblings, 2 replies; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-16 10:10 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Tue, Jan 16, 2018 at 10:37 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Jan 16, 2018 at 11:16 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Lookup overlay inode in cache by origin inode, so we can decode a file
>>> handle of an open file even if the index has a whiteout index entry to
>>> mark this overlay inode was unlinked.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>>>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>>>  fs/overlayfs/overlayfs.h |  1 +
>>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>> index 602bada474ba..6ecb54d4b52c 100644
>>> --- a/fs/overlayfs/export.c
>>> +++ b/fs/overlayfs/export.c
>>> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>>>         struct ovl_path *stack = &origin;
>>>         struct dentry *dentry = NULL;
>>>         struct dentry *index = NULL;
>>> +       struct inode *inode = NULL;
>>> +       bool is_deleted = false;
>>>         int err;
>>>
>>>         /* First lookup indexed upper by fh */
>>
>> Why not first look up origin, then look up ovl inode by origin?  It
>> seems a faster path than going through the index first.  Obviously if
>> icache lookup fails then we need to look up index, but the common case
>> will the cached one, so that should be the fast one, no?
>>
>
> Not really, because we do not know if the file handle is dir or non-dir.
> If file handle is dir than decode of file handle is expensive and can
> reduce worst case from two file handle decodes to just one:
>
> For lower dir:
> - one index lookup fails
> - one lower dir decode
> - one icache lookup
> - maybe one ovl_lookup_real(is_upper=false)
>
> For copied up indexed dir:
> - one index lookup success
> - one upper dir decode
> - one ovl_lookup_real(is_upper=true)
>
> That method avoids the origin dir decode for upper indexed
> dir at the cost of not looking for the decoded dir in icache.
>
> How about this as in idea: hash overlay inodes for NFS export
> by origin fh instead of by origin inode pointer.

Good idea.  That way we can leave out the middleman (underlying fh
decode) in the cached case.

> We can also avoid the "lookup index first" for non-dir
> if we set a flag OVL_FH_FLAG_CONNECTABLE on exported
> dir file handle, but my thinking was trying to keep the first version
> simple with as fewer special cases as possible.

Not sure I understand.  If cached lookup fails, then we do always need
to try and lookup index first before falling back to decoding origin,
right?

Thanks,
Miklos




>
> Thanks,
> Amir.


* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-16 10:10       ` Miklos Szeredi
@ 2018-01-16 10:40         ` Amir Goldstein
  2018-01-16 11:07           ` Miklos Szeredi
  2018-01-17 21:05         ` Amir Goldstein
  1 sibling, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-16 10:40 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Tue, Jan 16, 2018 at 12:10 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Jan 16, 2018 at 10:37 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Tue, Jan 16, 2018 at 11:16 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Lookup overlay inode in cache by origin inode, so we can decode a file
>>>> handle of an open file even if the index has a whiteout index entry to
>>>> mark this overlay inode was unlinked.
>>>>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>> ---
>>>>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>>>>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>>>>  fs/overlayfs/overlayfs.h |  1 +
>>>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>> index 602bada474ba..6ecb54d4b52c 100644
>>>> --- a/fs/overlayfs/export.c
>>>> +++ b/fs/overlayfs/export.c
>>>> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>>>>         struct ovl_path *stack = &origin;
>>>>         struct dentry *dentry = NULL;
>>>>         struct dentry *index = NULL;
>>>> +       struct inode *inode = NULL;
>>>> +       bool is_deleted = false;
>>>>         int err;
>>>>
>>>>         /* First lookup indexed upper by fh */
>>>
>>> Why not first look up origin, then look up ovl inode by origin?  It
>>> seems a faster path than going through the index first.  Obviously if
>>> icache lookup fails then we need to look up index, but the common case
>>> will the cached one, so that should be the fast one, no?
>>>
>>
>> Not really, because we do not know if the file handle is dir or non-dir.
>> If file handle is dir than decode of file handle is expensive and can
>> reduce worst case from two file handle decodes to just one:
>>
>> For lower dir:
>> - one index lookup fails
>> - one lower dir decode
>> - one icache lookup
>> - maybe one ovl_lookup_real(is_upper=false)
>>
>> For copied up indexed dir:
>> - one index lookup success
>> - one upper dir decode
>> - one ovl_lookup_real(is_upper=true)
>>
>> That method avoids the origin dir decode for upper indexed
>> dir at the cost of not looking for the decoded dir in icache.
>>
>> How about this as in idea: hash overlay inodes for NFS export
>> by origin fh instead of by origin inode pointer.
>
> Good idea.  That way we can leave out the middleman (underlying fh
> decode) in the cached case.
>
>> We can also avoid the "lookup index first" for non-dir
>> if we set a flag OVL_FH_FLAG_CONNECTABLE on exported
>> dir file handle, but my thinking was trying to keep the first version
>> simple with as fewer special cases as possible.
>
> Not sure I understand.  If cached lookup fails, then we do always need
> to try and lookup index first before falling back to decoding origin,
> right?
>

If you are referring to cache lookup by origin fh, then yes.
If icache lookup by origin fh fails, we should look up the index to check
for a whiteout before we decode the origin fh, because an index lookup is
cheaper than decoding a connectable file handle and reconnecting it.

If we had marked the file handle 'non-connectable', then for non-dir
non-connectable file handles, origin decode is actually slightly
faster than index lookup, but I don't think it is worth the special
casing and marking the file handle for the corner case, right?

Thanks,
Amir.


* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-16 10:40         ` Amir Goldstein
@ 2018-01-16 11:07           ` Miklos Szeredi
  0 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-16 11:07 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Tue, Jan 16, 2018 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Jan 16, 2018 at 12:10 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Tue, Jan 16, 2018 at 10:37 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Tue, Jan 16, 2018 at 11:16 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>> Lookup overlay inode in cache by origin inode, so we can decode a file
>>>>> handle of an open file even if the index has a whiteout index entry to
>>>>> mark this overlay inode was unlinked.
>>>>>
>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>> ---
>>>>>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>>>>>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>>>>>  fs/overlayfs/overlayfs.h |  1 +
>>>>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>>> index 602bada474ba..6ecb54d4b52c 100644
>>>>> --- a/fs/overlayfs/export.c
>>>>> +++ b/fs/overlayfs/export.c
>>>>> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>>>>>         struct ovl_path *stack = &origin;
>>>>>         struct dentry *dentry = NULL;
>>>>>         struct dentry *index = NULL;
>>>>> +       struct inode *inode = NULL;
>>>>> +       bool is_deleted = false;
>>>>>         int err;
>>>>>
>>>>>         /* First lookup indexed upper by fh */
>>>>
>>>> Why not first look up origin, then look up ovl inode by origin?  It
>>>> seems a faster path than going through the index first.  Obviously if
>>>> icache lookup fails then we need to look up index, but the common case
>>>> will the cached one, so that should be the fast one, no?
>>>>
>>>
>>> Not really, because we do not know if the file handle is dir or non-dir.
>>> If file handle is dir than decode of file handle is expensive and can
>>> reduce worst case from two file handle decodes to just one:
>>>
>>> For lower dir:
>>> - one index lookup fails
>>> - one lower dir decode
>>> - one icache lookup
>>> - maybe one ovl_lookup_real(is_upper=false)
>>>
>>> For copied up indexed dir:
>>> - one index lookup success
>>> - one upper dir decode
>>> - one ovl_lookup_real(is_upper=true)
>>>
>>> That method avoids the origin dir decode for upper indexed
>>> dir at the cost of not looking for the decoded dir in icache.
>>>
>>> How about this as in idea: hash overlay inodes for NFS export
>>> by origin fh instead of by origin inode pointer.
>>
>> Good idea.  That way we can leave out the middleman (underlying fh
>> decode) in the cached case.
>>
>>> We can also avoid the "lookup index first" for non-dir
>>> if we set a flag OVL_FH_FLAG_CONNECTABLE on exported
>>> dir file handle, but my thinking was trying to keep the first version
>>> simple with as fewer special cases as possible.
>>
>> Not sure I understand.  If cached lookup fails, then we do always need
>> to try and lookup index first before falling back to decoding origin,
>> right?
>>
>
> If you are referring to cache lookup by origin fh, then yes.
> If icache by origin fh lookup fails, we should lookup index to check
> for whiteout, before we decode origin fh, because index lookup is
> cheaper than reconnecting a connectable file handle decode.
>
> If we had marked the file handle 'non-connectable', then for non-dir
> non-connectable file handles, origin decode is actually slightly
> faster than index lookup, but I don't think it is worth the special
> casing and marking the file handle for the corner case, right?

My point is: if icache lookup fails, then for origin handles we always
have to do an index lookup to find the current version overlay object.
So no point in doing the origin decode first, since that one may not
be needed (if index is a whiteout).

Thanks,
Miklos
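The ordering argued for above can be written down as a small decision function. Try the icache by origin fh first. On a miss, consult the index, which may already answer the question (copied up, or a whiteout for an unlinked file). Only then decode the origin. This is a sketch under assumptions: decode_env stands in for the real lookups, and the enum names and the slow_steps counter are hypothetical. slow_steps only counts which expensive steps were taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum index_state { INDEX_NONE, INDEX_UPPER, INDEX_WHITEOUT };
enum decode_src { FROM_ICACHE = 1, FROM_INDEX_UPPER, FROM_ORIGIN };

/* Hypothetical description of what each lookup step would find */
struct decode_env {
	bool in_icache;			/* overlay inode cached by origin fh */
	enum index_state index;		/* result of the index lookup */
	bool origin_decodes;		/* origin fh still decodes on lower */
	int slow_steps;			/* index lookups + origin decodes done */
};

/* Returns the source of the overlay object, or -ESTALE */
static int decode_origin_fh(struct decode_env *env)
{
	assert(env);

	/* 1. Cheapest: overlay inode already cached (e.g. unlinked but open) */
	if (env->in_icache)
		return FROM_ICACHE;

	/* 2. Index tells us whether the object was copied up or unlinked */
	env->slow_steps++;
	if (env->index == INDEX_WHITEOUT)
		return -ESTALE;		/* unlinked: origin decode not needed */
	if (env->index == INDEX_UPPER)
		return FROM_INDEX_UPPER;

	/* 3. Only now pay for decoding (and maybe reconnecting) the origin */
	env->slow_steps++;
	return env->origin_decodes ? FROM_ORIGIN : -ESTALE;
}
```

The whiteout case shows why the index goes before the origin decode. When the object is unlinked, the more expensive decode is never attempted.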


* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-15 14:56       ` Miklos Szeredi
@ 2018-01-17 11:18         ` Amir Goldstein
  2018-01-17 12:20           ` Amir Goldstein
  2018-01-17 15:42           ` Miklos Szeredi
  0 siblings, 2 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-17 11:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
[...]
> >>
> >> So, a working algorithm would be going up to the first connected
> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
> >> to finish, since not protected against always racing with renames.
> >> Can we take s_vfs_rename_sem on ovl to prevent that?
> >>
> >
> > Sounds like a simple and good enough solution.
> > Do we really need the locking of parent and restart connect if
> > we take s_vfs_rename_sem around ovl_lookup_real()?
>
> No, but s_vfs_rename_sem is a really heavyweight solution, we should
> do better than that for decoding a file handle.
>
> And we probably don't need anything else, since rename on ancestor
> means renamed dir is connected, and hopefully not evicted from the
> cache until we repeat the walk up.
>
> So need to lock parent, lookup ovl dentry, verify we got the same
> upper, if not retry icache lookup.
>
> Not sure we need to worry about that "hopefully".  Hopefully not.
>

Something like this??

This is just the raw fix to patch 4/17 without the icache lookup
that is added by later patches.

I added rename_lock seqlock around backwalk to connected ancestor
and take_dentry_name_snapshot() for the stability of real name
during overlay lookup.

I considered also storing OVL_I(d_inode(connected))->version
inside seqlock and comparing it to version in case lookup of child
failed. This could help us distinguish between overlay rename and
underlying rename (overlay dir version did not change) and return
ESTALE instead of restarting lookup in the latter case.
Wasn't sure if that was a good idea and what we lose if we leave it out.

I tested this code, but only with upper file handles of course
(xfstest generic/467).

Please let me know what you think.

Thanks,
Amir.
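For reference, here is a minimal userspace model of the seqlock read side used around the backwalk. In the kernel this is read_seqbegin()/read_seqretry() on rename_lock. The reader walks up the parent chain without locks, then checks that the sequence count did not move. If it moved, a rename raced with the walk and the walk is redone. All names are hypothetical. Writers are assumed to be serialized by the caller (the kernel's seqlock_t carries its own spinlock for that). The memory orderings follow the usual C11 seqlock recipe. This only detects a racing rename during the walk. The locked lookup and the upper check that follow are still needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NNODES 8

/* Toy sequence counter: odd while a writer is active, even otherwise */
struct seqcount {
	atomic_uint seq;
};

static unsigned read_begin(struct seqcount *s)
{
	unsigned v;

	while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1u)
		;	/* writer in progress: wait for an even count */
	return v;
}

static bool read_retry(struct seqcount *s, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Writers must be serialized by the caller */
static void write_begin(struct seqcount *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void write_end(struct seqcount *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/* Hypothetical tree: parent[i] is i's parent, the root is its own parent */
struct tree {
	struct seqcount rename_seq;
	atomic_int parent[NNODES];
};

static void tree_init(struct tree *t)
{
	atomic_init(&t->rename_seq.seq, 0);
	for (int i = 0; i < NNODES; i++)
		atomic_init(&t->parent[i], i);
}

static void tree_rename(struct tree *t, int node, int new_parent)
{
	write_begin(&t->rename_seq);
	atomic_store_explicit(&t->parent[node], new_parent, memory_order_relaxed);
	write_end(&t->rename_seq);
}

/*
 * Walk up from @real to the child of @connected on its path, redoing the
 * walk if a rename raced with it. Returns -1 if @connected is not an
 * ancestor. The hop bound guards against a torn view forming a cycle.
 */
static int topmost_unconnected(struct tree *t, int real, int connected)
{
	for (;;) {
		unsigned start = read_begin(&t->rename_seq);
		int next = real, found = -1;

		for (int hops = 0; hops < NNODES; hops++) {
			int p = atomic_load_explicit(&t->parent[next],
						     memory_order_relaxed);
			if (p == connected) {
				found = next;
				break;
			}
			if (p == next)
				break;	/* reached the root */
			next = p;
		}
		if (!read_retry(&t->rename_seq, start))
			return found;
	}
}
```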

================================================================

From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@gmail.com>
Date: Thu, 28 Dec 2017 18:36:16 +0200
Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles

Until this change, we decoded upper file handles by instantiating an
overlay dentry from the real upper dentry. This is sufficient to handle
pure upper files, but insufficient to handle merge/impure dirs.

To that end, if decoded real upper dir is connected and hashed, we
lookup an overlay dentry with the same path as the real upper dir.
If decoded real upper is non-dir, we instantiate a disconnected overlay
dentry as before this change.

Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
exportfs never needs to call get_parent() and get_name() to reconnect an
upper overlay dir. Because connectable non-dir file handles are not
supported, exportfs will not be able to use fh_to_parent() and get_name()
methods to reconnect a disconnected non-dir to its parent. Therefore, the
methods get_parent() and get_name() are implemented just to print out a
sanity warning and the method fh_to_parent() is implemented to warn the
user that using the 'subtree_check' exportfs option is not supported.

An alternative approach could have been to implement instantiating of
an overlay directory inode from origin/index and implement get_parent()
and get_name() by calling into underlying fs operations and then
instantiating the overlay parent dir.

The reasons for not choosing the get_parent() approach were:
- Obtaining a disconnected overlay dir dentry would require a
  delicate re-factoring of ovl_lookup() to get a dentry with overlay
  parent info. It was preferred to avoid doing that re-factoring unless
  it proved worthwhile.
- Going down the path of a disconnected dir would mean that the
  non-trivial code path of d_splice_alias() could be traveled, which
  meant writing more tests and introducing race cases that are very hard
  to hit on purpose. Taking the path of connecting overlay dentry by
  forward lookup is therefore the safe and boring way to avoid surprises.

The downside of the chosen "connected overlay dentry" approach:
- We need to take special care of renames of ancestors while connecting
  the overlay dentry by real dentry path. These subtleties are usually
  handled by generic exportfs and VFS code.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/export.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 214 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 557c29928e98..35f37a72d55e 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -130,6 +130,188 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
        return dentry;
 }

+/*
+ * Lookup a child overlay dentry whose real dentry is @real.
+ * If @real is on upper layer, we lookup a child overlay dentry with the same
+ * name as the real dentry. Otherwise, we need to consult index for lookup.
+ */
+static struct dentry *ovl_lookup_real_one(struct dentry *parent,
+                                         struct dentry *real,
+                                         struct ovl_layer *layer)
+{
+       struct dentry *this;
+       struct name_snapshot name;
+       int err;
+
+       /* TODO: use index when looking up by lower real dentry */
+       if (layer->idx)
+               return ERR_PTR(-EACCES);
+
+       /*
+        * Lookup overlay dentry by real name. The parent mutex protects us
+        * from racing with overlay rename. If the overlay dentry that is
+        * above real has already been moved to a different parent, then this
+        * lookup will fail to find a child dentry whose real dentry is @real
+        * and we will have to restart the lookup of real path from the top.
+        *
+        * We also need to take a snapshot of real dentry name to protect us
+        * from racing with underlying layer rename. In this case, returning
+        * ESTALE is fine; we only need to avoid referencing a freed name
+        * pointer.
+        *
+        * TODO: try to lookup the renamed overlay dentry in inode cache by
+        *       real inode.
+        */
+       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
+       take_dentry_name_snapshot(&name, real);
+       this = lookup_one_len(name.name, parent, strlen(name.name));
+       err = PTR_ERR(this);
+       if (IS_ERR(this)) {
+               goto fail;
+       } else if (!this || !this->d_inode) {
+               dput(this);
+               err = -ENOENT;
+               goto fail;
+       } else if (ovl_dentry_upper(this) != real) {
+               dput(this);
+               err = -ESTALE;
+               goto fail;
+       }
+
+out:
+       release_dentry_name_snapshot(&name);
+       inode_unlock(d_inode(parent));
+       return this;
+
+fail:
+       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, layer=%d, parent=%pd2, err=%i)\n",
+                           real, layer->idx, parent, err);
+       this = ERR_PTR(err);
+       goto out;
+}
+
+/*
+ * Lookup an overlay dentry whose real dentry is @real.
+ * If @real is on upper layer, we lookup a child overlay dentry with the same
+ * path as the real dentry. Otherwise, we need to consult index for lookup.
+ */
+static struct dentry *ovl_lookup_real(struct super_block *sb,
+                                     struct dentry *real,
+                                     struct ovl_layer *layer)
+{
+       struct dentry *connected;
+       unsigned int seq;
+       int err = 0;
+
+       /* TODO: use index when looking up by lower real dentry */
+       if (layer->idx)
+               return ERR_PTR(-EACCES);
+
+       connected = dget(sb->s_root);
+       while (!err) {
+               struct dentry *next, *this;
+               struct dentry *parent = NULL;
+               struct dentry *real_connected = layer->mnt->mnt_root;
+
+               if (real_connected == real)
+                       break;
+
+               /*
+                * Find the topmost dentry not yet connected. Taking rename_lock
+                * so at least we don't race with rename when walking back to
+                * 'real_connected'.
+                */
+               seq = read_seqbegin(&rename_lock);
+               next = dget(real);
+               for (;;) {
+                       parent = dget_parent(next);
+
+                       if (real_connected == parent)
+                               break;
+
+                       /*
+                        * If real file has been moved out of the layer root
+                        * directory, we will eventually hit the real fs root.
+                        */
+                       if (parent == next) {
+                               err = -EXDEV;
+                               break;
+                       }
+
+                       dput(next);
+                       next = parent;
+               }
+
+               if (!read_seqretry(&rename_lock, seq) && !err) {
+                       this = ovl_lookup_real_one(connected, next, layer);
+                       /*
+                        * Lookup of child in overlay can fail when racing with
+                        * overlay rename of child away from 'connected' parent.
+                        * In this case, we need to restart the lookup from the
+                        * top, because we cannot trust that 'real_connected' is
+                        * still an ancestor of 'real'.
+                        */
+                       if (IS_ERR(this)) {
+                               err = PTR_ERR(this);
+                               if (err == -ENOENT || err == -ESTALE) {
+                                       this = dget(sb->s_root);
+                                       err = 0;
+                               }
+                       }
+                       if (!err) {
+                               dput(connected);
+                               connected = this;
+                       }
+               }
+
+               dput(parent);
+               dput(next);
+       }
+
+       if (err)
+               goto fail;
+
+       return connected;
+
+fail:
+       pr_warn_ratelimited("overlayfs: failed to lookup by real (%pd2, layer=%d, connected=%pd2, err=%i)\n",
+                           real, layer->idx, connected, err);
+       dput(connected);
+       return ERR_PTR(err);
+}
+
+
+/*
+ * Get an overlay dentry from upper/lower real dentries.
+ */
+static struct dentry *ovl_get_dentry(struct super_block *sb,
+                                    struct dentry *upper,
+                                    struct ovl_path *lowerpath)
+{
+       struct ovl_fs *ofs = sb->s_fs_info;
+       struct ovl_layer layer = { .mnt = ofs->upper_mnt };
+
+       /* TODO: get non-upper dentry */
+       if (!upper)
+               return ERR_PTR(-EACCES);
+
+       /*
+        * Obtain a disconnected overlay dentry from a non-dir real upper
+        * dentry.
+        */
+       if (!d_is_dir(upper))
+               return ovl_obtain_alias(sb, upper, NULL);
+
+       /* Removed empty directory? */
+       if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
+               return ERR_PTR(-ENOENT);
+
+       /*
+        * If real upper dentry is connected and hashed, get a connected
+        * overlay dentry with the same path as the real upper dentry.
+        */
+       return ovl_lookup_real(sb, upper, &layer);
+}
+
 static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
                                        struct ovl_fh *fh)
 {
@@ -144,7 +326,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
        if (IS_ERR_OR_NULL(upper))
                return upper;

-       dentry = ovl_obtain_alias(sb, upper, NULL);
+       dentry = ovl_get_dentry(sb, upper, NULL);
        dput(upper);

        return dentry;
@@ -183,7 +365,38 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
        return ERR_PTR(err);
 }

+static struct dentry *ovl_fh_to_parent(struct super_block *sb, struct fid *fid,
+                                      int fh_len, int fh_type)
+{
+       pr_warn_ratelimited("overlayfs: connectable file handles not supported; use 'no_subtree_check' exportfs option.\n");
+       return ERR_PTR(-EACCES);
+}
+
+static int ovl_get_name(struct dentry *parent, char *name,
+                       struct dentry *child)
+{
+       /*
+        * ovl_fh_to_dentry() returns connected dir overlay dentries and
+        * ovl_fh_to_parent() is not implemented, so we should not get here.
+        */
+       WARN_ON_ONCE(1);
+       return -EIO;
+}
+
+static struct dentry *ovl_get_parent(struct dentry *dentry)
+{
+       /*
+        * ovl_fh_to_dentry() returns connected dir overlay dentries, so we
+        * should not get here.
+        */
+       WARN_ON_ONCE(1);
+       return ERR_PTR(-EIO);
+}
+
 const struct export_operations ovl_export_operations = {
        .encode_fh      = ovl_encode_inode_fh,
        .fh_to_dentry   = ovl_fh_to_dentry,
+       .fh_to_parent   = ovl_fh_to_parent,
+       .get_name       = ovl_get_name,
+       .get_parent     = ovl_get_parent,
 };
-- 
2.7.4

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-17 11:18         ` Amir Goldstein
@ 2018-01-17 12:20           ` Amir Goldstein
  2018-01-17 13:29             ` Amir Goldstein
  2018-01-17 15:42           ` Miklos Szeredi
  1 sibling, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-17 12:20 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Wed, Jan 17, 2018 at 1:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
> [...]
>> >>
>> >> So, a working algorithm would be going up to the first connected
>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>> >> to finish, since not protected against always racing with renames.
>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>> >>
>> >
>> > Sounds like a simple and good enough solution.
>> > Do we really need the locking of parent and restart connect if
>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>
>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>> do better than that for decoding a file handle.
>>
>> And we probably don't need anything else, since rename on ancestor
>> means renamed dir is connected, and hopefully not evicted from the
>> cache until we repeat the walk up.
>>
>> So need to lock parent, lookup ovl dentry, verify we got the same
>> upper, if not retry icache lookup.
>>
>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>
>
> Something like this??
>
> This is just the raw fix to patch 4/17 without the icache lookup
> that is added by later patches.
>
> I added rename_lock seqlock around backwalk to connected ancestor
> and take_dentry_name_snapshot() for the stability of real name
> during overlay lookup.
>
> I also considered storing OVL_I(d_inode(connected))->version
> inside seqlock and comparing it to version in case lookup of child
> failed. This could help us distinguish between overlay rename and
> underlying rename (overlay dir version did not change) and return
> ESTALE instead of restarting lookup in the latter case.
> I wasn't sure whether that was a good idea and what we lose if we leave it out.

Well, if nothing else, it's a good idea for preventing an endless loop
due to bugs. Adding some snippets below.
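
A minimal userspace sketch of the version idea, assuming a per-dir counter
that every overlay rename in or out of the dir bumps. struct ovl_dir_model,
model_overlay_rename() and lookup_step() are hypothetical names for
illustration only, not the kernel snippets that follow and not overlayfs
API; the dir inode lock under which the version is read is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of an overlay dir carrying a version counter. */
struct ovl_dir_model {
	uint64_t version;	/* bumped by overlay rename in/out of this dir */
};

static void model_overlay_rename(struct ovl_dir_model *dir)
{
	dir->version++;
}

/*
 * One lookup step: @ver was sampled before walking up the real path and
 * @lookup_err is the result of the name lookup in @parent. A changed
 * version means an overlay rename raced with us, so ask the caller to
 * restart (-ECHILD). Otherwise the lookup result is returned as-is, so a
 * rename in the underlying layer alone yields an error, not a restart.
 */
static int lookup_step(const struct ovl_dir_model *parent, uint64_t ver,
		       int lookup_err)
{
	assert(ver <= parent->version);	/* versions only move forward */
	if (parent->version != ver)
		return -ECHILD;
	return lookup_err;
}
```

Because only a real overlay change produces a restart, an underlying-only
rename cannot keep the walk looping forever.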

>
> I tested this code, but only with upper file handles of course
> (xfstest generic/467).
>
> Please let me know what you think.
>
> Thanks,
> Amir.
>
> ================================================================
>
> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
> From: Amir Goldstein <amir73il@gmail.com>
> Date: Thu, 28 Dec 2017 18:36:16 +0200
> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>
> Until this change, we decoded upper file handles by instantiating an
> overlay dentry from the real upper dentry. This is sufficient to handle
> pure upper files, but insufficient to handle merge/impure dirs.
>
> To that end, if decoded real upper dir is connected and hashed, we
> lookup an overlay dentry with the same path as the real upper dir.
> If decoded real upper is non-dir, we instantiate a disconnected overlay
> dentry as before this change.
>
> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
> exportfs never needs to call get_parent() and get_name() to reconnect an
> upper overlay dir. Because connectable non-dir file handles are not
> supported, exportfs will not be able to use fh_to_parent() and get_name()
> methods to reconnect a disconnected non-dir to its parent. Therefore, the
> methods get_parent() and get_name() are implemented just to print out a
> sanity warning and the method fh_to_parent() is implemented to warn the
> user that using the 'subtree_check' exportfs option is not supported.
>
> An alternative approach could have been to implement instantiation of
> an overlay directory inode from origin/index and implement get_parent()
> and get_name() by calling into underlying fs operations and then
> instantiating the overlay parent dir.
>
> The reasons for not choosing the get_parent() approach were:
> - Obtaining a disconnected overlay dir dentry would require a
>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>   parent info. It was preferred to avoid doing that re-factoring unless
>   it proved worthwhile.
> - Going down the path of disconnected dirs would mean that the
>   non-trivial code path of d_splice_alias() could be taken, which would
>   mean writing more tests and introducing race cases that are very hard
>   to hit on purpose. Taking the path of connecting the overlay dentry by
>   forward lookup is therefore the safe and boring way to avoid surprises.
>
> The downside of the chosen "connected overlay dentry" approach:
> - We need to take special care of renames of ancestors while connecting
>   the overlay dentry by real dentry path. These subtleties are usually
>   handled by generic exportfs and VFS code.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 214 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 557c29928e98..35f37a72d55e 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -130,6 +130,188 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>         return dentry;
>  }
>
> +/*
> + * Lookup a child overlay dentry whose real dentry is @real.
> + * If @real is on upper layer, we lookup a child overlay dentry with the same
> + * name as the real dentry. Otherwise, we need to consult index for lookup.
> + */
> +static struct dentry *ovl_lookup_real_one(struct dentry *parent, u64 ver,
> +                                         struct dentry *real,
> +                                         struct ovl_layer *layer)
> +{
> +       struct dentry *this;
> +       struct name_snapshot name;
> +       int err;
> +
> +       /* TODO: use index when looking up by lower real dentry */
> +       if (layer->idx)
> +               return ERR_PTR(-EACCES);
> +
> +       /*
> +        * Lookup overlay dentry by real name. The parent mutex protects us
> +        * from racing with overlay rename. If the overlay dentry that is
> +        * above real has already been moved to a different parent, then this
> +        * lookup will fail to find a child dentry whose real dentry is @real
> +        * and we will have to restart the lookup of real path from the top.
> +        *
> +        * We also need to take a snapshot of real dentry name to protect us
> +        * from racing with underlying layer rename. In this case, returning
> +        * ESTALE is fine; we only need to avoid referencing a freed name
> +        * pointer.
> +        *
> +        * TODO: try to lookup the renamed overlay dentry in inode cache by
> +        *       real inode.
> +        */
> +       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);

+       err = -ECHILD;
+       if (ovl_dentry_version_get(parent) != ver)
+               goto fail;
+

> +       take_dentry_name_snapshot(&name, real);
> +       this = lookup_one_len(name.name, parent, strlen(name.name));
> +       err = PTR_ERR(this);
> +       if (IS_ERR(this)) {
> +               goto fail;
> +       } else if (!this || !this->d_inode) {
> +               dput(this);
> +               err = -ENOENT;
> +               goto fail;
> +       } else if (ovl_dentry_upper(this) != real) {
> +               dput(this);
> +               err = -ESTALE;
> +               goto fail;
> +       }
> +
> +out:
> +       release_dentry_name_snapshot(&name);
> +       inode_unlock(d_inode(parent));
> +       return this;
> +
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, layer=%d, parent=%pd2, err=%i)\n",
> +                           real, layer->idx, parent, err);
> +       this = ERR_PTR(err);
> +       goto out;
> +}
> +
> +/*
> + * Lookup an overlay dentry whose real dentry is @real.
> + * If @real is on upper layer, we lookup a child overlay dentry with the same
> + * path as the real dentry. Otherwise, we need to consult index for lookup.
> + */
> +static struct dentry *ovl_lookup_real(struct super_block *sb,
> +                                     struct dentry *real,
> +                                     struct ovl_layer *layer)
> +{
> +       struct dentry *connected;
> +       unsigned int seq;
> +       int err = 0;
> +
> +       /* TODO: use index when looking up by lower real dentry */
> +       if (layer->idx)
> +               return ERR_PTR(-EACCES);
> +
> +       connected = dget(sb->s_root);
> +       while (!err) {
> +               struct dentry *next, *this;
> +               struct dentry *parent = NULL;
> +               struct dentry *real_connected = layer->mnt->mnt_root;

That's a bug. 'real_connected' should be advanced to the real dentry of
'connected'. A later patch implements ovl_dentry_real_at().

+               struct dentry *real_connected = ovl_dentry_upper(connected);

> +
> +               if (real_connected == real)
> +                       break;
> +
> +               /*
> +                * Find the topmost dentry not yet connected. Taking rename_lock
> +                * so at least we don't race with rename when walking back to
> +                * 'real_connected'.
> +                */
> +               seq = read_seqbegin(&rename_lock);

+               inode_lock(d_inode(connected));
+               ver = ovl_dentry_version_get(connected);
+               inode_unlock(d_inode(connected));

> +               next = dget(real);
> +               for (;;) {
> +                       parent = dget_parent(next);
> +
> +                       if (real_connected == parent)
> +                               break;
> +
> +                       /*
> +                        * If real file has been moved out of the layer root
> +                        * directory, we will eventually hit the real fs root.
> +                        */
> +                       if (parent == next) {
> +                               err = -EXDEV;
> +                               break;
> +                       }
> +
> +                       dput(next);
> +                       next = parent;
> +               }
> +
> +               if (!read_seqretry(&rename_lock, seq) && !err) {
> +                       this = ovl_lookup_real_one(connected, ver, next, layer);
> +                       /*
> +                        * Lookup of child in overlay can fail when racing with
> +                        * overlay rename of child away from 'connected' parent.
> +                        * In this case, we need to restart the lookup from the
> +                        * top, because we cannot trust that 'real_connected' is
> +                        * still an ancestor of 'real'.
> +                        */
> +                       if (IS_ERR(this)) {
> +                               err = PTR_ERR(this);
> +                               if (err == -ECHILD) {
> +                                       this = dget(sb->s_root);
> +                                       err = 0;
> +                               }
> +                       }
> +                       if (!err) {
> +                               dput(connected);
> +                               connected = this;
> +                       }
> +               }
> +
> +               dput(parent);
> +               dput(next);
> +       }
> +
> +       if (err)
> +               goto fail;
> +
> +       return connected;
> +
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to lookup by real (%pd2, layer=%d, connected=%pd2, err=%i)\n",
> +                           real, layer->idx, connected, err);
> +       dput(connected);
> +       return ERR_PTR(err);
> +}
> +
> +
> +/*
> + * Get an overlay dentry from upper/lower real dentries.
> + */
> +static struct dentry *ovl_get_dentry(struct super_block *sb,
> +                                    struct dentry *upper,
> +                                    struct ovl_path *lowerpath)
> +{
> +       struct ovl_fs *ofs = sb->s_fs_info;
> +       struct ovl_layer layer = { .mnt = ofs->upper_mnt };
> +
> +       /* TODO: get non-upper dentry */
> +       if (!upper)
> +               return ERR_PTR(-EACCES);
> +
> +       /*
> +        * Obtain a disconnected overlay dentry from a non-dir real upper
> +        * dentry.
> +        */
> +       if (!d_is_dir(upper))
> +               return ovl_obtain_alias(sb, upper, NULL);
> +
> +       /* Removed empty directory? */
> +       if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
> +               return ERR_PTR(-ENOENT);
> +
> +       /*
> +        * If real upper dentry is connected and hashed, get a connected
> +        * overlay dentry with the same path as the real upper dentry.
> +        */
> +       return ovl_lookup_real(sb, upper, &layer);
> +}
> +
>  static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
>                                         struct ovl_fh *fh)
>  {
> @@ -144,7 +326,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
>         if (IS_ERR_OR_NULL(upper))
>                 return upper;
>
> -       dentry = ovl_obtain_alias(sb, upper, NULL);
> +       dentry = ovl_get_dentry(sb, upper, NULL);
>         dput(upper);
>
>         return dentry;
> @@ -183,7 +365,38 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
>         return ERR_PTR(err);
>  }
>
> +static struct dentry *ovl_fh_to_parent(struct super_block *sb, struct fid *fid,
> +                                      int fh_len, int fh_type)
> +{
> +       pr_warn_ratelimited("overlayfs: connectable file handles not supported; use 'no_subtree_check' exportfs option.\n");
> +       return ERR_PTR(-EACCES);
> +}
> +
> +static int ovl_get_name(struct dentry *parent, char *name,
> +                       struct dentry *child)
> +{
> +       /*
> +        * ovl_fh_to_dentry() returns connected dir overlay dentries and
> +        * ovl_fh_to_parent() is not implemented, so we should not get here.
> +        */
> +       WARN_ON_ONCE(1);
> +       return -EIO;
> +}
> +
> +static struct dentry *ovl_get_parent(struct dentry *dentry)
> +{
> +       /*
> +        * ovl_fh_to_dentry() returns connected dir overlay dentries, so we
> +        * should not get here.
> +        */
> +       WARN_ON_ONCE(1);
> +       return ERR_PTR(-EIO);
> +}
> +
>  const struct export_operations ovl_export_operations = {
>         .encode_fh      = ovl_encode_inode_fh,
>         .fh_to_dentry   = ovl_fh_to_dentry,
> +       .fh_to_parent   = ovl_fh_to_parent,
> +       .get_name       = ovl_get_name,
> +       .get_parent     = ovl_get_parent,
>  };
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-17 12:20           ` Amir Goldstein
@ 2018-01-17 13:29             ` Amir Goldstein
  0 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-17 13:29 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Wed, Jan 17, 2018 at 2:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jan 17, 2018 at 1:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>
>> [...]
>>> >>
>>> >> So, a working algorithm would be going up to the first connected
>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>> >> to finish, since not protected against always racing with renames.
>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>> >>
>>> >
>>> > Sounds like a simple and good enough solution.
>>> > Do we really need the locking of parent and restart connect if
>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>
>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>> do better than that for decoding a file handle.
>>>
>>> And we probably don't need anything else, since rename on ancestor
>>> means renamed dir is connected, and hopefully not evicted from the
>>> cache until we repeat the walk up.
>>>
>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>> upper, if not retry icache lookup.
>>>
>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>
>>
>> Something like this??
>>
>> This is just the raw fix to patch 4/17 without the icache lookup
>> that is added by later patches.
>>
>> I added rename_lock seqlock around backwalk to connected ancestor
>> and take_dentry_name_snapshot() for the stability of real name
>> during overlay lookup.
>>
>> I also considered storing OVL_I(d_inode(connected))->version
>> inside seqlock and comparing it to version in case lookup of child
>> failed. This could help us distinguish between overlay rename and
>> underlying rename (overlay dir version did not change) and return
>> ESTALE instead of restarting lookup in the latter case.
>> I wasn't sure whether that was a good idea and what we lose if we leave it out.
>
> Well, if nothing else, it's a good idea for preventing an endless loop
> due to bugs. Adding some snippets below.
>
>>
>> I tested this code, but only with upper file handles of course
>> (xfstest generic/467).
>>
>> Please let me know what you think.
>>
>> Thanks,
>> Amir.
>>
>> ================================================================
>>
>> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
>> From: Amir Goldstein <amir73il@gmail.com>
>> Date: Thu, 28 Dec 2017 18:36:16 +0200
>> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>>
>> Until this change, we decoded upper file handles by instantiating an
>> overlay dentry from the real upper dentry. This is sufficient to handle
>> pure upper files, but insufficient to handle merge/impure dirs.
>>
>> To that end, if decoded real upper dir is connected and hashed, we
>> lookup an overlay dentry with the same path as the real upper dir.
>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>> dentry as before this change.
>>
>> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
>> exportfs never needs to call get_parent() and get_name() to reconnect an
>> upper overlay dir. Because connectable non-dir file handles are not
>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>> methods get_parent() and get_name() are implemented just to print out a
>> sanity warning and the method fh_to_parent() is implemented to warn the
>> user that using the 'subtree_check' exportfs option is not supported.
>>
>> An alternative approach could have been to implement instantiation of
>> an overlay directory inode from origin/index and implement get_parent()
>> and get_name() by calling into underlying fs operations and then
>> instantiating the overlay parent dir.
>>
>> The reasons for not choosing the get_parent() approach were:
>> - Obtaining a disconnected overlay dir dentry would require a
>>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>>   parent info. It was preferred to avoid doing that re-factoring unless
>>   it proved worthwhile.
>> - Going down the path of disconnected dirs would mean that the
>>   non-trivial code path of d_splice_alias() could be taken, which would
>>   mean writing more tests and introducing race cases that are very hard
>>   to hit on purpose. Taking the path of connecting the overlay dentry by
>>   forward lookup is therefore the safe and boring way to avoid surprises.
>>
>> The downside of the chosen "connected overlay dentry" approach:
>> - We need to take special care of renames of ancestors while connecting
>>   the overlay dentry by real dentry path. These subtleties are usually
>>   handled by generic exportfs and VFS code.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/export.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 214 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>> index 557c29928e98..35f37a72d55e 100644
>> --- a/fs/overlayfs/export.c
>> +++ b/fs/overlayfs/export.c
>> @@ -130,6 +130,188 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>         return dentry;
>>  }
>>
>> +/*
>> + * Lookup a child overlay dentry whose real dentry is @real.
>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>> + * name as the real dentry. Otherwise, we need to consult index for lookup.
>> + */
>> +static struct dentry *ovl_lookup_real_one(struct dentry *parent, u64 ver,
>> +                                         struct dentry *real,
>> +                                         struct ovl_layer *layer)
>> +{
>> +       struct dentry *this;
>> +       struct name_snapshot name;
>> +       int err;
>> +
>> +       /* TODO: use index when looking up by lower real dentry */
>> +       if (layer->idx)
>> +               return ERR_PTR(-EACCES);
>> +
>> +       /*
>> +        * Lookup overlay dentry by real name. The parent mutex protects us
>> +        * from racing with overlay rename. If the overlay dentry that is
>> +        * above real has already been moved to a different parent, then this
>> +        * lookup will fail to find a child dentry whose real dentry is @real
>> +        * and we will have to restart the lookup of real path from the top.
>> +        *
>> +        * We also need to take a snapshot of real dentry name to protect us
>> +        * from racing with underlying layer rename. In this case, returning
>> +        * ESTALE is fine; we only need to avoid referencing a freed name
>> +        * pointer.
>> +        *
>> +        * TODO: try to lookup the renamed overlay dentry in inode cache by
>> +        *       real inode.
>> +        */
>> +       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
>
> +       err = -ECHILD;
> +       if (ovl_dentry_version_get(parent) != ver)
> +               goto fail;
> +
>

Actually, the version check seems correct for connecting up a lower real dir,
which is the case where I hit the endless loop (under redirected middle layers),
but it does not belong in this patch, because pure upper dirs don't have their
overlay version incremented and because for upper layer lookup
ENOENT and ESTALE *should* restart lookup from the top.
However, for lower layer lookup they shouldn't (only ECHILD above should
trigger restart from the top).

I'll sort this out and push a branch for test/review.
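
The restart policy described above can be summarized as a small predicate.
This is a sketch that assumes layer index 0 denotes the upper layer (as
layer->idx is tested in the patch); restart_from_top() is a hypothetical
name, not an overlayfs function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should a failed one-component lookup restart the
 * connect walk from the overlay root? Layer index 0 is the upper layer.
 */
static bool restart_from_top(int layer_idx, int err)
{
	assert(layer_idx >= 0);
	if (layer_idx == 0)
		/* Upper: the overlay child may have been renamed away. */
		return err == -ENOENT || err == -ESTALE;
	/* Lower: only a changed overlay parent version (-ECHILD) restarts. */
	return err == -ECHILD;
}
```

Any other error, such as -EXDEV when the real dir left the layer, is
returned to the caller instead of restarting.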

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-17 11:18         ` Amir Goldstein
  2018-01-17 12:20           ` Amir Goldstein
@ 2018-01-17 15:42           ` Miklos Szeredi
  2018-01-17 16:34             ` Amir Goldstein
  1 sibling, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-17 15:42 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
> [...]
>> >>
>> >> So, a working algorithm would be going up to the first connected
>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>> >> to finish, since not protected against always racing with renames.
>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>> >>
>> >
>> > Sounds like a simple and good enough solution.
>> > Do we really need the locking of parent and restart connect if
>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>
>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>> do better than that for decoding a file handle.
>>
>> And we probably don't need anything else, since rename on ancestor
>> means renamed dir is connected, and hopefully not evicted from the
>> cache until we repeat the walk up.
>>
>> So need to lock parent, lookup ovl dentry, verify we got the same
>> upper, if not retry icache lookup.
>>
>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>
>
> Something like this??
>
> This is just the raw fix to patch 4/17 without the icache lookup
> that is added by later patches.
>
> I added rename_lock seqlock around backwalk to connected ancestor
> and take_dentry_name_snapshot() for the stability of real name
> during overlay lookup.
>
> I considered also storing OVL_I(d_inode(connected))->version
> inside seqlock and comparing it to version in case lookup of child
> failed. This could help us distinguish between overlay rename and
> underlying rename (overlay dir version did not change) and return
> ESTALE instead of restarting lookup in the latter case.
> I wasn't sure if that was a good idea and what we lose if we leave it out.
>
> I tested this code, but only with upper file handles of course
> (xfstest generic/467).
>
> Please let me know what you think.
>
> Thanks,
> Amir.
>
> ================================================================
>
> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
> From: Amir Goldstein <amir73il@gmail.com>
> Date: Thu, 28 Dec 2017 18:36:16 +0200
> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>
> Until this change, we decoded upper file handles by instantiating an
> overlay dentry from the real upper dentry. This is sufficient to handle
> pure upper files, but insufficient to handle merge/impure dirs.
>
> To that end, if decoded real upper dir is connected and hashed, we
> lookup an overlay dentry with the same path as the real upper dir.
> If decoded real upper is non-dir, we instantiate a disconnected overlay
> dentry as before this change.
>
> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
> exportfs never needs to call get_parent() and get_name() to reconnect an
> upper overlay dir. Because connectable non-dir file handles are not
> supported, exportfs will not be able to use fh_to_parent() and get_name()
> methods to reconnect a disconnected non-dir to its parent. Therefore, the
> methods get_parent() and get_name() are implemented just to print out a
> sanity warning and the method fh_to_parent() is implemented to warn the
> user that using the 'subtree_check' exportfs option is not supported.
>
> An alternative approach could have been to instantiate an overlay
> directory inode from origin/index and implement get_parent() and
> get_name() by calling into underlying fs operations and then
> instantiating the overlay parent dir.
>
> The reasons for not choosing the get_parent() approach were:
> - Obtaining a disconnected overlay dir dentry would require a
>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>   parent info. It was preferred to avoid doing that re-factoring unless
>   it was proven worthy.
> - Going down the path of disconnected dir would mean that the (non
>   trivial) code path of d_splice_alias() could be traveled, which in
>   turn would mean writing more tests and introducing race cases that are
>   very hard to hit on purpose. Taking the path of connecting overlay
>   dentry by forward lookup is therefore the safe and boring way to avoid
>   surprises.
>
> The downside of the chosen "connected overlay dentry" approach:
> - We need to take special care of renames of ancestors while connecting
>   the overlay dentry by real dentry path. These subtleties are usually
>   handled by generic exportfs and VFS code.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 214 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 557c29928e98..35f37a72d55e 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -130,6 +130,188 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>         return dentry;
>  }
>
> +/*
> + * Lookup a child overlay dentry whose real dentry is @real.
> + * If @real is on upper layer, we lookup a child overlay dentry with the same
> + * name as the real dentry. Otherwise, we need to consult index for lookup.
> + */
> +static struct dentry *ovl_lookup_real_one(struct dentry *parent,
> +                                         struct dentry *real,
> +                                         struct ovl_layer *layer)
> +{
> +       struct dentry *this;
> +       struct name_snapshot name;
> +       int err;
> +
> +       /* TODO: use index when looking up by lower real dentry */
> +       if (layer->idx)
> +               return ERR_PTR(-EACCES);
> +
> +       /*
> +        * Lookup overlay dentry by real name. The parent mutex protects us
> +        * from racing with overlay rename. If the overlay dentry that is
> +        * above real has already been moved to a different parent, then this
> +        * lookup will fail to find a child dentry whose real dentry is @real
> +        * and we will have to restart the lookup of real path from the top.
> +        *
> +        * We also need to take a snapshot of real dentry name to protect us
> +        * from racing with underlying layer rename. In this case, we don't
> +        * mind returning ESTALE; we only need to avoid dereferencing a freed
> +        * name pointer.
> +        *
> +        * TODO: try to lookup the renamed overlay dentry in inode cache by
> +        *       real inode.
> +        */
> +       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
> +       take_dentry_name_snapshot(&name, real);

No need to snapshot, just check if parent hasn't changed after
locking.  If parent is same, then name is guaranteed to be stable.

This also means, that only ESTALE is possible after this.   And ESTALE
is fatal, no need to retry after that.

> +       this = lookup_one_len(name.name, parent, strlen(name.name));
> +       err = PTR_ERR(this);
> +       if (IS_ERR(this)) {
> +               goto fail;
> +       } else if (!this || !this->d_inode) {
> +               dput(this);
> +               err = -ENOENT;
> +               goto fail;
> +       } else if (ovl_dentry_upper(this) != real) {
> +               dput(this);
> +               err = -ESTALE;
> +               goto fail;
> +       }
> +
> +out:
> +       release_dentry_name_snapshot(&name);
> +       inode_unlock(d_inode(parent));
> +       return this;
> +
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, layer=%d, parent=%pd2, err=%i)\n",
> +                           real, layer->idx, parent, err);
> +       this = ERR_PTR(err);
> +       goto out;
> +}
> +
> +/*
> + * Lookup an overlay dentry whose real dentry is @real.
> + * If @real is on upper layer, we lookup a child overlay dentry with the same
> + * path as the real dentry. Otherwise, we need to consult index for lookup.
> + */
> +static struct dentry *ovl_lookup_real(struct super_block *sb,
> +                                     struct dentry *real,
> +                                     struct ovl_layer *layer)
> +{
> +       struct dentry *connected;
> +       unsigned int seq;
> +       int err = 0;
> +
> +       /* TODO: use index when looking up by lower real dentry */
> +       if (layer->idx)
> +               return ERR_PTR(-EACCES);
> +
> +       connected = dget(sb->s_root);
> +       while (!err) {
> +               struct dentry *next, *this;
> +               struct dentry *parent = NULL;
> +               struct dentry *real_connected = layer->mnt->mnt_root;
> +
> +               if (real_connected == real)
> +                       break;

This loop will never finish, since real_connected is mnt_root now.
Would be nice if there was a guaranteed way to finish this without
icache lookup, but I don't see how.

> +
> +               /*
> +                * Find the topmost dentry not yet connected. Taking rename_lock
> +                * so at least we don't race with rename when walking back to
> +                * 'real_connected'.
> +                */
> +               seq = read_seqbegin(&rename_lock);

I don't see what we gain with this.


> +               next = dget(real);
> +               for (;;) {
> +                       parent = dget_parent(next);
> +
> +                       if (real_connected == parent)
> +                               break;
> +
> +                       /*
> +                        * If real file has been moved out of the layer root
> +                        * directory, we will eventually hit the real fs root.
> +                        */
> +                       if (parent == next) {
> +                               err = -EXDEV;
> +                               break;
> +                       }
> +
> +                       dput(next);
> +                       next = parent;
> +               }
> +
> +               if (!read_seqretry(&rename_lock, seq) && !err) {
> +                       this = ovl_lookup_real_one(connected, next, layer);
> +                       /*
> +                        * Lookup of child in overlay can fail when racing with
> +                        * overlay rename of child away from 'connected' parent.
> +                        * In this case, we need to restart the lookup from the
> +                        * top, because we cannot trust that 'real_connected' is
> +                        * still an ancestor of 'real'.
> +                        */
> +                       if (IS_ERR(this)) {
> +                               err = PTR_ERR(this);
> +                               if (err == -ENOENT || err == -ESTALE) {
> +                                       this = dget(sb->s_root);
> +                                       err = 0;
> +                               }
> +                       }
> +                       if (!err) {
> +                               dput(connected);
> +                               connected = this;
> +                       }
> +               }
> +
> +               dput(parent);
> +               dput(next);
> +       }
> +
> +       if (err)
> +               goto fail;
> +
> +       return connected;
> +
> +fail:
> +       pr_warn_ratelimited("overlayfs: failed to lookup by real (%pd2, layer=%d, connected=%pd2, err=%i)\n",
> +                           real, layer->idx, connected, err);
> +       dput(connected);
> +       return ERR_PTR(err);
> +}
> +
> +
> +/*
> + * Get an overlay dentry from upper/lower real dentries.
> + */
> +static struct dentry *ovl_get_dentry(struct super_block *sb,
> +                                    struct dentry *upper,
> +                                    struct ovl_path *lowerpath)
> +{
> +       struct ovl_fs *ofs = sb->s_fs_info;
> +       struct ovl_layer layer = { .mnt = ofs->upper_mnt };
> +
> +       /* TODO: get non-upper dentry */
> +       if (!upper)
> +               return ERR_PTR(-EACCES);
> +
> +       /*
> +        * Obtain a disconnected overlay dentry from a non-dir real upper
> +        * dentry.
> +        */
> +       if (!d_is_dir(upper))
> +               return ovl_obtain_alias(sb, upper, NULL);
> +
> +       /* Removed empty directory? */
> +       if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
> +               return ERR_PTR(-ENOENT);
> +
> +       /*
> +        * If real upper dentry is connected and hashed, get a connected
> +        * overlay dentry with the same path as the real upper dentry.
> +        */
> +       return ovl_lookup_real(sb, upper, &layer);
> +}
> +
>  static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
>                                         struct ovl_fh *fh)
>  {
> @@ -144,7 +326,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
>         if (IS_ERR_OR_NULL(upper))
>                 return upper;
>
> -       dentry = ovl_obtain_alias(sb, upper, NULL);
> +       dentry = ovl_get_dentry(sb, upper, NULL);
>         dput(upper);
>
>         return dentry;
> @@ -183,7 +365,38 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
>         return ERR_PTR(err);
>  }
>
> +static struct dentry *ovl_fh_to_parent(struct super_block *sb, struct fid *fid,
> +                                      int fh_len, int fh_type)
> +{
> +       pr_warn_ratelimited("overlayfs: connectable file handles not supported; use 'no_subtree_check' exportfs option.\n");
> +       return ERR_PTR(-EACCES);
> +}
> +
> +static int ovl_get_name(struct dentry *parent, char *name,
> +                       struct dentry *child)
> +{
> +       /*
> +        * ovl_fh_to_dentry() returns connected dir overlay dentries and
> +        * ovl_fh_to_parent() is not implemented, so we should not get here.
> +        */
> +       WARN_ON_ONCE(1);
> +       return -EIO;
> +}
> +
> +static struct dentry *ovl_get_parent(struct dentry *dentry)
> +{
> +       /*
> +        * ovl_fh_to_dentry() returns connected dir overlay dentries, so we
> +        * should not get here.
> +        */
> +       WARN_ON_ONCE(1);
> +       return ERR_PTR(-EIO);
> +}
> +
>  const struct export_operations ovl_export_operations = {
>         .encode_fh      = ovl_encode_inode_fh,
>         .fh_to_dentry   = ovl_fh_to_dentry,
> +       .fh_to_parent   = ovl_fh_to_parent,
> +       .get_name       = ovl_get_name,
> +       .get_parent     = ovl_get_parent,
>  };
> --
> 2.7.4


* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-17 15:42           ` Miklos Szeredi
@ 2018-01-17 16:34             ` Amir Goldstein
  2018-01-17 21:36               ` Amir Goldstein
  2018-01-18  8:22               ` Miklos Szeredi
  0 siblings, 2 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-17 16:34 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Wed, Jan 17, 2018 at 5:42 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>
>> [...]
>>> >>
>>> >> So, a working algorithm would be going up to the first connected
>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>> >> to finish, since not protected against always racing with renames.
>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>> >>
>>> >
>>> > Sounds like a simple and good enough solution.
>>> > Do we really need the locking of parent and restart connect if
>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>
>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>> do better than that for decoding a file handle.
>>>
>>> And we probably don't need anything else, since rename on ancestor
>>> means renamed dir is connected, and hopefully not evicted from the
>>> cache until we repeat the walk up.
>>>
>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>> upper, if not retry icache lookup.
>>>
>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>
>>
>> Something like this??
>>
>> This is just the raw fix to patch 4/17 without the icache lookup
>> that is added by later patches.
>>
>> I added rename_lock seqlock around backwalk to connected ancestor
>> and take_dentry_name_snapshot() for the stability of real name
>> during overlay lookup.
>>
>> I considered also storing OVL_I(d_inode(connected))->version
>> inside seqlock and comparing it to version in case lookup of child
>> failed. This could help us distinguish between overlay rename and
>> underlying rename (overlay dir version did not change) and return
>> ESTALE instead of restarting lookup in the latter case.
>> I wasn't sure if that was a good idea and what we lose if we leave it out.
>>
>> I tested this code, but only with upper file handles of course
>> (xfstest generic/467).
>>
>> Please let me know what you think.
>>
>> Thanks,
>> Amir.
>>
>> ================================================================
>>
>> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
>> From: Amir Goldstein <amir73il@gmail.com>
>> Date: Thu, 28 Dec 2017 18:36:16 +0200
>> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>>
>> Until this change, we decoded upper file handles by instantiating an
>> overlay dentry from the real upper dentry. This is sufficient to handle
>> pure upper files, but insufficient to handle merge/impure dirs.
>>
>> To that end, if decoded real upper dir is connected and hashed, we
>> lookup an overlay dentry with the same path as the real upper dir.
>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>> dentry as before this change.
>>
>> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
>> exportfs never needs to call get_parent() and get_name() to reconnect an
>> upper overlay dir. Because connectable non-dir file handles are not
>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>> methods get_parent() and get_name() are implemented just to print out a
>> sanity warning and the method fh_to_parent() is implemented to warn the
>> user that using the 'subtree_check' exportfs option is not supported.
>>
>> An alternative approach could have been to instantiate an overlay
>> directory inode from origin/index and implement get_parent() and
>> get_name() by calling into underlying fs operations and then
>> instantiating the overlay parent dir.
>>
>> The reasons for not choosing the get_parent() approach were:
>> - Obtaining a disconnected overlay dir dentry would require a
>>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>>   parent info. It was preferred to avoid doing that re-factoring unless
>>   it was proven worthy.
>> - Going down the path of disconnected dir would mean that the (non
>>   trivial) code path of d_splice_alias() could be traveled, which in
>>   turn would mean writing more tests and introducing race cases that are
>>   very hard to hit on purpose. Taking the path of connecting overlay
>>   dentry by forward lookup is therefore the safe and boring way to avoid
>>   surprises.
>>
>> The downside of the chosen "connected overlay dentry" approach:
>> - We need to take special care of renames of ancestors while connecting
>>   the overlay dentry by real dentry path. These subtleties are usually
>>   handled by generic exportfs and VFS code.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/export.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 214 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>> index 557c29928e98..35f37a72d55e 100644
>> --- a/fs/overlayfs/export.c
>> +++ b/fs/overlayfs/export.c
>> @@ -130,6 +130,188 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>         return dentry;
>>  }
>>
>> +/*
>> + * Lookup a child overlay dentry whose real dentry is @real.
>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>> + * name as the real dentry. Otherwise, we need to consult index for lookup.
>> + */
>> +static struct dentry *ovl_lookup_real_one(struct dentry *parent,
>> +                                         struct dentry *real,
>> +                                         struct ovl_layer *layer)
>> +{
>> +       struct dentry *this;
>> +       struct name_snapshot name;
>> +       int err;
>> +
>> +       /* TODO: use index when looking up by lower real dentry */
>> +       if (layer->idx)
>> +               return ERR_PTR(-EACCES);
>> +
>> +       /*
>> +        * Lookup overlay dentry by real name. The parent mutex protects us
>> +        * from racing with overlay rename. If the overlay dentry that is
>> +        * above real has already been moved to a different parent, then this
>> +        * lookup will fail to find a child dentry whose real dentry is @real
>> +        * and we will have to restart the lookup of real path from the top.
>> +        *
>> +        * We also need to take a snapshot of real dentry name to protect us
>> +        * from racing with underlying layer rename. In this case, we don't
>> +        * mind returning ESTALE; we only need to avoid dereferencing a freed
>> +        * name pointer.
>> +        *
>> +        * TODO: try to lookup the renamed overlay dentry in inode cache by
>> +        *       real inode.
>> +        */
>> +       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
>> +       take_dentry_name_snapshot(&name, real);
>
> No need to snapshot, just check if parent hasn't changed after
> locking.  If parent is same, then name is guaranteed to be stable.
>

I don't understand.
We are not holding a lock on real parent, only on overlay parent.
What makes the real name stable?
The snapshot is not to protect from racing with overlay rename.
The snapshot is for protecting from race with real rename, just to
make sure we don't dereference a stale name pointer.
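To illustrate the hazard the snapshot guards against, here is a single-threaded userspace model in which a rename frees the old name buffer. The types and helpers are hypothetical. The real take_dentry_name_snapshot() takes d_lock and either copies a short inline name or takes a reference on an external name, rather than allocating as this sketch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified dentry: its name buffer is replaced and freed on rename. */
struct dent_model {
	char *name;
};

/* Private copy of a name, loosely modeled on struct name_snapshot. */
struct name_snap {
	char *name;
};

static char *dup_name(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/*
 * Take a private copy of @d's name. In the kernel this must happen
 * under d->d_lock so a concurrent rename cannot free the name midway;
 * this single-threaded model has no lock.
 */
static int take_snap(struct name_snap *snap, const struct dent_model *d)
{
	snap->name = dup_name(d->name);
	return snap->name ? 0 : -1;
}

static void release_snap(struct name_snap *snap)
{
	free(snap->name);
	snap->name = NULL;
}

/* Rename frees the old name: any raw pointer to it now dangles. */
static int rename_model(struct dent_model *d, const char *newname)
{
	char *p = dup_name(newname);

	if (!p)
		return -1;
	free(d->name);
	d->name = p;
	return 0;
}
```

After rename_model(), a raw pointer saved from d->name would point to freed memory, while the snapshot still holds a valid copy of the old name.
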

> This also means, that only ESTALE is possible after this.   And ESTALE
> is fatal, no need to retry after that.

OK. I will return ECHILD for a parent that has changed
and will restart only on ECHILD.
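A minimal userspace model of that plan, assuming the overlay parent lock is already held and that a moved real dentry is detected by comparing its parent pointer. struct rnode and lookup_one_checked() are hypothetical stand-ins, not kernel types, and the sketch models only the error classification, not the question of which lock keeps the real name stable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a real (upper layer) dentry; not a kernel type. */
struct rnode {
	struct rnode *parent;
	const char *name;
};

/*
 * The caller is assumed to hold the overlay parent lock. @real_parent is
 * the real dentry under that overlay parent; @children/@nr stand in for
 * a name lookup in it.
 *
 * Returns 0 if the name resolves back to @real, -ECHILD if @real was
 * moved to another parent (the only case that restarts the walk from
 * the top), -ENOENT if no child has that name, and -ESTALE if the name
 * now resolves to a different dentry (fatal).
 */
static int lookup_one_checked(struct rnode *real, struct rnode *real_parent,
			      struct rnode **children, size_t nr)
{
	size_t i;

	if (real->parent != real_parent)
		return -ECHILD;

	for (i = 0; i < nr; i++) {
		if (strcmp(children[i]->name, real->name) == 0)
			return children[i] == real ? 0 : -ESTALE;
	}
	return -ENOENT;
}
```
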

>
>> +       this = lookup_one_len(name.name, parent, strlen(name.name));
>> +       err = PTR_ERR(this);
>> +       if (IS_ERR(this)) {
>> +               goto fail;
>> +       } else if (!this || !this->d_inode) {
>> +               dput(this);
>> +               err = -ENOENT;
>> +               goto fail;
>> +       } else if (ovl_dentry_upper(this) != real) {
>> +               dput(this);
>> +               err = -ESTALE;
>> +               goto fail;
>> +       }
>> +
>> +out:
>> +       release_dentry_name_snapshot(&name);
>> +       inode_unlock(d_inode(parent));
>> +       return this;
>> +
>> +fail:
>> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, layer=%d, parent=%pd2, err=%i)\n",
>> +                           real, layer->idx, parent, err);
>> +       this = ERR_PTR(err);
>> +       goto out;
>> +}
>> +
>> +/*
>> + * Lookup an overlay dentry whose real dentry is @real.
>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>> + * path as the real dentry. Otherwise, we need to consult index for lookup.
>> + */
>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>> +                                     struct dentry *real,
>> +                                     struct ovl_layer *layer)
>> +{
>> +       struct dentry *connected;
>> +       unsigned int seq;
>> +       int err = 0;
>> +
>> +       /* TODO: use index when looking up by lower real dentry */
>> +       if (layer->idx)
>> +               return ERR_PTR(-EACCES);
>> +
>> +       connected = dget(sb->s_root);
>> +       while (!err) {
>> +               struct dentry *next, *this;
>> +               struct dentry *parent = NULL;
>> +               struct dentry *real_connected = layer->mnt->mnt_root;
>> +
>> +               if (real_connected == real)
>> +                       break;
>
> This loop will never finish, since real_connected is mnt_root now.
> Would be nice if there was a guaranteed way to finish this without
> icache lookup, but I don't see how.
>

That's a bug.
The correct code is:

struct dentry *real_connected = ovl_dentry_upper(connected);
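To see why this fix terminates, here is a simplified model of the ovl_lookup_real() loop with real_connected recomputed from the last connected dentry on each pass. struct rdent and connect_real() are hypothetical. Locking, renames and the one-level overlay lookup are left out, and "connecting" a level just advances the pointer. With real_connected stuck at mnt_root, the inner walk would return the same top-level child forever; with the fix, each pass moves one level closer to @real.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified real dentry: the fs root is its own parent. */
struct rdent {
	struct rdent *parent;
};

/*
 * Start with the layer root as "connected", find the topmost ancestor
 * of @real that is still below it, connect that one level, and repeat
 * until @real itself is connected. Returns the number of one-level
 * lookups, or -1 if @real is not under @root (the -EXDEV case).
 */
static int connect_real(struct rdent *root, struct rdent *real)
{
	struct rdent *connected = root;
	int steps = 0;

	while (connected != real) {
		struct rdent *next = real;

		/* Walk back up from @real to the child of @connected. */
		while (next->parent != connected) {
			if (next->parent == next)
				return -1;	/* hit fs root: moved out of layer */
			next = next->parent;
		}
		connected = next;	/* one-level lookup "succeeded" */
		steps++;
	}
	return steps;
}
```

For a layer root L with path L/a/b, connect_real(L, b) connects a, then b, and stops after two lookups.
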


>> +
>> +               /*
>> +                * Find the topmost dentry not yet connected. Taking rename_lock
>> +                * so at least we don't race with rename when walking back to
>> +                * 'real_connected'.
>> +                */
>> +               seq = read_seqbegin(&rename_lock);
>
> I don't see what we gain with this.
>

I can't say that I do see it, but perhaps there is something yet
to be gained by adding this later for lower layers lookup.
Perhaps when looking up in a lower real layer, we can store the
overlay dir cache version of 'connected' (in this case, 'connected'
may be an indexed merge dir).
After we take the 'connected' dir mutex, we cannot use an unchanged
real parent as an indication that no overlay rename happened,
because overlay rename happens on upper, but we can compare
the dir cache version of the 'connected' dir to the version we stored
under rename_lock.
Then we can tell whether the lower lookup has failed because of some
permanent error (e.g. middle layer redirect) or because of an
indexed rename, in which case we need to restart.
Maybe that gains us something?
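The version comparison idea could look roughly like this. It is a sketch under assumptions: struct ovl_dir_model stands in for the overlay dir and its dir cache version (the thread reads it with ovl_dentry_version_get()), the caller is assumed to hold the 'connected' dir mutex when comparing, and ovl_classify_lower_err() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for an overlay dir and its dir cache version. */
struct ovl_dir_model {
	unsigned long version;	/* bumped by overlay rename/unlink in this dir */
};

/*
 * @saved is the version sampled before the lower lookup. If the lookup
 * failed and the version has moved since, an overlay rename raced with
 * us, so ask for a restart (-ECHILD). Otherwise the failure is treated
 * as permanent (e.g. a middle layer redirect) and returned as is.
 */
static int ovl_classify_lower_err(int err, unsigned long saved,
				  const struct ovl_dir_model *dir)
{
	if (!err)
		return 0;
	if (dir->version != saved)
		return -ECHILD;
	return err;
}
```
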

Thanks,
Amir.


* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-16 10:10       ` Miklos Szeredi
  2018-01-16 10:40         ` Amir Goldstein
@ 2018-01-17 21:05         ` Amir Goldstein
  1 sibling, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-17 21:05 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Tue, Jan 16, 2018 at 12:10 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Jan 16, 2018 at 10:37 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Tue, Jan 16, 2018 at 11:16 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Lookup overlay inode in cache by origin inode, so we can decode a file
>>>> handle of an open file even if the index has a whiteout index entry to
>>>> mark that this overlay inode was unlinked.
>>>>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>> ---
>>>>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>>>>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>>>>  fs/overlayfs/overlayfs.h |  1 +
>>>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>> index 602bada474ba..6ecb54d4b52c 100644
>>>> --- a/fs/overlayfs/export.c
>>>> +++ b/fs/overlayfs/export.c
>>>> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>>>>         struct ovl_path *stack = &origin;
>>>>         struct dentry *dentry = NULL;
>>>>         struct dentry *index = NULL;
>>>> +       struct inode *inode = NULL;
>>>> +       bool is_deleted = false;
>>>>         int err;
>>>>
>>>>         /* First lookup indexed upper by fh */
>>>
>>> Why not first look up origin, then look up ovl inode by origin?  It
>>> seems a faster path than going through the index first.  Obviously if
>>> icache lookup fails then we need to look up index, but the common case
>>> will be the cached one, so that should be the fast one, no?
>>>
>>
>> Not really, because we do not know if the file handle is dir or non-dir.
>> If the file handle is a dir, then decoding it is expensive, and looking up
>> the index first can reduce the worst case from two file handle decodes to just one:
>>
>> For lower dir:
>> - one index lookup fails
>> - one lower dir decode
>> - one icache lookup
>> - maybe one ovl_lookup_real(is_upper=false)
>>
>> For copied up indexed dir:
>> - one index lookup success
>> - one upper dir decode
>> - one ovl_lookup_real(is_upper=true)
>>
>> That method avoids the origin dir decode for upper indexed
>> dir at the cost of not looking for the decoded dir in icache.
>>
>> How about this as an idea: hash overlay inodes for NFS export
>> by origin fh instead of by origin inode pointer.
>
> Good idea.  That way we can leave out the middleman (underlying fh
> decode) in the cached case.
>

If it's all right with you, I prefer to get the initial version out the door
first and handle this optimization later.

Shout if you disagree.
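For reference, hashing overlay inodes by origin file handle could be modeled as a small chained hash table keyed by the raw handle bytes, so a cached inode is found without decoding the handle. Everything here is hypothetical: struct fh_entry, the FNV-1a hash and the fixed FH_MAX are choices made for the sketch, not overlayfs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FH_MAX		64
#define NBUCKETS	16

/* Cache entry: an overlay inode keyed by its origin file handle. */
struct fh_entry {
	unsigned char fh[FH_MAX];
	size_t len;
	int ino;			/* stands in for the overlay inode */
	struct fh_entry *next;		/* bucket chain */
};

static struct fh_entry *buckets[NBUCKETS];

/* FNV-1a over the raw handle bytes. */
static uint32_t fh_hash(const unsigned char *fh, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= fh[i];
		h *= 16777619u;
	}
	return h;
}

static void fh_cache_insert(struct fh_entry *e)
{
	uint32_t b;

	assert(e->len <= FH_MAX);
	b = fh_hash(e->fh, e->len) % NBUCKETS;
	e->next = buckets[b];
	buckets[b] = e;
}

/* Find a cached overlay inode by origin fh, without decoding the fh. */
static struct fh_entry *fh_cache_lookup(const unsigned char *fh, size_t len)
{
	struct fh_entry *e = buckets[fh_hash(fh, len) % NBUCKETS];

	for (; e; e = e->next)
		if (e->len == len && memcmp(e->fh, fh, len) == 0)
			return e;
	return NULL;
}
```

A miss in this cache would still fall back to the index lookup and underlying fh decode described above.
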

Thanks,
Amir.


* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-17 16:34             ` Amir Goldstein
@ 2018-01-17 21:36               ` Amir Goldstein
  2018-01-18  8:22               ` Miklos Szeredi
  1 sibling, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-17 21:36 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Wed, Jan 17, 2018 at 6:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jan 17, 2018 at 5:42 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>
>>> [...]
>>>> >>
>>>> >> So, a working algorithm would be going up to the first connected
>>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>>> >> to finish, since not protected against always racing with renames.
>>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>>> >>
>>>> >
>>>> > Sounds like a simple and good enough solution.
>>>> > Do we really need the locking of parent and restart connect if
>>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>>
>>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>>> do better than that for decoding a file handle.
>>>>
>>>> And we probably don't need anything else, since rename on ancestor
>>>> means renamed dir is connected, and hopefully not evicted from the
>>>> cache until we repeat the walk up.
>>>>
>>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>>> upper, if not retry icache lookup.
>>>>
>>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>>
>>>
>>> Something like this??
>>>
>>> This is just the raw fix to patch 4/17 without the icache lookup
>>> that is added by later patches.
>>>
>>> I added rename_lock seqlock around backwalk to connected ancestor
>>> and take_dentry_name_snapshot() for the stability of real name
>>> during overlay lookup.
>>>
>>> I considered also storing OVL_I(d_inode(connected))->version
>>> inside seqlock and comparing it to version in case lookup of child
>>> failed. This could help us distinguish between overlay rename and
>>> underlying rename (overlay dir version did not change) and return
>>> ESTALE instead of restarting lookup in the latter case.
>>> I wasn't sure if that was a good idea and what we lose if we leave it out.
>>>

[...]

>>> +       /*
>>> +        * Lookup overlay dentry by real name. The parent mutex protects us
>>> +        * from racing with overlay rename. If the overlay dentry that is
>>> +        * above real has already been moved to a different parent, then this
>>> +        * lookup will fail to find a child dentry whose real dentry is @real
>>> +        * and we will have to restart the lookup of real path from the top.
>>> +        *
>>> +        * We also need to take a snapshot of real dentry name to protect us
>>> +        * from racing with underlying layer rename. In this case, we don't
>>> +        * mind returning ESTALE; we only need to avoid dereferencing a freed
>>> +        * name pointer.
>>> +        *
>>> +        * TODO: try to lookup the renamed overlay dentry in inode cache by
>>> +        *       real inode.
>>> +        */
>>> +       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
>>> +       take_dentry_name_snapshot(&name, real);
>>
>> No need to snapshot, just check if parent hasn't changed after
>> locking.  If parent is same, then name is guaranteed to be stable.
>>
>
> I don't understand.
> We are not holding a lock on real parent, only on overlay parent.
> What makes the real name stable?
> The snapshot is not to protect from racing with overlay rename.
> The snapshot is for protecting from race with real rename, just to
> make sure we don't dereference a stale name pointer.
>

[...]

>>> +
>>> +               /*
>>> +                * Find the topmost dentry not yet connected. Taking rename_lock
>>> +                * so at least we don't race with rename when walking back to
>>> +                * 'real_connected'.
>>> +                */
>>> +               seq = read_seqbegin(&rename_lock);
>>
>> I don't see what we gain with this.
>>
>
> I can't say that I do see it, but perhaps there is something yet
> to be gained by adding this later for lower layers lookup.
> Perhaps when looking on lower real layer, we can store the
> overlay dir cache version of 'connected' (connected in this case
> may be an indexed merge dir).
> After we take 'connected' dir mutex, we cannot check that
> real parent hasn't changed as an indication to no overlay rename
> because overlay rename happens on upper, but we can compare
> the dir cache version of 'connected' dir to the version we stored
> under rename_lock.
> Then we can tell if lower lookup has failed because of some
> permanent error (e.g. middle layer redirect) or because of an
> indexed rename, so we need to restart.
> Maybe that gains us something?
>

OK. I finished reworking the series with these changes on top
of V3 of indexing patches and pushed the work so far to:
https://github.com/amir73il/linux/commits/ovl-nfs-export-wip

There is a new patch at the top:
ovl: retry connect of non-upper dirs on parent rename
that implements the dir cache version compare;
it does not use the seqlock.

The modified patch 4/17 mostly affects patches 12/17 and 14/17,
so you may want to continue review on the modified version.

This WIP is tested with xfstests including a test with readonly
no upperdir overlay.

As far as I remember, the only thing remaining to address from V2
review comments so far, that is NOT in the WIP branch is to relax
the cases of copy up on encode.
As for the lookup in icache by fh, I prefer to defer to later time.

Let me know if I forgot anything or if you find anything else that
needs to be addressed.

Thanks,
Amir.


* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-17 16:34             ` Amir Goldstein
  2018-01-17 21:36               ` Amir Goldstein
@ 2018-01-18  8:22               ` Miklos Szeredi
  2018-01-18  8:47                 ` Amir Goldstein
  1 sibling, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-18  8:22 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Wed, Jan 17, 2018 at 5:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jan 17, 2018 at 5:42 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>
>>> [...]
>>>> >>
>>>> >> So, a working algorithm would be going up to the first connected
>>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>>> >> to finish, since not protected against always racing with renames.
>>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>>> >>
>>>> >
>>>> > Sounds like a simple and good enough solution.
>>>> > Do we really need the locking of parent and restart connect if
>>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>>
>>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>>> do better than that for decoding a file handle.
>>>>
>>>> And we probably don't need anything else, since rename on ancestor
>>>> means renamed dir is connected, and hopefully not evicted from the
>>>> cache until we repeat the walk up.
>>>>
>>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>>> upper, if not retry icache lookup.
>>>>
>>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>>
>>>
>>> Something like this??
>>>
>>> This is just the raw fix to patch 4/17 without the icache lookup
>>> that is added by later patches.
>>>
>>> I added rename_lock seqlock around backwalk to connected ancestor
>>> and take_dentry_name_snapshot() for the stability of real name
>>> during overlay lookup.
>>>
>>> I considered also storing OVL_I(d_inode(connected))->version
>>> inside seqlock and comparing it to version in case lookup of child
>>> failed. This could help us distinguish between overlay rename and
>>> underlying rename (overlay dir version did not change) and return
>>> ESTALE instead of restarting lookup in the latter case.
>>> Wasn't sure if that was a good idea and what we lose if we leave it out.
>>>
>>> I tested this code, but only with upper file handles of course
>>> (xfstest generic/467).
>>>
>>> Please let me know what you think.
>>>
>>> Thanks,
>>> Amir.
>>>
>>> ================================================================
>>>
>>> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
>>> From: Amir Goldstein <amir73il@gmail.com>
>>> Date: Thu, 28 Dec 2017 18:36:16 +0200
>>> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>>>
>>> Until this change, we decoded upper file handles by instantiating an
>>> overlay dentry from the real upper dentry. This is sufficient to handle
>>> pure upper files, but insufficient to handle merge/impure dirs.
>>>
>>> To that end, if decoded real upper dir is connected and hashed, we
>>> lookup an overlay dentry with the same path as the real upper dir.
>>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>>> dentry as before this change.
>>>
>>> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
>>> exportfs never needs to call get_parent() and get_name() to reconnect an
>>> upper overlay dir. Because connectable non-dir file handles are not
>>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>>> methods get_parent() and get_name() are implemented just to print out a
>>> sanity warning and the method fh_to_parent() is implemented to warn the
>>> user that using the 'subtree_check' exportfs option is not supported.
>>>
>>> An alternative approach could have been to implement instantiating of
>>> an overlay directory inode from origin/index and implement get_parent()
>>> and get_name() by calling into underlying fs operations and then
>>> instantiating the overlay parent dir.
>>>
>>> The reasons for not choosing the get_parent() approach were:
>>> - Obtaining a disconnected overlay dir dentry would require a
>>>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>>>   parent info. It was preferred to avoid doing that re-factoring unless
>>>   it was proven worthy.
>>> - Going down the path of disconnected dir would mean that the (non
>>>   trivial) code path of d_splice_alias() could be traveled, which would
>>>   mean writing more tests and introducing race cases that are very hard
>>>   to hit on purpose. Taking the path of connecting overlay dentry by
>>>   forward lookup is therefore the safe and boring way to avoid surprises.
>>>
>>> The downside of the chosen "connected overlay dentry" approach:
>>> - We need to take special care of renames of ancestors while connecting
>>>   the overlay dentry by real dentry path. These subtleties are usually
>>>   handled by generic exportfs and VFS code.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/export.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 214 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>> index 557c29928e98..35f37a72d55e 100644
>>> --- a/fs/overlayfs/export.c
>>> +++ b/fs/overlayfs/export.c
>>> @@ -130,6 +130,188 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>>         return dentry;
>>>  }
>>>
>>> +/*
>>> + * Lookup a child overlay dentry whose real dentry is @real.
>>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>>> + * name as the real dentry. Otherwise, we need to consult index for lookup.
>>> + */
>>> +static struct dentry *ovl_lookup_real_one(struct dentry *parent,
>>> +                                         struct dentry *real,
>>> +                                         struct ovl_layer *layer)
>>> +{
>>> +       struct dentry *this;
>>> +       struct name_snapshot name;
>>> +       int err;
>>> +
>>> +       /* TODO: use index when looking up by lower real dentry */
>>> +       if (layer->idx)
>>> +               return ERR_PTR(-EACCES);
>>> +
>>> +       /*
>>> +        * Lookup overlay dentry by real name. The parent mutex protects us
>>> +        * from racing with overlay rename. If the overlay dentry that is
>>> +        * above real has already been moved to a different parent, then this
>>> +        * lookup will fail to find a child dentry whose real dentry is @real
>>> +        * and we will have to restart the lookup of real path from the top.
>>> +        *
>>> +        * We also need to take a snapshot of real dentry name to protect us
>>> +        * from racing with underlying layer rename. In this case, we don't
>>> +        * care about returning ESTALE, only about not dereferencing a freed
>>> +        * name pointer.
>>> +        *
>>> +        * TODO: try to lookup the renamed overlay dentry in inode cache by
>>> +        *       real inode.
>>> +        */
>>> +       inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
>>> +       take_dentry_name_snapshot(&name, real);
>>
>> No need to snapshot, just check if parent hasn't changed after
>> locking.  If parent is same, then name is guaranteed to be stable.
>>
>
> I don't understand.
> We are not holding a lock on real parent, only on overlay parent.
> What makes the real name stable?
> The snapshot is not to protect from racing with overlay rename.
> The snapshot is for protecting from race with real rename, just to
> make sure we don't dereference a stale name pointer.

Ah, okay.

>
>> This also means, that only ESTALE is possible after this.   And ESTALE
>> is fatal, no need to retry after that.
>
> OK. I will return ECHILD for a parent that has changed
> and will restart only on ECHILD.
>
>>
>>> +       this = lookup_one_len(name.name, parent, strlen(name.name));
>>> +       err = PTR_ERR(this);
>>> +       if (IS_ERR(this)) {
>>> +               goto fail;
>>> +       } else if (!this || !this->d_inode) {
>>> +               dput(this);
>>> +               err = -ENOENT;
>>> +               goto fail;
>>> +       } else if (ovl_dentry_upper(this) != real) {
>>> +               dput(this);
>>> +               err = -ESTALE;
>>> +               goto fail;
>>> +       }
>>> +
>>> +out:
>>> +       release_dentry_name_snapshot(&name);
>>> +       inode_unlock(d_inode(parent));
>>> +       return this;
>>> +
>>> +fail:
>>> +       pr_warn_ratelimited("overlayfs: failed to lookup one by real (%pd2, layer=%d, parent=%pd2, err=%i)\n",
>>> +                           real, layer->idx, parent, err);
>>> +       this = ERR_PTR(err);
>>> +       goto out;
>>> +}
>>> +
>>> +/*
>>> + * Lookup an overlay dentry whose real dentry is @real.
>>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>>> + * path as the real dentry. Otherwise, we need to consult index for lookup.
>>> + */
>>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>>> +                                     struct dentry *real,
>>> +                                     struct ovl_layer *layer)
>>> +{
>>> +       struct dentry *connected;
>>> +       unsigned int seq;
>>> +       int err = 0;
>>> +
>>> +       /* TODO: use index when looking up by lower real dentry */
>>> +       if (layer->idx)
>>> +               return ERR_PTR(-EACCES);
>>> +
>>> +       connected = dget(sb->s_root);
>>> +       while (!err) {
>>> +               struct dentry *next, *this;
>>> +               struct dentry *parent = NULL;
>>> +               struct dentry *real_connected = layer->mnt->mnt_root;
>>> +
>>> +               if (real_connected == real)
>>> +                       break;
>>
>> This loop will never finish, since real_connected is mnt_root now.
>> Would be nice if there was a guaranteed way to finish this without
>> icache lookup, but I don't see how.
>>
>
> That's a bug.
> The correct code is:
>
> struct dentry *real_connected = ovl_dentry_upper(connected);

No, that's still broken, because if something gets renamed below
"connected", we'll fall off root.  So need to stop at real_connected
AND at root of overlay.

>
>
>>> +
>>> +               /*
>>> +                * Find the topmost dentry not yet connected. Taking rename_lock
>>> +                * so at least we don't race with rename when walking back to
>>> +                * 'real_connected'.
>>> +                */
>>> +               seq = read_seqbegin(&rename_lock);
>>
>> I don't see what we gain with this.
>>
>
> I can't say that I do see it, but perhaps there is something yet
> to be gained by adding this later for lower layers lookup.
> Perhaps when looking on lower real layer, we can store the
> overlay dir cache version of 'connected' (connected in this case
> may be an indexed merge dir).
> After we take 'connected' dir mutex, we cannot check that
> real parent hasn't changed as an indication to no overlay rename
> because overlay rename happens on upper, but we can compare
> the dir cache version of 'connected' dir to the version we stored
> under rename_lock.
> Then we can tell if lower lookup has failed because of some
> permanent error (e.g. middle layer redirect) or because of an
> indexed rename, so we need to restart.
> Maybe that gains us something?

Sorry, couldn't follow that.

I don't see the need for additional locking apart from the one in
ovl_lookup_real_one(), because the only race remaining is eviction
of overlay inode(s) from the icache, and no locking is going to
prevent that.
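A rough userspace model of the only locking kept in this scheme, assuming a pthread mutex stands in for the overlay parent's inode lock and that all upper renames go through the overlay (all `toy_*` types and `lookup_real_one()` are hypothetical). It follows the ECHILD/ESTALE split proposed earlier in this exchange: a changed parent yields `-ECHILD`, the only error worth restarting on, while `-ENOENT` and `-ESTALE` are final:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the overlayfs structures. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;
};

struct toy_child {
	const char *name;
	struct toy_dentry *upper;	/* real upper dentry under this name */
};

struct toy_ovl_dir {
	pthread_mutex_t lock;		/* plays the role of the parent inode lock */
	struct toy_dentry *upper;	/* real upper dentry of this dir */
	struct toy_child *children;
	size_t nr_children;
};

/*
 * Look up the child of @dir whose real upper dentry is @real.
 * Returns 0 and sets *@out on success, -ECHILD if @real is no longer a
 * child of @dir's upper (caller restarts), or a fatal -ENOENT/-ESTALE.
 */
static int lookup_real_one(struct toy_ovl_dir *dir, struct toy_dentry *real,
			   struct toy_child **out)
{
	int err = -ENOENT;

	assert(out != NULL);
	pthread_mutex_lock(&dir->lock);
	if (real->parent != dir->upper) {
		/* Renamed to another parent since we walked up: retry. */
		err = -ECHILD;
		goto out;
	}
	for (size_t i = 0; i < dir->nr_children; i++) {
		struct toy_child *c = &dir->children[i];

		if (strcmp(c->name, real->name) != 0)
			continue;
		/* Same name but a different real dentry: stale handle. */
		err = (c->upper == real) ? 0 : -ESTALE;
		if (!err)
			*out = c;
		break;
	}
out:
	pthread_mutex_unlock(&dir->lock);
	return err;
}
```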

To get a fully race-free version, we'd need to abandon returning a
connected dir from decode_fh().

Thanks,
Miklos


* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-18  8:22               ` Miklos Szeredi
@ 2018-01-18  8:47                 ` Amir Goldstein
  2018-01-18  9:12                   ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-18  8:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 18, 2018 at 10:22 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Jan 17, 2018 at 5:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Wed, Jan 17, 2018 at 5:42 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>
>>>> [...]
>>>>> >>
>>>>> >> So, a working algorithm would be going up to the first connected
>>>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>>>> >> to finish, since not protected against always racing with renames.
>>>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>>>> >>
>>>>> >
>>>>> > Sounds like a simple and good enough solution.
>>>>> > Do we really need the locking of parent and restart connect if
>>>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>>>
>>>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>>>> do better than that for decoding a file handle.
>>>>>
>>>>> And we probably don't need anything else, since rename on ancestor
>>>>> means renamed dir is connected, and hopefully not evicted from the
>>>>> cache until we repeat the walk up.
>>>>>
>>>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>>>> upper, if not retry icache lookup.
>>>>>
>>>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>>>
>>>>
>>>> Something like this??
>>>>
>>>> This is just the raw fix to patch 4/17 without the icache lookup
>>>> that is added by later patches.
>>>>
>>>> I added rename_lock seqlock around backwalk to connected ancestor
>>>> and take_dentry_name_snapshot() for the stability of real name
>>>> during overlay lookup.
>>>>
>>>> I considered also storing OVL_I(d_inode(connected))->version
>>>> inside seqlock and comparing it to version in case lookup of child
>>>> failed. This could help us distinguish between overlay rename and
>>>> underlying rename (overlay dir version did not change) and return
>>>> ESTALE instead of restarting lookup in the latter case.
>>>> Wasn't sure if that was a good idea and what we lose if we leave it out.
>>>>
>>>> I tested this code, but only with upper file handles of course
>>>> (xfstest generic/467).
>>>>
>>>> Please let me know what you think.
>>>>
>>>> Thanks,
>>>> Amir.
>>>>
>>>> ================================================================
>>>>
>>>> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
>>>> From: Amir Goldstein <amir73il@gmail.com>
>>>> Date: Thu, 28 Dec 2017 18:36:16 +0200
>>>> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>>>>
>>>> Until this change, we decoded upper file handles by instantiating an
>>>> overlay dentry from the real upper dentry. This is sufficient to handle
>>>> pure upper files, but insufficient to handle merge/impure dirs.
>>>>
>>>> To that end, if decoded real upper dir is connected and hashed, we
>>>> lookup an overlay dentry with the same path as the real upper dir.
>>>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>>>> dentry as before this change.
>>>>
>>>> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
>>>> exportfs never needs to call get_parent() and get_name() to reconnect an
>>>> upper overlay dir. Because connectable non-dir file handles are not
>>>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>>>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>>>> methods get_parent() and get_name() are implemented just to print out a
>>>> sanity warning and the method fh_to_parent() is implemented to warn the
>>>> user that using the 'subtree_check' exportfs option is not supported.
>>>>
>>>> An alternative approach could have been to implement instantiating of
>>>> an overlay directory inode from origin/index and implement get_parent()
>>>> and get_name() by calling into underlying fs operations and them
>>>> instantiating the overlay parent dir.
>>>>
>>>> The reasons for not choosing the get_parent() approach were:
>>>> - Obtaining a disconnected overlay dir dentry would require a
>>>>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>>>>   parent info. It was preferred to avoid doing that re-factoring unless
>>>>   it was proven worthy.
>>>> - Going down the path of disconnected dir would mean that the (non
>>>>   trivial) code path of d_splice_alias() could be traveled, which would
>>>>   mean writing more tests and introducing race cases that are very hard
>>>>   to hit on purpose. Taking the path of connecting overlay dentry by
>>>>   forward lookup is therefore the safe and boring way to avoid surprises.
>>>>
>>>> The downside of the chosen "connected overlay dentry" approach:
>>>> - We need to take special care of renames of ancestors while connecting
>>>>   the overlay dentry by real dentry path. These subtleties are usually
>>>>   handled by generic exportfs and VFS code.
>>>>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>> ---
[...]
>>>> +/*
>>>> + * Lookup an overlay dentry whose real dentry is @real.
>>>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>>>> + * path as the real dentry. Otherwise, we need to consult index for lookup.
>>>> + */
>>>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>>>> +                                     struct dentry *real,
>>>> +                                     struct ovl_layer *layer)
>>>> +{
>>>> +       struct dentry *connected;
>>>> +       unsigned int seq;
>>>> +       int err = 0;
>>>> +
>>>> +       /* TODO: use index when looking up by lower real dentry */
>>>> +       if (layer->idx)
>>>> +               return ERR_PTR(-EACCES);
>>>> +
>>>> +       connected = dget(sb->s_root);
>>>> +       while (!err) {
>>>> +               struct dentry *next, *this;
>>>> +               struct dentry *parent = NULL;
>>>> +               struct dentry *real_connected = layer->mnt->mnt_root;
>>>> +
>>>> +               if (real_connected == real)
>>>> +                       break;
>>>
>>> This loop will never finish, since real_connected is mnt_root now.
>>> Would be nice if there was a guaranteed way to finish this without
>>> icache lookup, but I don't see how.
>>>
>>
>> That's a bug.
>> The correct code is:
>>
>> struct dentry *real_connected = ovl_dentry_upper(connected);
>
> No, that's still broken, because if something gets renamed below
> "connected", we'll fall off root.  So need to stop at real_connected
> AND at root of overlay.
>

Right. Will fix that.

>>
>>
>>>> +
>>>> +               /*
>>>> +                * Find the topmost dentry not yet connected. Taking rename_lock
>>>> +                * so at least we don't race with rename when walking back to
>>>> +                * 'real_connected'.
>>>> +                */
>>>> +               seq = read_seqbegin(&rename_lock);
>>>
>>> I don't see what we gain with this.
>>>
>>
>> I can't say that I do see it, but perhaps there is something yet
>> to be gained by adding this later for lower layers lookup.
>> Perhaps when looking on lower real layer, we can store the
>> overlay dir cache version of 'connected' (connected in this case
>> may be an indexed merge dir).
>> After we take 'connected' dir mutex, we cannot check that
>> real parent hasn't changed as an indication to no overlay rename
>> because overlay rename happens on upper, but we can compare
>> the dir cache version of 'connected' dir to the version we stored
>> under rename_lock.
>> Then we can tell if lower lookup has failed because of some
>> permanent error (e.g. middle layer redirect) or because of an
>> indexed rename, so we need to restart.
>> Maybe that gains us something?
>
> Sorry, couldn't follow that.
>
> I don't see the need for additional locking apart from the one in
> ovl_lookup_real_one(), because the only race remaining is eviction
> of overlay inode(s) from the icache, and no locking is going to
> prevent that.
>

Right. No need for the rename seqlock.
The reason we need the cache version check for connecting by
lower real is to end the retry loop in case there is a permanent
decode error due to middle layer redirects and we did not mitigate
it with copy up on encode.
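A sketch of that termination rule, assuming a hypothetical `struct toy_dir` whose `version` counter is bumped on every rename, much like the overlay dir cache version discussed here. A failed step is retried only if the version moved while it ran; otherwise the error is treated as permanent (for example an unresolvable middle layer redirect). The `max_tries` bound is an extra safety net, not part of the proposal:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the overlay dir whose cache version we sample. */
struct toy_dir {
	unsigned long version;	/* bumped on every rename in this dir */
};

/* One connect attempt below @dir; returns 0 or a negative errno. */
typedef int (*connect_step_fn)(struct toy_dir *dir, void *arg);

static int connect_with_retry(struct toy_dir *dir, connect_step_fn step,
			      void *arg, int max_tries)
{
	int err = -ESTALE;

	assert(max_tries > 0);
	while (max_tries-- > 0) {
		unsigned long seen = dir->version;

		err = step(dir, arg);
		if (!err || dir->version == seen)
			return err;	/* success, or a permanent error */
		/* version moved: a rename raced with the lookup, try again */
	}
	return err;
}

/* Demo step: a rename races with the first attempt, the second succeeds. */
static int step_raced_once(struct toy_dir *dir, void *arg)
{
	int *calls = arg;

	if ((*calls)++ == 0) {
		dir->version++;		/* simulate a concurrent rename */
		return -ENOENT;
	}
	return 0;
}

/* Demo step: a failure that no retry can fix. */
static int step_permanent(struct toy_dir *dir, void *arg)
{
	(void)dir;
	(*(int *)arg)++;
	return -ESTALE;
}
```

With `step_permanent` the loop gives up after a single attempt, which is exactly the case where restarting would otherwise spin forever.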

BTW, with a read-only, lower-only overlay mount we cannot mitigate
middle layer redirect with copy up, so we need to either fail encode
or fail decode. Failing decode is better, because it still allows nfsd
to work when dentry remains in cache.

Thanks,
Amir.


* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-18  8:47                 ` Amir Goldstein
@ 2018-01-18  9:12                   ` Miklos Szeredi
  2018-01-18 10:28                     ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-18  9:12 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 18, 2018 at 9:47 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 18, 2018 at 10:22 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Jan 17, 2018 at 5:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Wed, Jan 17, 2018 at 5:42 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>>
>>>>> [...]
>>>>>> >>
>>>>>> >> So, a working algorithm would be going up to the first connected
>>>>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>>>>> >> to finish, since not protected against always racing with renames.
>>>>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>>>>> >>
>>>>>> >
>>>>>> > Sounds like a simple and good enough solution.
>>>>>> > Do we really need the locking of parent and restart connect if
>>>>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>>>>
>>>>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>>>>> do better than that for decoding a file handle.
>>>>>>
>>>>>> And we probably don't need anything else, since rename on ancestor
>>>>>> means renamed dir is connected, and hopefully not evicted from the
>>>>>> cache until we repeat the walk up.
>>>>>>
>>>>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>>>>> upper, if not retry icache lookup.
>>>>>>
>>>>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>>>>
>>>>>
>>>>> Something like this??
>>>>>
>>>>> This is just the raw fix to patch 4/17 without the icache lookup
>>>>> that is added by later patches.
>>>>>
>>>>> I added rename_lock seqlock around backwalk to connected ancestor
>>>>> and take_dentry_name_snapshot() for the stability of real name
>>>>> during overlay lookup.
>>>>>
>>>>> I considered also storing OVL_I(d_inode(connected))->version
>>>>> inside seqlock and comparing it to version in case lookup of child
>>>>> failed. This could help us distinguish between overlay rename and
>>>>> underlying rename (overlay dir version did not change) and return
>>>>> ESTALE instead of restarting lookup in the latter case.
>>>>> Wasn't sure if that was a good idea and what we lose if we leave it out.
>>>>>
>>>>> I tested this code, but only with upper file handles of course
>>>>> (xfstest generic/467).
>>>>>
>>>>> Please let me know what you think.
>>>>>
>>>>> Thanks,
>>>>> Amir.
>>>>>
>>>>> ================================================================
>>>>>
>>>>> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
>>>>> From: Amir Goldstein <amir73il@gmail.com>
>>>>> Date: Thu, 28 Dec 2017 18:36:16 +0200
>>>>> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>>>>>
>>>>> Until this change, we decoded upper file handles by instantiating an
>>>>> overlay dentry from the real upper dentry. This is sufficient to handle
>>>>> pure upper files, but insufficient to handle merge/impure dirs.
>>>>>
>>>>> To that end, if decoded real upper dir is connected and hashed, we
>>>>> lookup an overlay dentry with the same path as the real upper dir.
>>>>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>>>>> dentry as before this change.
>>>>>
>>>>> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
>>>>> exportfs never needs to call get_parent() and get_name() to reconnect an
>>>>> upper overlay dir. Because connectable non-dir file handles are not
>>>>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>>>>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>>>>> methods get_parent() and get_name() are implemented just to print out a
>>>>> sanity warning and the method fh_to_parent() is implemented to warn the
>>>>> user that using the 'subtree_check' exportfs option is not supported.
>>>>>
>>>>> An alternative approach could have been to implement instantiating of
>>>>> an overlay directory inode from origin/index and implement get_parent()
>>>>> and get_name() by calling into underlying fs operations and them
>>>>> instantiating the overlay parent dir.
>>>>>
>>>>> The reasons for not choosing the get_parent() approach were:
>>>>> - Obtaining a disconnected overlay dir dentry would require a
>>>>>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>>>>>   parent info. It was preferred to avoid doing that re-factoring unless
>>>>>   it was proven worthy.
>>>>> - Going down the path of disconnected dir would mean that the (non
>>>>>   trivial) code path of d_splice_alias() could be traveled, which would
>>>>>   mean writing more tests and introducing race cases that are very hard
>>>>>   to hit on purpose. Taking the path of connecting overlay dentry by
>>>>>   forward lookup is therefore the safe and boring way to avoid surprises.
>>>>>
>>>>> The downside of the chosen "connected overlay dentry" approach:
>>>>> - We need to take special care of renames of ancestors while connecting
>>>>>   the overlay dentry by real dentry path. These subtleties are usually
>>>>>   handled by generic exportfs and VFS code.
>>>>>
>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>> ---
> [...]
>>>>> +/*
>>>>> + * Lookup an overlay dentry whose real dentry is @real.
>>>>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>>>>> + * path as the real dentry. Otherwise, we need to consult index for lookup.
>>>>> + */
>>>>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>>>>> +                                     struct dentry *real,
>>>>> +                                     struct ovl_layer *layer)
>>>>> +{
>>>>> +       struct dentry *connected;
>>>>> +       unsigned int seq;
>>>>> +       int err = 0;
>>>>> +
>>>>> +       /* TODO: use index when looking up by lower real dentry */
>>>>> +       if (layer->idx)
>>>>> +               return ERR_PTR(-EACCES);
>>>>> +
>>>>> +       connected = dget(sb->s_root);
>>>>> +       while (!err) {
>>>>> +               struct dentry *next, *this;
>>>>> +               struct dentry *parent = NULL;
>>>>> +               struct dentry *real_connected = layer->mnt->mnt_root;
>>>>> +
>>>>> +               if (real_connected == real)
>>>>> +                       break;
>>>>
>>>> This loop will never finish, since real_connected is mnt_root now.
>>>> Would be nice if there was a guaranteed way to finish this without
>>>> icache lookup, but I don't see how.
>>>>
>>>
>>> That's a bug.
>>> The correct code is:
>>>
>>> struct dentry *real_connected = ovl_dentry_upper(connected);
>>
>> No, that's still broken, because if something gets renamed below
>> "connected", we'll fall off root.  So need to stop at real_connected
>> AND at root of overlay.
>>
>
> Right. Will fix that.
>
>>>
>>>
>>>>> +
>>>>> +               /*
>>>>> +                * Find the topmost dentry not yet connected. Taking rename_lock
>>>>> +                * so at least we don't race with rename when walking back to
>>>>> +                * 'real_connected'.
>>>>> +                */
>>>>> +               seq = read_seqbegin(&rename_lock);
>>>>
>>>> I don't see what we gain with this.
>>>>
>>>
>>> I can't say that I do see it, but perhaps there is something yet
>>> to be gained by adding this later for lower layers lookup.
>>> Perhaps when looking on lower real layer, we can store the
>>> overlay dir cache version of 'connected' (connected in this case
>>> may be an indexed merge dir).
>>> After we take 'connected' dir mutex, we cannot check that
>>> real parent hasn't changed as an indication to no overlay rename
>>> because overlay rename happens on upper, but we can compare
>>> the dir cache version of 'connected' dir to the version we stored
>>> under rename_lock.
>>> Then we can tell if lower lookup has failed because of some
>>> permanent error (e.g. middle layer redirect) or because of an
>>> indexed rename, so we need to restart.
>>> Maybe that gains us something?
>>
>> Sorry, couldn't follow that.
>>
>> I don't see need for additional locking apart from the one in
>> ovl_lookup_by_real_one().  Because the only race remaining is eviction
>> of overlay inode(s) from the icache, and no locking is going to
>> prevent that.
>>
>
> Right. No need for the rename seqlock.
> The reason we need the cache version check for connecting by
> lower real is to end the retry loop in case there is a permanent
> decode error due to middle layer redirects and we did not mitigate
> it with copy up on encode.

Please try to explain, because I don't get what the issue is here.

>
> BTW, with read-only lower-only overlay mount we cannot mitigate
> middle layer redirect with copy up, so we need to either fail encode
> or fail decode. Failing decode is better, because it still allows nfsd
> to work when dentry remains in cache.

OTOH, getting an error reliably can be better than getting them
intermittently, because testing will reveal the first case better.
So I'd suggest we error out on mount if no upper and redirects
enabled.  This can actually be worked around by having a dummy upper
layer, but mounting the overlay read-only, so the upper is only there
for indexing the lower redirects.

Thanks,
Miklos
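Miklos's point above is that the walk up from the real dentry needs two stop conditions: reaching real_connected, and reaching the layer root without ever meeting it. Below is a minimal userspace sketch under stated assumptions: a hypothetical `struct node` with `parent` pointers stands in for real dentries (the root points to itself, as a dentry's d_parent does), and the function name is made up for illustration. It is not the fix that was pushed to ovl-nfs-export-wip.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a real dentry; the root's parent is itself. */
struct node {
	struct node *parent;
	const char *name;
};

/*
 * Walk up from @real looking for the child of @real_connected, i.e. the
 * topmost real dentry that is not yet connected in the overlay.
 * Stop at @real_connected, but also stop at @layer_root: reaching the root
 * without meeting @real_connected means something below "connected" was
 * renamed away, so the caller must restart from the overlay root.
 * Returns the topmost unconnected node, or NULL to request a restart.
 * The caller is assumed to have handled @real == @real_connected already.
 */
static struct node *find_topmost_unconnected(struct node *real,
					     struct node *real_connected,
					     struct node *layer_root)
{
	struct node *next = real;

	for (;;) {
		struct node *parent = next->parent;

		if (parent == real_connected)
			return next;
		/* Fell off the root of the layer: ask for a restart. */
		if (parent == layer_root || parent == next)
			return NULL;
		next = parent;
	}
}
```

Returning NULL rather than looping again keeps the restart decision with the caller, which is where the overlay-side state (the current "connected" dentry) lives.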

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/17] ovl: decode connected upper dir file handles
  2018-01-18  9:12                   ` Miklos Szeredi
@ 2018-01-18 10:28                     ` Amir Goldstein
  0 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-18 10:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 18, 2018 at 11:12 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 18, 2018 at 9:47 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, Jan 18, 2018 at 10:22 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Wed, Jan 17, 2018 at 5:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> On Wed, Jan 17, 2018 at 5:42 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>> On Wed, Jan 17, 2018 at 12:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>> On Mon, Jan 15, 2018 at 4:56 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>>>
>>>>>> [...]
>>>>>>> >>
>>>>>>> >> So, a working algorithm would be going up to the first connected
>>>>>>> >> parent or root, lock parent, lookup name and restart.  Not guaranteed
>>>>>>> >> to finish, since not protected against always racing with renames.
>>>>>>> >> Can we take s_vfs_rename_sem on ovl to prevent that?
>>>>>>> >>
>>>>>>> >
>>>>>>> > Sounds like a simple and good enough solution.
>>>>>>> > Do we really need the locking of parent and restart connect if
>>>>>>> > we take s_vfs_rename_sem around ovl_lookup_real()?
>>>>>>>
>>>>>>> No, but s_vfs_rename_sem is a really heavyweight solution, we should
>>>>>>> do better than that for decoding a file handle.
>>>>>>>
>>>>>>> And we probably don't need anything else, since rename on ancestor
>>>>>>> means renamed dir is connected, and hopefully not evicted from the
>>>>>>> cache until we repeat the walk up.
>>>>>>>
>>>>>>> So need to lock parent, lookup ovl dentry, verify we got the same
>>>>>>> upper, if not retry icache lookup.
>>>>>>>
>>>>>>> Not sure we need to worry about that "hopefully".  Hopefully not.
>>>>>>>
>>>>>>
>>>>>> Something like this??
>>>>>>
>>>>>> This is just the raw fix to patch 4/17 without the icache lookup
>>>>>> that is added by later patches.
>>>>>>
>>>>>> I added rename_lock seqlock around backwalk to connected ancestor
>>>>>> and take_dentry_name_snapshot() for the stability of real name
>>>>>> during overlay lookup.
>>>>>>
>>>>>> I considered also storing OVL_I(d_inode(connected))->version
>>>>>> inside seqlock and comparing it to version in case lookup of child
>>>>>> failed. This could help us distinguish between overlay rename and
>>>>>> underlying rename (overlay dir version did not change) and return
>>>>>> ESTALE instead of restarting lookup in the latter case.
>>>>>>> Wasn't sure if that was a good idea and what we lose if we leave it out.
>>>>>>
>>>>>> I tested this code, but only with upper file handles of course
>>>>>> (xfstest generic/467).
>>>>>>
>>>>>> Please let me know what you think.
>>>>>>
>>>>>> Thanks,
>>>>>> Amir.
>>>>>>
>>>>>> ================================================================
>>>>>>
>>>>>> From 337543c3fcdf9323d3720d17ab6fc13e287bbec1 Mon Sep 17 00:00:00 2001
>>>>>> From: Amir Goldstein <amir73il@gmail.com>
>>>>>> Date: Thu, 28 Dec 2017 18:36:16 +0200
>>>>>> Subject: [PATCH v3 4/17] ovl: decode connected upper dir file handles
>>>>>>
>>>>>> Until this change, we decoded upper file handles by instantiating an
>>>>>> overlay dentry from the real upper dentry. This is sufficient to handle
>>>>>> pure upper files, but insufficient to handle merge/impure dirs.
>>>>>>
>>>>>> To that end, if decoded real upper dir is connected and hashed, we
>>>>>> lookup an overlay dentry with the same path as the real upper dir.
>>>>>> If decoded real upper is non-dir, we instantiate a disconnected overlay
>>>>>> dentry as before this change.
>>>>>>
>>>>>> Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
>>>>>> exportfs never needs to call get_parent() and get_name() to reconnect an
>>>>>> upper overlay dir. Because connectable non-dir file handles are not
>>>>>> supported, exportfs will not be able to use fh_to_parent() and get_name()
>>>>>> methods to reconnect a disconnected non-dir to its parent. Therefore, the
>>>>>> methods get_parent() and get_name() are implemented just to print out a
>>>>>> sanity warning and the method fh_to_parent() is implemented to warn the
>>>>>> user that using the 'subtree_check' exportfs option is not supported.
>>>>>>
>>>>>> An alternative approach could have been to implement instantiating of
>>>>>> an overlay directory inode from origin/index and implement get_parent()
>>>>>>> and get_name() by calling into underlying fs operations and then
>>>>>> instantiating the overlay parent dir.
>>>>>>
>>>>>> The reasons for not choosing the get_parent() approach were:
>>>>>>> - Obtaining a disconnected overlay dir dentry would require a
>>>>>>   delicate re-factoring of ovl_lookup() to get a dentry with overlay
>>>>>>   parent info. It was preferred to avoid doing that re-factoring unless
>>>>>>   it was proven worthy.
>>>>>> - Going down the path of disconnected dir would mean that the (non
>>>>>>   trivial) code path of d_splice_alias() could be traveled and that
>>>>>>>   meant writing more tests and introducing race cases that are very hard
>>>>>>   to hit on purpose. Taking the path of connecting overlay dentry by
>>>>>>   forward lookup is therefore the safe and boring way to avoid surprises.
>>>>>>
>>>>>>> The downside of the chosen "connected overlay dentry" approach:
>>>>>>> - We need to take special care of renames of ancestors while connecting
>>>>>>   the overlay dentry by real dentry path. These subtleties are usually
>>>>>>   handled by generic exportfs and VFS code.
>>>>>>
>>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>>> ---
>> [...]
>>>>>> +/*
>>>>>> + * Lookup an overlay dentry whose real dentry is @real.
>>>>>> + * If @real is on upper layer, we lookup a child overlay dentry with the same
>>>>>>> + * path as the real dentry. Otherwise, we need to consult index for lookup.
>>>>>> + */
>>>>>> +static struct dentry *ovl_lookup_real(struct super_block *sb,
>>>>>> +                                     struct dentry *real,
>>>>>> +                                     struct ovl_layer *layer)
>>>>>> +{
>>>>>> +       struct dentry *connected;
>>>>>> +       unsigned int seq;
>>>>>> +       int err = 0;
>>>>>> +
>>>>>> +       /* TODO: use index when looking up by lower real dentry */
>>>>>> +       if (layer->idx)
>>>>>> +               return ERR_PTR(-EACCES);
>>>>>> +
>>>>>> +       connected = dget(sb->s_root);
>>>>>> +       while (!err) {
>>>>>> +               struct dentry *next, *this;
>>>>>> +               struct dentry *parent = NULL;
>>>>>> +               struct dentry *real_connected = layer->mnt->mnt_root;
>>>>>> +
>>>>>> +               if (real_connected == real)
>>>>>> +                       break;
>>>>>
>>>>> This loop will never finish, since real_connected is mnt_root now.
>>>>> Would be nice if there was a guaranteed way to finish this without
>>>>> icache lookup, but I don't see how.
>>>>>
>>>>
>>>> That's a bug.
>>>> The correct code is:
>>>>
>>>> struct dentry *real_connected = ovl_dentry_upper(connected);
>>>
>>> No, that's still broken, because if something gets renamed below
>>> "connected", we'll fall off root.  So need to stop at real_connected
>>> AND at root of overlay.
>>>
>>
>> Right. Will fix that.

Fix pushed to ovl-nfs-export-wip branch.
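For the step after the walk up, Miklos suggested earlier in this thread to lock the parent, look up the overlay child by name and verify that it maps to the same upper dentry, retrying otherwise. A minimal userspace sketch of that lookup-and-verify check follows; `struct real_dentry`, `struct ovl_dentry`, `lookup_child()` and `connect_one()` are hypothetical stand-ins, the parent lock is only indicated by comments, and the name is compared directly rather than through a stable name snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a real upper dentry and an overlay dentry. */
struct real_dentry {
	struct real_dentry *parent;
	const char *name;
};

struct ovl_dentry {
	struct real_dentry *upper;	/* real upper dentry this maps to */
	struct ovl_dentry **children;
	int nr_children;
};

enum connect_step { STEP_CONNECTED, STEP_RETRY };

/* Look up @name among the overlay children of @dir. */
static struct ovl_dentry *lookup_child(struct ovl_dentry *dir,
				       const char *name)
{
	for (int i = 0; i < dir->nr_children; i++)
		if (strcmp(dir->children[i]->upper->name, name) == 0)
			return dir->children[i];
	return NULL;
}

/*
 * Connect one level: find the overlay child of @connected named like @next
 * (the topmost not-yet-connected real dentry) and accept it only if it maps
 * to the same upper. A missing or different child means @next was renamed
 * concurrently, so the caller should redo the walk up and try again.
 */
static enum connect_step connect_one(struct ovl_dentry *connected,
				     struct real_dentry *next,
				     struct ovl_dentry **out)
{
	struct ovl_dentry *child;

	/* The suggested parent lock on @connected would be taken here. */
	child = lookup_child(connected, next->name);
	/* ...and released here. */

	if (!child || child->upper != next)
		return STEP_RETRY;
	*out = child;
	return STEP_CONNECTED;
}
```

Comparing `child->upper` by identity, not by name, is what catches a rename that moved a different real dentry into the same name.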

>>
>>>>
>>>>
>>>>>> +
>>>>>> +               /*
>>>>>> +                * Find the topmost dentry not yet connected. Taking rename_lock
>>>>>> +                * so at least we don't race with rename when walking back to
>>>>>> +                * 'real_connected'.
>>>>>> +                */
>>>>>> +               seq = read_seqbegin(&rename_lock);
>>>>>
>>>>> I don't see what we gain with this.
>>>>>
>>>>
>>>> I can't say that I do see it, but perhaps there is something yet
>>>> to be gained by adding this later for lower layers lookup.
>>>> Perhaps when looking on lower real layer, we can store the
>>>> overlay dir cache version of 'connected' (connected in this case
>>>> may be an indexed merge dir).
>>>> After we take 'connected' dir mutex, we cannot check that
>>>> real parent hasn't changed as an indication of no overlay rename
>>>> because overlay rename happens on upper, but we can compare
>>>> the dir cache version of 'connected' dir to the version we stored
>>>> under rename_lock.
>>>> Then we can tell if lower lookup has failed because of some
>>>> permanent error (e.g. middle layer redirect) or because of an
>>>> indexed rename, so we need to restart.
>>>> Maybe that gains us something?
>>>
>>> Sorry, couldn't follow that.
>>>
>>> I don't see need for additional locking apart from the one in
>>> ovl_lookup_by_real_one().  Because the only race remaining is eviction
>>> of overlay inode(s) from the icache, and no locking is going to
>>> prevent that.
>>>
>>
>> Right. No need for the rename seqlock.
>> The reason we need the cache version check for connecting by
>> lower real is to end the retry loop in case there is a permanent
>> decode error due to middle layer redirects and we did not mitigate
>> it with copy up on encode.
>
> Please try to explain, because I don't get what the issue is here.
>

That's probably because there is no issue.
Please disregard.
I removed this patch from ovl-nfs-export-wip.

>>
>> BTW, with read-only lower-only overlay mount we cannot mitigate
>> middle layer redirect with copy up, so we need to either fail encode
>> or fail decode. Failing decode is better, because it still allows nfsd
>> to work when dentry remains in cache.
>
> OTOH, getting an error reliably can be better than getting them
> intermittently, because testing will reveal the first case better.
> So I'd suggest we error out on mount if no upper and redirects
> enabled.

That sounds much easier.
But there is no need to error out; we just need to fall back to nfs_export=off
and print a warning like we do with no xattr/fh support.
I will do that.

Thanks,
Amir.
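Amir's plan above is to fall back to nfs_export=off with a warning, rather than fail the mount, when there is no upper layer and redirects are enabled. A minimal sketch of that option check, assuming a hypothetical options struct; the field names, the helper name and the warning text are illustrative and are not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical mount-option model; names are illustrative only. */
struct ovl_opts_model {
	bool has_upper;		/* an upper layer was given */
	bool redirect_dir;	/* redirects are enabled */
	bool nfs_export;	/* NFS export was requested */
};

/*
 * Without an upper layer, middle-layer redirects cannot be mitigated by
 * copy up on encode, so disable NFS export and warn instead of failing
 * the mount. Returns true if the option was turned off.
 */
static bool ovl_nfs_export_fallback(struct ovl_opts_model *opts)
{
	if (!opts->nfs_export || opts->has_upper || !opts->redirect_dir)
		return false;

	fprintf(stderr, "overlayfs: NFS export not supported without an "
		"upper layer when redirects are enabled, "
		"falling back to nfs_export=off.\n");
	opts->nfs_export = false;
	return true;
}
```

This makes the failure deterministic at mount time, which is the property Miklos wanted, without refusing the mount altogether.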


* Re: [PATCH v2 02/17] ovl: encode pure upper file handles
  2018-01-04 17:20 ` [PATCH v2 02/17] ovl: encode pure upper file handles Amir Goldstein
@ 2018-01-18 10:31   ` Miklos Szeredi
  0 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-18 10:31 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Encode overlay file handles as struct ovl_fh containing the file handle
> encoding of the real upper inode.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/Makefile    |  3 +-
>  fs/overlayfs/export.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/overlayfs/overlayfs.h |  6 +++
>  3 files changed, 106 insertions(+), 1 deletion(-)
>  create mode 100644 fs/overlayfs/export.c
>
> diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
> index 99373bbc1478..30802347a020 100644
> --- a/fs/overlayfs/Makefile
> +++ b/fs/overlayfs/Makefile
> @@ -4,4 +4,5 @@
>
>  obj-$(CONFIG_OVERLAY_FS) += overlay.o
>
> -overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o
> +overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o \
> +               export.o
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> new file mode 100644
> index 000000000000..58c4f5e8a67e
> --- /dev/null
> +++ b/fs/overlayfs/export.c
> @@ -0,0 +1,98 @@
> +/*
> + * Overlayfs NFS export support.
> + *
> + * Amir Goldstein <amir73il@gmail.com>
> + *
> + * Copyright (C) 2017 CTERA Networks. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/cred.h>
> +#include <linux/mount.h>
> +#include <linux/namei.h>
> +#include <linux/xattr.h>
> +#include <linux/exportfs.h>
> +#include <linux/ratelimit.h>
> +#include "overlayfs.h"
> +
> +int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)

static ...


* Re: [PATCH v2 09/17] ovl: decode indexed non-dir file handles
  2018-01-04 17:20 ` [PATCH v2 09/17] ovl: decode indexed " Amir Goldstein
@ 2018-01-18 13:11   ` Miklos Szeredi
  0 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-18 13:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Decoding an indexed non-dir file handle is similar to decoding a lower
> non-dir file handle, but additionally, we lookup the file handle in index
> dir by name to find the real upper inode.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c | 72 +++++++++++++++++++++++++++++++++------------------
>  1 file changed, 47 insertions(+), 25 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 6b359f968c01..602bada474ba 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -168,22 +168,24 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>  }
>
>  /*
> - * Find or instantiate an overlay dentry from real dentries.
> + * Find or instantiate an overlay dentry from real dentries and index.
>   */
>  static struct dentry *ovl_obtain_alias(struct super_block *sb,
> -                                      struct dentry *upper,
> -                                      struct ovl_path *lowerpath)
> +                                      struct dentry *upper_alias,
> +                                      struct ovl_path *lowerpath,
> +                                      struct dentry *index)
>  {
>         struct dentry *lower = lowerpath ? lowerpath->dentry : NULL;
> +       struct dentry *upper = upper_alias ?: index;
>         struct dentry *dentry;
>         struct inode *inode;
>         struct ovl_entry *oe;
>
> -       /* TODO: obtain an indexed non-dir upper with origin */
> -       if (lower && (upper || d_is_dir(lower)))
> +       /* We get overlay directory dentries with ovl_lookup_real() */
> +       if (d_is_dir(upper ?: lower))
>                 return ERR_PTR(-EIO);
>
> -       inode = ovl_get_inode(sb, dget(upper), lower, NULL, !!lower);
> +       inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower);
>         if (IS_ERR(inode)) {
>                 dput(upper);
>                 return ERR_CAST(inode);
> @@ -200,13 +202,16 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
>         }
>
>         dentry->d_fsdata = oe;
> -       if (upper)
> +       if (upper_alias)
>                 ovl_dentry_set_upper_alias(dentry);
>         if (lower) {
>                 oe->lowerstack->dentry = dget(lower);
>                 oe->lowerstack->layer = lowerpath->layer;
>         }
>
> +       if (index)
> +               ovl_set_flag(OVL_INDEX, inode);
> +
>         return dentry;
>  }
>
> @@ -321,30 +326,26 @@ static struct dentry *ovl_lookup_real(struct super_block *sb,
>  }
>
>  /*
> - * Get an overlay dentry from upper/lower real dentries.
> + * Get an overlay dentry from upper/lower real dentries and index.
>   */
>  static struct dentry *ovl_get_dentry(struct super_block *sb,
>                                      struct dentry *upper,
> -                                    struct ovl_path *lowerpath)
> +                                    struct ovl_path *lowerpath,
> +                                    struct dentry *index)
>  {
> +       struct dentry *real = upper ?: (index ?: lowerpath->dentry);
> +
>         /*
> -        * Obtain a disconnected overlay dentry from a disconnected non-dir
> -        * real lower dentry.
> +        * Obtain a disconnected overlay dentry from a non-dir real dentry
> +        * and index.
>          */
> -       if (!upper && !d_is_dir(lowerpath->dentry))
> -               return ovl_obtain_alias(sb, NULL, lowerpath);
> +       if (!d_is_dir(real))
> +               return ovl_obtain_alias(sb, upper, lowerpath, index);
>
>         /* TODO: lookup connected dir from real lower dir */
>         if (!upper)
>                 return ERR_PTR(-EACCES);
>
> -       /*
> -        * Obtain a disconnected overlay dentry from a non-dir real upper
> -        * dentry.
> -        */
> -       if (!d_is_dir(upper))
> -               return ovl_obtain_alias(sb, upper, NULL);
> -
>         /* Removed empty directory? */
>         if ((upper->d_flags & DCACHE_DISCONNECTED) || d_unhashed(upper))
>                 return ERR_PTR(-ENOENT);
> @@ -370,7 +371,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
>         if (IS_ERR_OR_NULL(upper))
>                 return upper;
>
> -       dentry = ovl_get_dentry(sb, upper, NULL);
> +       dentry = ovl_get_dentry(sb, upper, NULL, NULL);
>         dput(upper);
>
>         return dentry;
> @@ -383,17 +384,38 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>         struct ovl_path origin = { };
>         struct ovl_path *stack = &origin;
>         struct dentry *dentry = NULL;
> +       struct dentry *index = NULL;
>         int err;
>
> +       /* First lookup indexed upper by fh */
> +       index = ovl_get_index_fh(ofs, fh);
> +       err = PTR_ERR(index);
> +       if (IS_ERR(index))
> +               return ERR_PTR(err);
> +
> +       /* Then lookup origin by fh */
>         err = ovl_check_origin_fh(fh, NULL, ofs->lower_layers, ofs->numlower,
>                                   &stack);
> -       if (err)
> -               return ERR_PTR(err);
> +       if (err) {
> +               goto out_err;
> +       } else if (!index && !origin.dentry) {

origin.dentry is not going to be NULL if ovl_check_origin_fh()
returns 0, so checking for non-NULL is not needed here

> +               return NULL;
> +       } else if (index && origin.dentry) {

here, and following patch.

> +               err = ovl_verify_origin(index, origin.dentry, false, false);
> +               if (err)
> +                       goto out_err;
> +       }
>
> -       dentry = ovl_get_dentry(sb, NULL, &origin);
> -       dput(origin.dentry);
> +       dentry = ovl_get_dentry(sb, NULL, &origin, index);
>
> +out:
> +       dput(origin.dentry);
> +       dput(index);
>         return dentry;
> +
> +out_err:
> +       dentry = ERR_PTR(err);
> +       goto out;
>  }
>
>  static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
> --
> 2.7.4
>
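Given Miklos's observation that a successful ovl_check_origin_fh() always sets origin.dentry, the branching after the two lookups in ovl_lower_fh_to_d() needs to test only `index`. A minimal userspace model of that reduced decision flow, assuming hypothetical fields that stand in for the results of ovl_get_index_fh(), ovl_check_origin_fh() and ovl_verify_origin(); it is a sketch of the control flow, not the revised patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical results of the kernel calls in ovl_lower_fh_to_d(). */
struct lower_decode_model {
	int index_err;		/* error from the index lookup, 0 if none */
	bool have_index;	/* an index entry was found */
	int origin_err;		/* error from the origin lookup, 0 on success */
	int verify_err;		/* error from verifying index against origin */
};

/*
 * Returns 0 if an overlay dentry would be obtained, else the error.
 * Once origin_err is 0 the origin is known to be set, so there is no
 * "no index and no origin" branch left: only @have_index decides whether
 * the index must be verified against the origin.
 */
static int lower_fh_to_d_model(const struct lower_decode_model *m)
{
	if (m->index_err)
		return m->index_err;
	if (m->origin_err)
		return m->origin_err;
	if (m->have_index && m->verify_err)
		return m->verify_err;
	return 0;
}
```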


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-04 17:20 ` [PATCH v2 03/17] ovl: decode " Amir Goldstein
@ 2018-01-18 14:09   ` Miklos Szeredi
  2018-01-18 14:34     ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-18 14:09 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, linux-fsdevel, Al Viro

[Added Al Viro]

On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Decoding an upper file handle is done by decoding the upper dentry from
> underlying upper fs, finding or allocating an overlay inode that is
> hashed by the real upper inode and instantiating an overlay dentry with
> that inode.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/overlayfs/namei.c     |  4 +--
>  fs/overlayfs/overlayfs.h |  2 ++
>  3 files changed, 95 insertions(+), 2 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 58c4f5e8a67e..5c72784a0b4d 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>         return type;
>  }
>
> +/*
> + * Find or instantiate an overlay dentry from real dentries.
> + */
> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
> +                                      struct dentry *upper,
> +                                      struct ovl_path *lowerpath)
> +{
> +       struct inode *inode;
> +       struct dentry *dentry;
> +       struct ovl_entry *oe;
> +
> +       /* TODO: obtain non pure-upper */
> +       if (lowerpath)
> +               return ERR_PTR(-EIO);
> +
> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
> +       if (IS_ERR(inode)) {
> +               dput(upper);
> +               return ERR_CAST(inode);
> +       }
> +
> +       dentry = d_obtain_alias(inode);
> +       if (IS_ERR(dentry) || dentry->d_fsdata)

Racing two instances of this code, each thinking it got a new alias
and trying to fill it, results in a memory leak.

Haven't checked in too much depth, but apparently other filesystems
are not affected, so we need something special here.

One solution: split d_instantiate_anon(dentry, inode) out of
__d_obtain_alias() and supply that with the already initialized
dentry.

Al?

Thanks,
Miklos
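The leak described above comes from filling the alias after it is already reachable by a second caller; the d_instantiate_anon() proposal is to initialize the dentry first and only then attach it. A userspace analogue of that initialize-then-publish pattern follows, using a C11 compare-and-swap on a hypothetical per-inode alias slot in place of the dcache. It illustrates the idea only and is not d_instantiate_anon() itself; `struct entry`, `struct inode_model` and `obtain_alias()` are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an overlay entry and an inode's alias slot. */
struct entry {
	int layers;
};

struct inode_model {
	_Atomic(struct entry *) alias;	/* published, fully initialized entry */
};

/*
 * Initialize privately, then publish with a single compare-and-swap.
 * If another caller published first, free our copy and use theirs, so
 * racing callers never leak and nobody sees a half-built entry.
 * @freed counts discarded copies (for illustration only).
 */
static struct entry *obtain_alias(struct inode_model *inode, int *freed)
{
	struct entry *existing = atomic_load(&inode->alias);
	struct entry *expected = NULL;
	struct entry *mine;

	if (existing)
		return existing;

	mine = malloc(sizeof(*mine));
	if (!mine)
		return NULL;
	mine->layers = 0;		/* fully initialize before publishing */

	if (atomic_compare_exchange_strong(&inode->alias, &expected, mine))
		return mine;		/* we won the race */

	free(mine);			/* lost: drop ours, use the winner's */
	(*freed)++;
	return expected;		/* the CAS stored the current value here */
}
```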

> +               return dentry;
> +
> +       oe = ovl_alloc_entry(0);
> +       if (!oe) {
> +               dput(dentry);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       dentry->d_fsdata = oe;
> +       ovl_dentry_set_upper_alias(dentry);
> +
> +       return dentry;
> +}
> +
> +static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
> +                                       struct ovl_fh *fh)
> +{
> +       struct ovl_fs *ofs = sb->s_fs_info;
> +       struct dentry *dentry;
> +       struct dentry *upper;
> +
> +       if (!ofs->upper_mnt)
> +               return ERR_PTR(-EACCES);
> +
> +       upper = ovl_decode_fh(fh, ofs->upper_mnt);
> +       if (IS_ERR_OR_NULL(upper))
> +               return upper;
> +
> +       dentry = ovl_obtain_alias(sb, upper, NULL);
> +       dput(upper);
> +
> +       return dentry;
> +}
> +
> +static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
> +                                      int fh_len, int fh_type)
> +{
> +       struct dentry *dentry = NULL;
> +       struct ovl_fh *fh = (struct ovl_fh *) fid;
> +       int len = fh_len << 2;
> +       unsigned int flags = 0;
> +       int err;
> +
> +       err = -EINVAL;
> +       if (fh_type != OVL_FILEID)
> +               goto out_err;
> +
> +       err = ovl_check_fh_len(fh, len);
> +       if (err)
> +               goto out_err;
> +
> +       /* TODO: decode non-upper */
> +       flags = fh->flags;
> +       if (flags & OVL_FH_FLAG_PATH_UPPER)
> +               dentry = ovl_upper_fh_to_d(sb, fh);
> +       err = PTR_ERR(dentry);
> +       if (IS_ERR(dentry) && err != -ESTALE)
> +               goto out_err;
> +
> +       return dentry;
> +
> +out_err:
> +       pr_warn_ratelimited("overlayfs: failed to decode file handle (len=%d, type=%d, flags=%x, err=%i)\n",
> +                           len, fh_type, flags, err);
> +       return ERR_PTR(err);
> +}
> +
>  const struct export_operations ovl_export_operations = {
>         .encode_fh      = ovl_encode_inode_fh,
> +       .fh_to_dentry   = ovl_fh_to_dentry,
>  };
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index a69cedf06000..87d39384dc55 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -107,7 +107,7 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
>   * Return -ENODATA for "origin unknown".
>   * Return <0 for an invalid file handle.
>   */
> -static int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
> +int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
>  {
>         if (fh_len < sizeof(struct ovl_fh) || fh_len < fh->len)
>                 return -EINVAL;
> @@ -171,7 +171,7 @@ static struct ovl_fh *ovl_get_origin_fh(struct dentry *dentry)
>         goto out;
>  }
>
> -static struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt)
> +struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt)
>  {
>         struct dentry *origin;
>         int bytes;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index f6fd999cb98e..c4f8e98e209e 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -258,6 +258,8 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
>
>
>  /* namei.c */
> +int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
> +struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt);
>  int ovl_verify_origin(struct dentry *dentry, struct dentry *origin,
>                       bool is_upper, bool set);
>  int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index);
> --
> 2.7.4
>
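Patches 2 and 3 wrap the real upper file handle in a struct ovl_fh and, on decode, check the length that exportfs passes in 4-byte words (hence `fh_len << 2` in ovl_fh_to_dentry()). A minimal round-trip sketch of just the length handling; the header struct is modelled loosely on overlayfs's struct ovl_fh and its exact fields are an assumption here, as are the helper names and the -EOVERFLOW return for a short buffer. Header fields other than `len` are left zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Header modelled loosely on struct ovl_fh; fields are assumptions. */
struct ovl_fh_model {
	uint8_t version;
	uint8_t magic;
	uint8_t len;		/* header + real fid, in bytes */
	uint8_t flags;		/* e.g. "real handle is from the upper layer" */
	uint8_t type;		/* fid type of the real file handle */
	uint8_t uuid[16];	/* identifies the real filesystem */
	uint8_t fid[];		/* real file handle bytes */
};

/* Wrap @fid_len bytes of a real handle; return the size in 4-byte words. */
static int encode_model(uint32_t *buf, int buf_words,
			const uint8_t *fid, int fid_len)
{
	struct ovl_fh_model *fh = (struct ovl_fh_model *)buf;
	int len = (int)sizeof(*fh) + fid_len;
	int words = (len + 3) >> 2;	/* round bytes up to whole u32 words */

	if (words > buf_words || len > UINT8_MAX)
		return -EOVERFLOW;
	memset(buf, 0, (size_t)words << 2);
	fh->len = (uint8_t)len;
	memcpy(fh->fid, fid, (size_t)fid_len);
	return words;
}

/* Decode-side length check, in the spirit of ovl_check_fh_len() above. */
static int check_fh_len_model(const struct ovl_fh_model *fh, int fh_words)
{
	int len = fh_words << 2;	/* words -> bytes, as in the patch */

	/* Check the header size first so fh->len is only read when present. */
	if (len < (int)sizeof(*fh) || len < fh->len)
		return -EINVAL;
	return 0;
}
```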


* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-04 17:20 ` [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files Amir Goldstein
  2018-01-16  9:16   ` Miklos Szeredi
@ 2018-01-18 14:18   ` Amir Goldstein
  2018-02-27 11:35     ` Amir Goldstein
  1 sibling, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-18 14:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 4, 2018 at 7:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Lookup overlay inode in cache by origin inode, so we can decode a file
> handle of an open file even if the index has a whiteout index entry to
> mark this overlay inode was unlinked.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>  fs/overlayfs/overlayfs.h |  1 +
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 602bada474ba..6ecb54d4b52c 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>         struct ovl_path *stack = &origin;
>         struct dentry *dentry = NULL;
>         struct dentry *index = NULL;
> +       struct inode *inode = NULL;
> +       bool is_deleted = false;
>         int err;
>
>         /* First lookup indexed upper by fh */
>         index = ovl_get_index_fh(ofs, fh);
>         err = PTR_ERR(index);
> -       if (IS_ERR(index))
> -               return ERR_PTR(err);
> +       if (IS_ERR(index)) {
> +               if (err != -ESTALE)
> +                       return ERR_PTR(err);
> +
> +               /* Found a whiteout index - treat as deleted inode */
> +               is_deleted = true;
> +               index = NULL;

Ouch! It seems I was misleading you.
If we find a whiteout index for dir, we *do* decode+reconnect origin,
because we want to find out if this is an unlinked but open non-dir.
I guess there are 2 ways to avoid this unneeded decode:
1. mark a "directory index whiteout" differently than "non-dir index whiteout"
2. lookup icache by file handle

> +       }
>
>         /* Then lookup origin by fh */
>         err = ovl_check_origin_fh(fh, NULL, ofs->lower_layers, ofs->numlower,
> @@ -404,6 +412,15 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>                 err = ovl_verify_origin(index, origin.dentry, false, false);
>                 if (err)
>                         goto out_err;
> +       } else if (is_deleted && origin.dentry && !d_is_dir(origin.dentry)) {
> +               /* Lookup deleted overlay inode by origin inode */
> +               inode = ovl_lookup_inode(sb, origin.dentry);
> +               err = -ESTALE;
> +               if (!inode || atomic_read(&inode->i_count) == 1)
> +                       goto out_err;
> +
> +               /* Deleted but still open? */
> +               index = dget(ovl_i_dentry_upper(inode));
>         }

And to top that off, we even try to look up the origin dir path in overlay
instead of returning ESTALE right away.
Sheesh! that's embarrassing.

Thanks,
Amir.
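Amir's correction amounts to this ordering for a whiteout index: a directory gets ESTALE right away, and only a non-dir is looked up in the inode cache to see whether it is unlinked but still open. A minimal model of that decision with hypothetical types; it assumes the dir/non-dir distinction is already known (for instance via option 1 in the list above) rather than obtained by decoding the origin, and it reuses the patch's `i_count == 1` test as "held only by our own lookup".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what the decoder knows after a whiteout index. */
struct cached_inode {
	int i_count;
};

struct whiteout_case {
	bool origin_is_dir;
	struct cached_inode *cached;	/* icache lookup by origin, or NULL */
};

/*
 * A whiteout index marks an unlinked inode. Directories return ESTALE
 * without any path lookup. A non-dir decodes only if it is still cached
 * and someone besides our own lookup holds a reference (still open).
 */
static int decode_whiteout_index(const struct whiteout_case *c)
{
	if (c->origin_is_dir)
		return -ESTALE;
	if (!c->cached || c->cached->i_count == 1)
		return -ESTALE;
	return 0;	/* deleted but still open: use the cached inode */
}
```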


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-18 14:09   ` Miklos Szeredi
@ 2018-01-18 14:34     ` Amir Goldstein
  2018-01-18 14:39       ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-18 14:34 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro

On Thu, Jan 18, 2018 at 4:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> [Added Al Viro]
>
> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Decoding an upper file handle is done by decoding the upper dentry from
>> underlying upper fs, finding or allocating an overlay inode that is
>> hashed by the real upper inode and instantiating an overlay dentry with
>> that inode.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/overlayfs/namei.c     |  4 +--
>>  fs/overlayfs/overlayfs.h |  2 ++
>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>> index 58c4f5e8a67e..5c72784a0b4d 100644
>> --- a/fs/overlayfs/export.c
>> +++ b/fs/overlayfs/export.c
>> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>>         return type;
>>  }
>>
>> +/*
>> + * Find or instantiate an overlay dentry from real dentries.
>> + */
>> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
>> +                                      struct dentry *upper,
>> +                                      struct ovl_path *lowerpath)
>> +{
>> +       struct inode *inode;
>> +       struct dentry *dentry;
>> +       struct ovl_entry *oe;
>> +
>> +       /* TODO: obtain non pure-upper */
>> +       if (lowerpath)
>> +               return ERR_PTR(-EIO);
>> +
>> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
>> +       if (IS_ERR(inode)) {
>> +               dput(upper);
>> +               return ERR_CAST(inode);
>> +       }
>> +
>> +       dentry = d_obtain_alias(inode);
>> +       if (IS_ERR(dentry) || dentry->d_fsdata)
>
> Racing two instances of this code, each thinking it got a new alias
> and trying to fill it, results in a memory leak.
>
> Haven't checked in too much depth, but apparently other filesystems
> are not affected, so we need something special here.
>
> One solution: split d_instantiate_anon(dentry, inode) out of
> __d_obtain_alias() and supply that with the already initialized
> dentry.
>

Can't we use &OVL_I(inode)->lock to avoid the race?

Thanks,
Amir.
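
For illustration, the lock-based idea can be modeled in userspace roughly as below. This is a hypothetical sketch, not overlayfs code. struct ovl_inode_model, obtain_alias_locked() and the atomic_flag spinlock are assumed stand-ins for the overlay inode, OVL_I(inode)->lock and the find-or-fill step of ovl_obtain_alias(). Serializing the fillers means only one caller at a time can see a NULL fsdata and allocate it, so the double allocation goes away. However, the alias is still linked to the inode before it is filled, so a reader that does not take the lock could observe it half-initialized.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the private data and an anonymous dentry. */
struct fsdata {
	int has_upper;
};

struct alias {
	struct fsdata *fsdata;
};

/* Hypothetical stand-in for the overlay inode. */
struct ovl_inode_model {
	atomic_flag lock;	/* plays the role of OVL_I(inode)->lock */
	struct alias *alias;	/* plays the role of inode->i_dentry */
};

static void lock_model(struct ovl_inode_model *oi)
{
	while (atomic_flag_test_and_set(&oi->lock))
		;	/* spin until the holder clears the flag */
}

static void unlock_model(struct ovl_inode_model *oi)
{
	atomic_flag_clear(&oi->lock);
}

/*
 * Find-or-fill under the per-inode lock: only one caller at a time can
 * see fsdata == NULL and allocate it, so the double-fill leak is gone.
 */
static struct alias *obtain_alias_locked(struct ovl_inode_model *oi)
{
	struct alias *a;

	lock_model(oi);
	a = oi->alias;
	if (!a) {
		a = calloc(1, sizeof(*a));
		if (!a)
			goto out;
		/* Linked to the inode while fsdata is still NULL. */
		oi->alias = a;
	}
	if (!a->fsdata) {
		a->fsdata = malloc(sizeof(*a->fsdata));
		if (a->fsdata)
			a->fsdata->has_upper = 1;
	}
out:
	unlock_model(oi);
	return a;
}
```

A repeated call returns the same alias and the same private data. The model does not include lockless readers such as d_find_any_alias().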


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-18 14:34     ` Amir Goldstein
@ 2018-01-18 14:39       ` Miklos Szeredi
  2018-01-18 19:49         ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-18 14:39 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, linux-fsdevel, Al Viro

On Thu, Jan 18, 2018 at 3:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 18, 2018 at 4:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> [Added Al Viro]
>>
>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Decoding an upper file handle is done by decoding the upper dentry from
>>> underlying upper fs, finding or allocating an overlay inode that is
>>> hashed by the real upper inode and instantiating an overlay dentry with
>>> that inode.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  fs/overlayfs/namei.c     |  4 +--
>>>  fs/overlayfs/overlayfs.h |  2 ++
>>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>> index 58c4f5e8a67e..5c72784a0b4d 100644
>>> --- a/fs/overlayfs/export.c
>>> +++ b/fs/overlayfs/export.c
>>> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>>>         return type;
>>>  }
>>>
>>> +/*
>>> + * Find or instantiate an overlay dentry from real dentries.
>>> + */
>>> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>> +                                      struct dentry *upper,
>>> +                                      struct ovl_path *lowerpath)
>>> +{
>>> +       struct inode *inode;
>>> +       struct dentry *dentry;
>>> +       struct ovl_entry *oe;
>>> +
>>> +       /* TODO: obtain non pure-upper */
>>> +       if (lowerpath)
>>> +               return ERR_PTR(-EIO);
>>> +
>>> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
>>> +       if (IS_ERR(inode)) {
>>> +               dput(upper);
>>> +               return ERR_CAST(inode);
>>> +       }
>>> +
>>> +       dentry = d_obtain_alias(inode);
>>> +       if (IS_ERR(dentry) || dentry->d_fsdata)
>>
>> Racing two instances of this code, each thinking it got a new alias
>> and trying to fill it, results in a memory leak.
>>
>> Haven't checked in too much depth, but apparently other filesystems
>> are not affected, so we need something special here.
>>
>> One solution: split d_instantiate_anon(dentry, inode) out of
>> __d_obtain_alias() and supply that with the already initialized
>> dentry.
>>
>
> Can't we use &OVL_I(inode)->lock to avoid the race?

We could.  But then d_splice_alias() will find our half baked dentry
and return that from ovl_lookup().  So we do need to have the dentry
fully initialized by the time it's added into the inode's alias list.

Thanks,
Miklos
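
The requirement above, that the dentry be fully initialized before it is added to the inode's alias list, is the usual initialize-then-publish pattern. Below is a minimal userspace sketch of it under stated assumptions. struct alias, alias_slot and obtain_alias() are hypothetical stand-ins for the dentry, inode->i_dentry and a d_instantiate_anon()-style helper. A C11 compare-and-swap stands in for the inode->i_lock critical section in __d_obtain_alias(). Each caller builds a complete object privately, and a caller that loses the publish step frees its own copy, so there is neither a leak nor a half-baked alias.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-dentry private data (ovl_entry). */
struct fsdata {
	int has_upper;
};

/* Hypothetical stand-in for an anonymous dentry. */
struct alias {
	struct fsdata *fsdata;
};

/* Hypothetical stand-in for the inode's alias list (at most one alias). */
static _Atomic(struct alias *) alias_slot;

/* Number of live allocations, used to check that nothing leaks. */
static atomic_int live_allocs;

/* Build a complete alias privately, before anyone else can see it. */
static struct alias *alloc_alias(void)
{
	struct alias *a = malloc(sizeof(*a));
	struct fsdata *fd = malloc(sizeof(*fd));

	if (!a || !fd) {
		free(a);
		free(fd);
		return NULL;
	}
	fd->has_upper = 1;
	a->fsdata = fd;
	live_allocs += 2;
	return a;
}

static void free_alias(struct alias *a)
{
	free(a->fsdata);
	free(a);
	live_allocs -= 2;
}

/*
 * Publish a fully built alias in one atomic step. If another caller
 * published first, free our copy and return theirs.
 */
static struct alias *obtain_alias(void)
{
	struct alias *fresh = alloc_alias();
	struct alias *cur = NULL;

	if (!fresh)
		return NULL;
	if (atomic_compare_exchange_strong(&alias_slot, &cur, fresh))
		return fresh;	/* won: the alias was complete when published */

	free_alias(fresh);	/* lost: no leak, and no half-baked alias */
	return cur;		/* the failed CAS loaded the winner into cur */
}
```

In this model a second call always loses the compare-and-swap, which exercises the cleanup path deterministically. It returns the first alias and leaves exactly one alias and one private object allocated.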


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-18 14:39       ` Miklos Szeredi
@ 2018-01-18 19:49         ` Amir Goldstein
  2018-01-18 20:10           ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-18 19:49 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro

On Thu, Jan 18, 2018 at 4:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 18, 2018 at 3:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, Jan 18, 2018 at 4:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> [Added Al Viro]
>>>
>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Decoding an upper file handle is done by decoding the upper dentry from
>>>> underlying upper fs, finding or allocating an overlay inode that is
>>>> hashed by the real upper inode and instantiating an overlay dentry with
>>>> that inode.
>>>>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>> ---
>>>>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  fs/overlayfs/namei.c     |  4 +--
>>>>  fs/overlayfs/overlayfs.h |  2 ++
>>>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>> index 58c4f5e8a67e..5c72784a0b4d 100644
>>>> --- a/fs/overlayfs/export.c
>>>> +++ b/fs/overlayfs/export.c
>>>> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>>>>         return type;
>>>>  }
>>>>
>>>> +/*
>>>> + * Find or instantiate an overlay dentry from real dentries.
>>>> + */
>>>> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>>> +                                      struct dentry *upper,
>>>> +                                      struct ovl_path *lowerpath)
>>>> +{
>>>> +       struct inode *inode;
>>>> +       struct dentry *dentry;
>>>> +       struct ovl_entry *oe;
>>>> +
>>>> +       /* TODO: obtain non pure-upper */
>>>> +       if (lowerpath)
>>>> +               return ERR_PTR(-EIO);
>>>> +
>>>> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
>>>> +       if (IS_ERR(inode)) {
>>>> +               dput(upper);
>>>> +               return ERR_CAST(inode);
>>>> +       }
>>>> +
>>>> +       dentry = d_obtain_alias(inode);
>>>> +       if (IS_ERR(dentry) || dentry->d_fsdata)
>>>
>>> Racing two instances of this code, each thinking it got a new alias
>>> and trying to fill it, results in a memory leak.
>>>
>>> Haven't checked in too much depth, but apparently other filesystems
>>> are not affected, so we need something special here.
>>>
>>> One solution: split d_instantiate_anon(dentry, inode) out of
>>> __d_obtain_alias() and supply that with the already initialized
>>> dentry.
>>>
>>
>> Can't we use &OVL_I(inode)->lock to avoid the race?
>
> We could.  But then d_splice_alias() will find our half baked dentry
> and return that from ovl_lookup().

No it won't, because we do not obtain dir dentries this way.
We actually do in this patch [3/17], but since patch [4/17] we don't,
so I only need to fix this patch not to obtain dir dentry and to
protect concurrent decode of non-dir with &OVL_I(inode)->lock.

> So we do need to have the dentry
> fully initialized by the time it's added into the inode's alias list.
>

The only problems I see with adding a non-dir disconnected alias
that is not fully baked are:
1. We can get it in ovl_encode_inode_fh() from d_find_any_alias()
2. nfsd can get it in exportfs_decode_fh() from find_acceptable_alias()
    in a weird hypothetical case where the fully baked dentry we just
    returned from ovl_obtain_alias() is NOT acceptable to nfsd but
    the half baked dentry IS acceptable
3. Another kernel user that uses d_find_any_alias(), or one of the use
    cases that only Al can think of...

As for cases 2 and 3, I don't know whether they are real.

Case 1 is only a problem due to the lack of an export_operations method
'dentry_to_fh'. exportfs_encode_fh() has the right dentry, but it does
not pass it to the filesystem for encoding, so I think it should be
solved by adding this method.

Thanks,
Amir.
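
To make case 1 concrete: the proposed 'dentry_to_fh' method would let the filesystem encode from the exact dentry the caller holds, instead of recovering one via d_find_any_alias(). The following is a hypothetical userspace model, not the real export_operations. struct export_ops_model, encode_fh_model() and the demo_* methods are invented for illustration, and the methods return the handle length where the real ones return a handle type. The helper prefers the dentry-based method when present. Otherwise it falls back to the inode-based one and passes the parent inode only for connectable handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal hypothetical models of the objects involved. */
struct inode {
	uint32_t ino;
};

struct dentry {
	struct inode *inode;
	struct dentry *parent;
	int id;			/* identifies which alias was used */
};

/*
 * Hypothetical export ops: encode_fh mirrors the existing inode-based
 * method; dentry_to_fh is the proposed, optional dentry-based method.
 * Both return the handle length here (the real methods return a type).
 */
struct export_ops_model {
	int (*encode_fh)(struct inode *inode, uint32_t *fid, int *max_len,
			 struct inode *parent);
	int (*dentry_to_fh)(struct dentry *dentry, uint32_t *fid,
			    int *max_len, int connectable);
};

/* Model of exportfs_encode_fh(): the caller already holds the dentry. */
static int encode_fh_model(const struct export_ops_model *ops,
			   struct dentry *dentry, uint32_t *fid,
			   int *max_len, int connectable)
{
	struct inode *parent = NULL;

	/* Preferred: hand the caller's exact dentry to the filesystem. */
	if (ops->dentry_to_fh)
		return ops->dentry_to_fh(dentry, fid, max_len, connectable);

	/* Fallback: only the inode and, if connectable, the parent inode. */
	if (connectable && dentry->parent)
		parent = dentry->parent->inode;
	return ops->encode_fh(dentry->inode, fid, max_len, parent);
}

static int last_dentry_id = -1;

static int demo_dentry_to_fh(struct dentry *dentry, uint32_t *fid,
			     int *max_len, int connectable)
{
	(void)connectable;
	if (*max_len < 1)
		return -1;
	last_dentry_id = dentry->id;	/* no alias lookup needed */
	fid[0] = dentry->inode->ino;
	*max_len = 1;
	return 1;
}

static int demo_encode_fh(struct inode *inode, uint32_t *fid, int *max_len,
			  struct inode *parent)
{
	int len = parent ? 2 : 1;

	if (*max_len < len)
		return -1;
	fid[0] = inode->ino;
	if (parent)
		fid[1] = parent->ino;
	*max_len = len;
	return len;
}
```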


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-18 19:49         ` Amir Goldstein
@ 2018-01-18 20:10           ` Miklos Szeredi
  2018-01-18 20:35             ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-18 20:10 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, linux-fsdevel, Al Viro

On Thu, Jan 18, 2018 at 8:49 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 18, 2018 at 4:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jan 18, 2018 at 3:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Thu, Jan 18, 2018 at 4:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> [Added Al Viro]
>>>>
>>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>> Decoding an upper file handle is done by decoding the upper dentry from
>>>>> underlying upper fs, finding or allocating an overlay inode that is
>>>>> hashed by the real upper inode and instantiating an overlay dentry with
>>>>> that inode.
>>>>>
>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>> ---
>>>>>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  fs/overlayfs/namei.c     |  4 +--
>>>>>  fs/overlayfs/overlayfs.h |  2 ++
>>>>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>>> index 58c4f5e8a67e..5c72784a0b4d 100644
>>>>> --- a/fs/overlayfs/export.c
>>>>> +++ b/fs/overlayfs/export.c
>>>>> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>>>>>         return type;
>>>>>  }
>>>>>
>>>>> +/*
>>>>> + * Find or instantiate an overlay dentry from real dentries.
>>>>> + */
>>>>> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>>>> +                                      struct dentry *upper,
>>>>> +                                      struct ovl_path *lowerpath)
>>>>> +{
>>>>> +       struct inode *inode;
>>>>> +       struct dentry *dentry;
>>>>> +       struct ovl_entry *oe;
>>>>> +
>>>>> +       /* TODO: obtain non pure-upper */
>>>>> +       if (lowerpath)
>>>>> +               return ERR_PTR(-EIO);
>>>>> +
>>>>> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
>>>>> +       if (IS_ERR(inode)) {
>>>>> +               dput(upper);
>>>>> +               return ERR_CAST(inode);
>>>>> +       }
>>>>> +
>>>>> +       dentry = d_obtain_alias(inode);
>>>>> +       if (IS_ERR(dentry) || dentry->d_fsdata)
>>>>
>>>> Racing two instances of this code, each thinking it got a new alias
>>>> and trying to fill it, results in a memory leak.
>>>>
>>>> Haven't checked in too much depth, but apparently other filesystems
>>>> are not affected, so we need something special here.
>>>>
>>>> One solution: split d_instantiate_anon(dentry, inode) out of
>>>> __d_obtain_alias() and supply that with the already initialized
>>>> dentry.
>>>>
>>>
>>> Can't we use &OVL_I(inode)->lock to avoid the race?
>>
>> We could.  But then d_splice_alias() will find our half baked dentry
>> and return that from ovl_lookup().
>
> No it won't, because we do not obtain dir dentries this way.
> We actually do in this patch [3/17], but since patch [4/17] we don't,
> so I only need to fix this patch not to obtain dir dentry and to
> protect concurrent decode of non-dir with &OVL_I(inode)->lock.
>
>> So we do need to have the dentry
>> fully initialized by the time it's added into the inode's alias list.
>>
>
> The only problems I see with adding a non-dir disconnected alias
> that is not fully baked are:
> 1. We can get it in ovl_encode_inode_fh() from d_find_any_alias()
> 2. nfsd can get it in exportfs_decode_fh() from find_acceptable_alias()
>     in a weird hypothetical case where the fully baked dentry we just
>     returned from ovl_obtain_alias() is NOT acceptable to nfsd but
>     the half baked dentry IS acceptable
> 3. Another kernel user that uses d_find_any_alias(), or one of the use
>     cases that only Al can think of...
>
> As for cases 2 and 3, I don't know whether they are real.
>
> Case 1 is only a problem due to the lack of an export_operations method
> 'dentry_to_fh'. exportfs_encode_fh() has the right dentry, but it does
> not pass it to the filesystem for encoding, so I think it should be
> solved by adding this method.

I agree with your analysis.

However, I don't see what's wrong with adding fully baked dentries to
the inode.  To me having the dentry in a consistent state when it's
linked to the inode looks far safer and easier than trying to work
around inconsistent dentries by creating new interfaces.

Thanks,
Miklos


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-18 20:10           ` Miklos Szeredi
@ 2018-01-18 20:35             ` Amir Goldstein
  2018-01-18 22:57               ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-18 20:35 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro

On Thu, Jan 18, 2018 at 10:10 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jan 18, 2018 at 8:49 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, Jan 18, 2018 at 4:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Thu, Jan 18, 2018 at 3:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> On Thu, Jan 18, 2018 at 4:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>> [Added Al Viro]
>>>>>
>>>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>> Decoding an upper file handle is done by decoding the upper dentry from
>>>>>> underlying upper fs, finding or allocating an overlay inode that is
>>>>>> hashed by the real upper inode and instantiating an overlay dentry with
>>>>>> that inode.
>>>>>>
>>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>>> ---
>>>>>>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  fs/overlayfs/namei.c     |  4 +--
>>>>>>  fs/overlayfs/overlayfs.h |  2 ++
>>>>>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>>>> index 58c4f5e8a67e..5c72784a0b4d 100644
>>>>>> --- a/fs/overlayfs/export.c
>>>>>> +++ b/fs/overlayfs/export.c
>>>>>> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>>>>>>         return type;
>>>>>>  }
>>>>>>
>>>>>> +/*
>>>>>> + * Find or instantiate an overlay dentry from real dentries.
>>>>>> + */
>>>>>> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>>>>> +                                      struct dentry *upper,
>>>>>> +                                      struct ovl_path *lowerpath)
>>>>>> +{
>>>>>> +       struct inode *inode;
>>>>>> +       struct dentry *dentry;
>>>>>> +       struct ovl_entry *oe;
>>>>>> +
>>>>>> +       /* TODO: obtain non pure-upper */
>>>>>> +       if (lowerpath)
>>>>>> +               return ERR_PTR(-EIO);
>>>>>> +
>>>>>> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
>>>>>> +       if (IS_ERR(inode)) {
>>>>>> +               dput(upper);
>>>>>> +               return ERR_CAST(inode);
>>>>>> +       }
>>>>>> +
>>>>>> +       dentry = d_obtain_alias(inode);
>>>>>> +       if (IS_ERR(dentry) || dentry->d_fsdata)
>>>>>
>>>>> Racing two instances of this code, each thinking it got a new alias
>>>>> and trying to fill it, results in a memory leak.
>>>>>
>>>>> Haven't checked in too much depth, but apparently other filesystems
>>>>> are not affected, so we need something special here.
>>>>>
>>>>> One solution: split d_instantiate_anon(dentry, inode) out of
>>>>> __d_obtain_alias() and supply that with the already initialized
>>>>> dentry.
>>>>>
>>>>
>>>> Can't we use &OVL_I(inode)->lock to avoid the race?
>>>
>>> We could.  But then d_splice_alias() will find our half baked dentry
>>> and return that from ovl_lookup().
>>
>> No it won't, because we do not obtain dir dentries this way.
>> We actually do in this patch [3/17], but since patch [4/17] we don't,
>> so I only need to fix this patch not to obtain dir dentry and to
>> protect concurrent decode of non-dir with &OVL_I(inode)->lock.
>>
>>> So we do need to have the dentry
>>> fully initialized by the time it's added into the inode's alias list.
>>>
>>
>> The only problems I see with adding a non-dir disconnected alias
>> that is not fully baked are:
>> 1. We can get it in ovl_encode_inode_fh() from d_find_any_alias()
>> 2. nfsd can get it in exportfs_decode_fh() from find_acceptable_alias()
>>     in a weird hypothetical case where the fully baked dentry we just
>>     returned from ovl_obtain_alias() is NOT acceptable to nfsd but
>>     the half baked dentry IS acceptable
>> 3. Another kernel user that uses d_find_any_alias(), or one of the use
>>     cases that only Al can think of...
>>
>> As for cases 2 and 3, I don't know whether they are real.
>>
>> Case 1 is only a problem due to the lack of an export_operations method
>> 'dentry_to_fh'. exportfs_encode_fh() has the right dentry, but it does
>> not pass it to the filesystem for encoding, so I think it should be
>> solved by adding this method.
>
> I agree with your analysis.
>
> However, I don't see what's wrong with adding fully baked dentries to
> the inode.

I agree that adding half baked dentries is not a good practice and
we should avoid it.

> To me having the dentry in a consistent state when it's
> linked to the inode looks far safer and easier than trying to work
> around inconsistent dentries by creating new interfaces.
>

Actually, this interface is something I wanted to add to begin with.
I think the current implementation that uses d_find_any_alias()
is probably sub-optimal. When I tried to implement connectable
(non-dir) file handles, I had to write a special helper to find an
alias of the inode whose parent dir is 'parent'. All this instead of
passing the dentry through the interface? And why? Because traditionally
filesystems only needed the inode and parent inode to encode?
Passing in the dentry does not break any abstraction layer; in fact, that
is the API that nfsd uses, exportfs_encode_fh(dentry, fh...), so why not
pass the same to the filesystem?

I am going to post this new interface, because I think it is the right
interface to use.

I will take care of the memory leak and will leave it to you and Al to
come up with a solution for half baked dentries. When you agree
on a solution, I can implement it.

Thanks,
Amir.
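
The kind of helper mentioned above, which finds an alias of the inode whose parent dir is 'parent', can be sketched as follows. This is a simplified, hypothetical model. The real dcache keeps aliases on inode->i_dentry, and such a search would need inode->i_lock and a dget() on the result, both of which are omitted here. If the encoder received the dentry directly, this search would not be needed.

```c
#include <assert.h>
#include <stddef.h>

struct inode;

/* Hypothetical model: each dentry is one alias of an inode. */
struct dentry {
	struct inode *inode;
	struct dentry *parent;
	struct dentry *next_alias;	/* stands in for d_u.d_alias */
};

/* Hypothetical model: an inode with a singly linked alias list. */
struct inode {
	struct dentry *aliases;		/* stands in for i_dentry */
};

/*
 * Given only (inode, parent inode), search the inode's aliases for one
 * whose parent dir is 'parent'. Returns NULL if none is cached.
 * Locking and reference counting are intentionally omitted.
 */
static struct dentry *find_alias_with_parent(struct inode *inode,
					     struct inode *parent)
{
	struct dentry *d;

	for (d = inode->aliases; d; d = d->next_alias)
		if (d->parent && d->parent->inode == parent)
			return d;
	return NULL;
}
```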


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-18 20:35             ` Amir Goldstein
@ 2018-01-18 22:57               ` Amir Goldstein
  2018-01-19  0:23                 ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-18 22:57 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro

On Thu, Jan 18, 2018 at 10:35 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 18, 2018 at 10:10 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Jan 18, 2018 at 8:49 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Thu, Jan 18, 2018 at 4:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> On Thu, Jan 18, 2018 at 3:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>> On Thu, Jan 18, 2018 at 4:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>> [Added Al Viro]
>>>>>>
>>>>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>>> Decoding an upper file handle is done by decoding the upper dentry from
>>>>>>> underlying upper fs, finding or allocating an overlay inode that is
>>>>>>> hashed by the real upper inode and instantiating an overlay dentry with
>>>>>>> that inode.
>>>>>>>
>>>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>>>> ---
>>>>>>>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  fs/overlayfs/namei.c     |  4 +--
>>>>>>>  fs/overlayfs/overlayfs.h |  2 ++
>>>>>>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>>>>> index 58c4f5e8a67e..5c72784a0b4d 100644
>>>>>>> --- a/fs/overlayfs/export.c
>>>>>>> +++ b/fs/overlayfs/export.c
>>>>>>> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>>>>>>>         return type;
>>>>>>>  }
>>>>>>>
>>>>>>> +/*
>>>>>>> + * Find or instantiate an overlay dentry from real dentries.
>>>>>>> + */
>>>>>>> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>>>>>> +                                      struct dentry *upper,
>>>>>>> +                                      struct ovl_path *lowerpath)
>>>>>>> +{
>>>>>>> +       struct inode *inode;
>>>>>>> +       struct dentry *dentry;
>>>>>>> +       struct ovl_entry *oe;
>>>>>>> +
>>>>>>> +       /* TODO: obtain non pure-upper */
>>>>>>> +       if (lowerpath)
>>>>>>> +               return ERR_PTR(-EIO);
>>>>>>> +
>>>>>>> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
>>>>>>> +       if (IS_ERR(inode)) {
>>>>>>> +               dput(upper);
>>>>>>> +               return ERR_CAST(inode);
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       dentry = d_obtain_alias(inode);
>>>>>>> +       if (IS_ERR(dentry) || dentry->d_fsdata)
>>>>>>
>>>>>> Racing two instances of this code, each thinking it got a new alias
>>>>>> and trying to fill it, results in a memory leak.
>>>>>>
>>>>>> Haven't checked in too much depth, but apparently other filesystems
>>>>>> are not affected, so we need something special here.
>>>>>>
>>>>>> One solution: split d_instantiate_anon(dentry, inode) out of
>>>>>> __d_obtain_alias() and supply that with the already initialized
>>>>>> dentry.
>>>>>>
>>>>>
>>>>> Can't we use &OVL_I(inode)->lock to avoid the race?
>>>>
>>>> We could.  But then d_splice_alias() will find our half baked dentry
>>>> and return that from ovl_lookup().
>>>
>>> No it won't, because we do not obtain dir dentries this way.
>>> We actually do in this patch [3/17], but since patch [4/17] we don't,
>>> so I only need to fix this patch not to obtain dir dentry and to
>>> protect concurrent decode of non-dir with &OVL_I(inode)->lock.
>>>
>>>> So we do need to have the dentry
>>>> fully initialized by the time it's added into the inode's alias list.
>>>>
>>>
>>> The only problems I see with adding a non-dir disconnected alias
>>> that is not fully baked are:
>>> 1. We can get it in ovl_encode_inode_fh() from d_find_any_alias()
>>> 2. nfsd can get it in exportfs_decode_fh() from find_acceptable_alias()
>>>     in a weird hypothetical case where the fully baked dentry we just
>>>     returned from ovl_obtain_alias() is NOT acceptable to nfsd but
>>>     the half baked dentry IS acceptable
>>> 3. Another kernel user that uses d_find_any_alias(), or one of the use
>>>     cases that only Al can think of...
>>>
>>> As for cases 2 and 3, I don't know whether they are real.
>>>
>>> Case 1 is only a problem due to the lack of an export_operations method
>>> 'dentry_to_fh'. exportfs_encode_fh() has the right dentry, but it does
>>> not pass it to the filesystem for encoding, so I think it should be
>>> solved by adding this method.
>>
>> I agree with your analysis.
>>
>> However, I don't see what's wrong with adding fully baked dentries to
>> the inode.
>
> I agree that adding half baked dentries is not a good practice and
> we should avoid it.
>

How is this for an option?

===========================================

+/*
+ * Find or instantiate an overlay dentry from real dentries.
+ */
+static struct dentry *ovl_obtain_alias(struct super_block *sb,
+                                      struct dentry *upper,
+                                      struct ovl_path *lowerpath)
+{
+       struct inode *inode;
+       struct dentry *dentry;
+       struct ovl_entry *oe;
+       void *fsdata = &oe;
+
+       /* TODO: obtain non pure-upper */
+       if (lowerpath)
+               return ERR_PTR(-EIO);
+
+       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
+       if (IS_ERR(inode)) {
+               dput(upper);
+               return ERR_CAST(inode);
+       }
+
+       oe = ovl_alloc_entry(0);
+       if (!oe) {
+               iput(inode);
+               return ERR_PTR(-ENOMEM);
+       }
+       oe->has_upper = true;
+
+       dentry = d_obtain_alias_fsdata(inode, fsdata);
+       /* A newly allocated dentry takes over *fsdata and sets it to NULL */
+       if (oe)
+               kfree(oe);
+
+       return dentry;
+}
+


...


-static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
+static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected,
+                                      void **fsdata)
 {
        struct dentry *tmp;
        struct dentry *res;
@@ -1962,6 +1963,12 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
        if (disconnected)
                add_flags |= DCACHE_DISCONNECTED;

+       /* Take ownership of pre-allocated fs-specific data */
+       if (fsdata) {
+               tmp->d_fsdata = fsdata;
+               *fsdata = NULL;
+       }
+
        spin_lock(&tmp->d_lock);
        __d_set_inode_and_type(tmp, inode, add_flags);
        hlist_add_head(&tmp->d_u.d_alias, &inode->i_dentry);
@@ -1998,7 +2005,13 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
  */
 struct dentry *d_obtain_alias(struct inode *inode)
 {
-       return __d_obtain_alias(inode, 1);
+       return __d_obtain_alias(inode, 1, NULL);
+}
+EXPORT_SYMBOL(d_obtain_alias);
+
+struct dentry *d_obtain_alias_fsdata(struct inode *inode, void **fsdata)
+{
+       return __d_obtain_alias(inode, 1, fsdata);
 }
-EXPORT_SYMBOL(d_obtain_alias);
+EXPORT_SYMBOL(d_obtain_alias_fsdata);

@@ -2019,7 +2032,7 @@ EXPORT_SYMBOL(d_obtain_alias);
  */
 struct dentry *d_obtain_root(struct inode *inode)
 {
-       return __d_obtain_alias(inode, 0);
+       return __d_obtain_alias(inode, 0, NULL);
 }
 EXPORT_SYMBOL(d_obtain_root);
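
The ownership handoff in the proposed d_obtain_alias_fsdata() can be modeled in userspace as below. This is a hypothetical sketch. struct dentry_model, obtain_alias_fsdata() and obtain() are stand-ins, and the single slot replaces the inode's alias list and its locking. When a new alias is created, the callee stores the object that *fsdata points to and clears the caller's pointer. When an existing alias is returned, the caller still owns the object and frees it. The callee must store *fsdata, not fsdata itself, which would be the address of the caller's local pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical private data, standing in for struct ovl_entry. */
struct entry {
	int has_upper;
};

/* Hypothetical dentry with a private data pointer. */
struct dentry_model {
	void *d_fsdata;
};

/*
 * Model of the proposed __d_obtain_alias(..., void **fsdata).
 * New alias: take ownership of *fsdata and clear the caller's pointer.
 * Existing alias: leave *fsdata alone so the caller can free it.
 */
static struct dentry_model *obtain_alias_fsdata(struct dentry_model **slot,
						void **fsdata)
{
	struct dentry_model *tmp;

	if (*slot)
		return *slot;

	tmp = calloc(1, sizeof(*tmp));
	if (!tmp)
		return NULL;
	if (fsdata) {
		tmp->d_fsdata = *fsdata;	/* the object, not &caller_ptr */
		*fsdata = NULL;
	}
	*slot = tmp;
	return tmp;
}

/* Caller side, as in ovl_obtain_alias(): free what was not consumed. */
static struct dentry_model *obtain(struct dentry_model **slot)
{
	void *oe = malloc(sizeof(struct entry));
	struct dentry_model *d;

	if (!oe)
		return NULL;
	((struct entry *)oe)->has_upper = 1;

	d = obtain_alias_fsdata(slot, &oe);
	free(oe);	/* NULL after a handoff; free(NULL) is a no-op */
	return d;
}
```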


* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-18 22:57               ` Amir Goldstein
@ 2018-01-19  0:23                 ` Amir Goldstein
  2018-01-19 10:39                   ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-19  0:23 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro

On Fri, Jan 19, 2018 at 12:57 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 18, 2018 at 10:35 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, Jan 18, 2018 at 10:10 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Thu, Jan 18, 2018 at 8:49 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> On Thu, Jan 18, 2018 at 4:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>> On Thu, Jan 18, 2018 at 3:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>> On Thu, Jan 18, 2018 at 4:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>>> [Added Al Viro]
>>>>>>>
>>>>>>> On Thu, Jan 4, 2018 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>>>> Decoding an upper file handle is done by decoding the upper dentry from
>>>>>>>> underlying upper fs, finding or allocating an overlay inode that is
>>>>>>>> hashed by the real upper inode and instantiating an overlay dentry with
>>>>>>>> that inode.
>>>>>>>>
>>>>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>>>>> ---
>>>>>>>>  fs/overlayfs/export.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  fs/overlayfs/namei.c     |  4 +--
>>>>>>>>  fs/overlayfs/overlayfs.h |  2 ++
>>>>>>>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>>>>>>>> index 58c4f5e8a67e..5c72784a0b4d 100644
>>>>>>>> --- a/fs/overlayfs/export.c
>>>>>>>> +++ b/fs/overlayfs/export.c
>>>>>>>> @@ -93,6 +93,97 @@ static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
>>>>>>>>         return type;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * Find or instantiate an overlay dentry from real dentries.
>>>>>>>> + */
>>>>>>>> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
>>>>>>>> +                                      struct dentry *upper,
>>>>>>>> +                                      struct ovl_path *lowerpath)
>>>>>>>> +{
>>>>>>>> +       struct inode *inode;
>>>>>>>> +       struct dentry *dentry;
>>>>>>>> +       struct ovl_entry *oe;
>>>>>>>> +
>>>>>>>> +       /* TODO: obtain non pure-upper */
>>>>>>>> +       if (lowerpath)
>>>>>>>> +               return ERR_PTR(-EIO);
>>>>>>>> +
>>>>>>>> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
>>>>>>>> +       if (IS_ERR(inode)) {
>>>>>>>> +               dput(upper);
>>>>>>>> +               return ERR_CAST(inode);
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       dentry = d_obtain_alias(inode);
>>>>>>>> +       if (IS_ERR(dentry) || dentry->d_fsdata)
>>>>>>>
>>>>>>> Racing two instances of this code, each thinking it got a new alias
>>>>>>> and trying to fill it, results in a memory leak.
>>>>>>>
>>>>>>> Haven't checked in too much depth, but apparently other filesystems
>>>>>>> are not affected, so we need something special here.
>>>>>>>
>>>>>>> One solution: split d_instantiate_anon(dentry, inode) out of
>>>>>>> __d_obtain_alias() and supply that with the already initialized
>>>>>>> dentry.
>>>>>>>
>>>>>>
>>>>>> Can't we use &OVL_I(inode)->lock to avoid the race?
>>>>>
>>>>> We could.  But then d_splice_alias() will find our half baked dentry
>>>>> and return that from ovl_lookup().
>>>>
>>>> No it won't, because we do not obtain dir dentries this way.
>>>> We actually do in this patch [3/17], but since patch [4/17] we don't,
>>>> so I only need to fix this patch not to obtain dir dentry and to
>>>> protect concurrent decode of non-dir with &OVL_I(inode)->lock.
>>>>
>>>>> So we do need to have the dentry
>>>>> fully initialized by the time it's added into the inode's alias list.
>>>>>
>>>>
>>>> The only problems I see with adding a non-dir disconnected alias
>>>> that is not fully baked are:
>>>> 1. We can get it in ovl_encode_inode_fh() from d_find_any_alias()
>>>> 2. nfsd can get it in exportfs_decode_fh() from find_acceptable_alias()
>>>>     in a weird hypothetical case where the fully baked dentry we just
>>>>     returned from ovl_obtain_alias() is NOT acceptable to nfsd but
>>>>     the half baked dentry IS acceptable
>>>> 3. Another kernel user that uses d_find_any_alias(), or one of the use
>>>>     cases that only Al can think of...
>>>>
>>>> As for cases 2 and 3, I don't know whether they are real.
>>>>
>>>> Case 1 is only a problem due to the lack of an export_operations method
>>>> 'dentry_to_fh'. exportfs_encode_fh() has the right dentry, but it does
>>>> not pass it to the filesystem for encoding, so I think it should be
>>>> solved by adding this method.
>>>
>>> I agree with your analysis.
>>>
>>> However, I don't see what's wrong with adding fully baked dentries to
>>> the inode.
>>
>> I agree that adding half baked dentries is not a good practice and
>> we should avoid it.
>>
>
> How is this for an option?
>
> ===========================================
>
> +/*
> + * Find or instantiate an overlay dentry from real dentries.
> + */
> +static struct dentry *ovl_obtain_alias(struct super_block *sb,
> +                                      struct dentry *upper,
> +                                      struct ovl_path *lowerpath)
> +{
> +       struct inode *inode;
> +       struct dentry *dentry;
> +       struct ovl_entry *oe;
> +       void *fsdata = &oe;
> +
> +       /* TODO: obtain non pure-upper */
> +       if (lowerpath)
> +               return ERR_PTR(-EIO);
> +
> +       inode = ovl_get_inode(sb, dget(upper), NULL, NULL, 0);
> +       if (IS_ERR(inode)) {
> +               dput(upper);
> +               return ERR_CAST(inode);
> +       }
> +

With optimistic find:

+       /* First try our luck to find a cached dentry */
+       dentry = d_find_any_alias(inode);
+       if (dentry) {
+               iput(inode);
+               return dentry;
+       }
+
+       /* Then allocate ovl_entry, but free it if we do find a cached dentry */

> +       oe = ovl_alloc_entry(0);
> +       if (!oe) {
> +               iput(inode);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +       oe->has_upper = true;
> +
> +       dentry = d_obtain_alias_fsdata(inode, fsdata);
> +       /* A newly allocated dentry takes *fsdata and sets it to NULL */
> +       if (oe)
> +               kfree(oe);
> +
> +       return dentry;
> +}
> +
>
>
> ...
>
>
> -static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
> +static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected,
> +                                      void **fsdata)
>  {
>         struct dentry *tmp;
>         struct dentry *res;
> @@ -1962,6 +1963,12 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
>         if (disconnected)
>                 add_flags |= DCACHE_DISCONNECTED;
>
> +       /* Take ownership of pre-allocated fs-specific data */
> +       if (fsdata) {
> +               tmp->d_fsdata = fsdata;

And without the bug:
 +               tmp->d_fsdata = *fsdata;


> +               *fsdata = NULL;
> +       }
> +
>         spin_lock(&tmp->d_lock);
>         __d_set_inode_and_type(tmp, inode, add_flags);
>         hlist_add_head(&tmp->d_u.d_alias, &inode->i_dentry);
> @@ -1998,7 +2005,13 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
>   */
>  struct dentry *d_obtain_alias(struct inode *inode)
>  {
> -       return __d_obtain_alias(inode, 1);
> +       return __d_obtain_alias(inode, 1, NULL);
> +}
> +EXPORT_SYMBOL(d_obtain_alias);
> +
> +struct dentry *d_obtain_alias_fsdata(struct inode *inode, void **fsdata)
> +{
> +       return __d_obtain_alias(inode, 1, fsdata);
>  }
>  EXPORT_SYMBOL(d_obtain_alias);
>
> @@ -2019,7 +2032,7 @@ EXPORT_SYMBOL(d_obtain_alias);
>   */
>  struct dentry *d_obtain_root(struct inode *inode)
>  {
> -       return __d_obtain_alias(inode, 0);
> +       return __d_obtain_alias(inode, 0, NULL);
>  }
>  EXPORT_SYMBOL(d_obtain_root);
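
A minimal user-space sketch of the ownership hand-off that d_obtain_alias_fsdata() relies on: when a new dentry is instantiated, the callee takes *fsdata and clears the caller's pointer, so the caller frees its ovl_entry only if it was not consumed. `struct node`, `struct entry`, `obtain_with_fsdata()` and `obtain()` are hypothetical stand-ins, not kernel APIs; the inode, locking and refcounting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ovl_entry and struct dentry. */
struct entry { int numlower; };
struct node  { void *fsdata; };

/*
 * Models the proposed __d_obtain_alias(..., fsdata) step. A newly
 * allocated node takes ownership of *fsdata and clears the caller's
 * pointer; a cached node is returned as-is and *fsdata is left alone.
 */
struct node *obtain_with_fsdata(struct node *cached, void **fsdata)
{
	struct node *n;

	if (cached)
		return cached;

	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	if (fsdata) {
		n->fsdata = *fsdata;	/* the entry itself, not &caller_ptr */
		*fsdata = NULL;		/* caller no longer owns it */
	}
	return n;
}

/*
 * Caller side, in the style of the proposed ovl_obtain_alias(): allocate
 * the entry up front and free it only if it was not consumed.
 */
struct node *obtain(struct node *cached, bool *entry_freed)
{
	struct entry *oe = calloc(1, sizeof(*oe));
	void *fsdata = oe;
	struct node *n;

	if (!oe)
		return NULL;
	n = obtain_with_fsdata(cached, &fsdata);
	*entry_freed = (fsdata != NULL);
	free(fsdata);		/* free(NULL) is a no-op, like kfree(NULL) */
	return n;
}
```

With the `tmp->d_fsdata = fsdata` variant, the node would store the address of the caller's local pointer rather than the entry, and that address dangles once the caller returns; this is the bug corrected above.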

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-19  0:23                 ` Amir Goldstein
@ 2018-01-19 10:39                   ` Miklos Szeredi
  2018-01-19 11:07                     ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-19 10:39 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, linux-fsdevel, Al Viro

On Fri, Jan 19, 2018 at 02:23:35AM +0200, Amir Goldstein wrote:
> > How is this for an option?
[...]
> > +struct dentry *d_obtain_alias_fsdata(struct inode *inode, void **fsdata)
> > +{
> > +       return __d_obtain_alias(inode, 1, fsdata);
> >  }
> >  EXPORT_SYMBOL(d_obtain_alias);

It would work, but I like this interface better:

+extern struct dentry * d_alloc_anon(struct super_block *);
+extern struct dentry * d_instantiate_anon(struct dentry *, struct inode *);

And full patch:

diff --git a/fs/dcache.c b/fs/dcache.c
index b5d5ea984ac4..15dc32178813 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1699,9 +1699,15 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 }
 EXPORT_SYMBOL(d_alloc);
 
+struct dentry *d_alloc_anon(struct super_block *sb)
+{
+	return __d_alloc(sb, NULL);
+}
+EXPORT_SYMBOL(d_alloc_anon);
+
 struct dentry *d_alloc_cursor(struct dentry * parent)
 {
-	struct dentry *dentry = __d_alloc(parent->d_sb, NULL);
+	struct dentry *dentry = d_alloc_anon(parent->d_sb);
 	if (dentry) {
 		dentry->d_flags |= DCACHE_RCUACCESS | DCACHE_DENTRY_CURSOR;
 		dentry->d_parent = dget(parent);
@@ -1887,7 +1893,7 @@ struct dentry *d_make_root(struct inode *root_inode)
 	struct dentry *res = NULL;
 
 	if (root_inode) {
-		res = __d_alloc(root_inode->i_sb, NULL);
+		res = d_alloc_anon(root_inode->i_sb);
 		if (res)
 			d_instantiate(res, root_inode);
 		else
@@ -1926,33 +1932,18 @@ struct dentry *d_find_any_alias(struct inode *inode)
 }
 EXPORT_SYMBOL(d_find_any_alias);
 
-static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
+static struct dentry *__d_instantiate_anon(struct dentry *dentry,
+					   struct inode *inode,
+					   bool disconnected)
 {
-	struct dentry *tmp;
 	struct dentry *res;
-	unsigned add_flags;
-
-	if (!inode)
-		return ERR_PTR(-ESTALE);
-	if (IS_ERR(inode))
-		return ERR_CAST(inode);
-
-	res = d_find_any_alias(inode);
-	if (res)
-		goto out_iput;
 
-	tmp = __d_alloc(inode->i_sb, NULL);
-	if (!tmp) {
-		res = ERR_PTR(-ENOMEM);
-		goto out_iput;
-	}
-
-	security_d_instantiate(tmp, inode);
+	security_d_instantiate(dentry, inode);
 	spin_lock(&inode->i_lock);
 	res = __d_find_any_alias(inode);
 	if (res) {
 		spin_unlock(&inode->i_lock);
-		dput(tmp);
+		dput(dentry);
 		goto out_iput;
 	}
 
@@ -1962,22 +1953,56 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
 	if (disconnected)
 		add_flags |= DCACHE_DISCONNECTED;
 
-	spin_lock(&tmp->d_lock);
-	__d_set_inode_and_type(tmp, inode, add_flags);
-	hlist_add_head(&tmp->d_u.d_alias, &inode->i_dentry);
-	hlist_bl_lock(&tmp->d_sb->s_anon);
-	hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_anon);
-	hlist_bl_unlock(&tmp->d_sb->s_anon);
-	spin_unlock(&tmp->d_lock);
+	spin_lock(&dentry->d_lock);
+	__d_set_inode_and_type(dentry, inode, add_flags);
+	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
+	hlist_bl_lock(&dentry->d_sb->s_anon);
+	hlist_bl_add_head(&dentry->d_hash, &dentry->d_sb->s_anon);
+	hlist_bl_unlock(&dentry->d_sb->s_anon);
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&inode->i_lock);
 
-	return tmp;
+	return dentry;
 
  out_iput:
 	iput(inode);
 	return res;
 }
 
+struct dentry *d_instantiate_anon(struct dentry *dentry, struct inode *inode)
+{
+	return __d_instantiate_anon(dentry, inode, true);
+}
+EXPORT_SYMBOL(d_instantiate_anon);
+
+static struct dentry *__d_obtain_alias(struct inode *inode, bool disconnected)
+{
+	struct dentry *tmp;
+	struct dentry *res;
+	unsigned add_flags;
+
+	if (!inode)
+		return ERR_PTR(-ESTALE);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+
+	res = d_find_any_alias(inode);
+	if (res)
+		goto out_iput;
+
+	tmp = d_alloc_anon(inode->i_sb);
+	if (!tmp) {
+		res = ERR_PTR(-ENOMEM);
+		goto out_iput;
+	}
+
+	return __d_instantiate_anon(tmp, inode, disconnected);
+
+out_iput:
+	iput(inode);
+	return res;
+}
+
 /**
  * d_obtain_alias - find or allocate a DISCONNECTED dentry for a given inode
  * @inode: inode to allocate the dentry for
@@ -1998,7 +2023,7 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
  */
 struct dentry *d_obtain_alias(struct inode *inode)
 {
-	return __d_obtain_alias(inode, 1);
+	return __d_obtain_alias(inode, true);
 }
 EXPORT_SYMBOL(d_obtain_alias);
 
@@ -2019,7 +2044,7 @@ EXPORT_SYMBOL(d_obtain_alias);
  */
 struct dentry *d_obtain_root(struct inode *inode)
 {
-	return __d_obtain_alias(inode, 0);
+	return __d_obtain_alias(inode, false);
 }
 EXPORT_SYMBOL(d_obtain_root);
 
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 25461781c103..7477f28cb99b 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -184,28 +184,32 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
 		return ERR_CAST(inode);
 	}
 
-	dentry = d_obtain_alias(inode);
-	if (IS_ERR(dentry) || dentry->d_fsdata)
-		return dentry;
-
-	oe = ovl_alloc_entry(!!lower);
-	if (!oe) {
-		dput(dentry);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (index)
+		ovl_set_flag(OVL_INDEX, inode);
 
-	dentry->d_fsdata = oe;
-	if (upper_alias)
-		ovl_dentry_set_upper_alias(dentry);
-	if (lower) {
-		oe->lowerstack->dentry = dget(lower);
-		oe->lowerstack->layer = lowerpath->layer;
+	dentry = d_find_any_alias(inode);
+	if (!dentry) {
+		dentry = d_alloc_anon(inode->i_sb);
+		if (!dentry)
+			goto nomem;
+		oe = ovl_alloc_entry(lower ? 1 : 0);
+		if (!oe)
+			goto nomem;
+		if (lower) {
+			oe->lowerstack->dentry = dget(lower);
+			oe->lowerstack->layer = lowerpath->layer;
+		}
+		dentry->d_fsdata = oe;
+		if (upper_alias)
+			ovl_dentry_set_upper_alias(dentry);
 	}
 
-	if (index)
-		ovl_set_flag(OVL_INDEX, inode);
+	return d_instantiate_anon(dentry, inode);
 
-	return dentry;
+nomem:
+	iput(inode);
+	dput(dentry);
+	return ERR_PTR(-ENOMEM);
 }
 
 /* Get the upper or lower dentry in stach whose on layer @idx */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 65cd8ab60b7a..82a99d366aec 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -227,6 +227,7 @@ extern seqlock_t rename_lock;
  */
 extern void d_instantiate(struct dentry *, struct inode *);
 extern struct dentry * d_instantiate_unique(struct dentry *, struct inode *);
+extern struct dentry * d_instantiate_anon(struct dentry *, struct inode *);
 extern int d_instantiate_no_diralias(struct dentry *, struct inode *);
 extern void __d_drop(struct dentry *dentry);
 extern void d_drop(struct dentry *dentry);
@@ -235,6 +236,7 @@ extern void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op
 
 /* allocate/de-allocate */
 extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
+extern struct dentry * d_alloc_anon(struct super_block *);
 extern struct dentry * d_alloc_pseudo(struct super_block *, const struct qstr *);
 extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
 					wait_queue_head_t *);
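
The split interface lets the caller fully set up d_fsdata on a dentry that is not yet visible, and only then publish it; if another alias appeared in the meantime, the new dentry is dropped together with its private data. The sketch below models that sequence in user space. `struct node`, `struct inode_model`, `alloc_anon()`, `release_node()`, `instantiate_anon()` and `obtain_alias()` are hypothetical names loosely mirroring d_alloc_anon(), dput(), d_instantiate_anon() and ovl_obtain_alias(); the inode lock is omitted, so the model is single-threaded only.

```c
#include <assert.h>
#include <stdlib.h>

struct inode_model;

/* Hypothetical user-space stand-ins, not the kernel's dentry/inode. */
struct node {
	void *fsdata;			/* plays the role of d_fsdata */
	struct inode_model *inode;	/* NULL until instantiated */
};

struct inode_model {
	struct node *alias;		/* at most one alias in this model */
};

/* Models d_alloc_anon(): a node not yet attached to any inode. */
struct node *alloc_anon(void)
{
	return calloc(1, sizeof(struct node));
}

/* Models dropping an unused node; frees fsdata as a release hook would. */
void release_node(struct node *n)
{
	free(n->fsdata);
	free(n);
}

/*
 * Models d_instantiate_anon(): reuse an alias that appeared meanwhile,
 * otherwise publish the new node. Either way the returned node already
 * carries fully initialized fsdata.
 */
struct node *instantiate_anon(struct node *n, struct inode_model *inode)
{
	if (inode->alias) {
		release_node(n);	/* lost the race: drop our copy */
		return inode->alias;
	}
	n->inode = inode;
	inode->alias = n;
	return n;
}

/* Caller in the style of ovl_obtain_alias(): set fsdata before publishing. */
struct node *obtain_alias(struct inode_model *inode)
{
	struct node *n;

	if (inode->alias)		/* like d_find_any_alias() */
		return inode->alias;

	n = alloc_anon();
	if (!n)
		return NULL;
	n->fsdata = calloc(1, 16);	/* stand-in for ovl_alloc_entry() */
	if (!n->fsdata) {
		free(n);
		return NULL;
	}
	return instantiate_anon(n, inode);
}
```

Returning an existing alias directly, rather than passing it to the instantiate step, keeps instantiate_anon() limited to nodes that have never been attached to an inode.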

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-19 10:39                   ` Miklos Szeredi
@ 2018-01-19 11:07                     ` Amir Goldstein
  2018-01-19 20:10                       ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-19 11:07 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro

On Fri, Jan 19, 2018 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Jan 19, 2018 at 02:23:35AM +0200, Amir Goldstein wrote:
>> > How is this for an option?
> [...]
>> > +struct dentry *d_obtain_alias_fsdata(struct inode *inode, void **fsdata)
>> > +{
>> > +       return __d_obtain_alias(inode, 1, fsdata);
>> >  }
>> >  EXPORT_SYMBOL(d_obtain_alias);
>
> It would work, but I like this interface better:
>
> +extern struct dentry * d_alloc_anon(struct super_block *);
> +extern struct dentry * d_instantiate_anon(struct dentry *, struct inode *);
>

OK. Thanks for the patch!

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-19 11:07                     ` Amir Goldstein
@ 2018-01-19 20:10                       ` Amir Goldstein
  2018-01-24 10:34                         ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-19 20:10 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro

On Fri, Jan 19, 2018 at 1:07 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Jan 19, 2018 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Jan 19, 2018 at 02:23:35AM +0200, Amir Goldstein wrote:
>>> > How is this for an option?
>> [...]
>>> > +struct dentry *d_obtain_alias_fsdata(struct inode *inode, void **fsdata)
>>> > +{
>>> > +       return __d_obtain_alias(inode, 1, fsdata);
>>> >  }
>>> >  EXPORT_SYMBOL(d_obtain_alias);
>>
>> It would work, but I like this interface better:
>>
>> +extern struct dentry * d_alloc_anon(struct super_block *);
>> +extern struct dentry * d_instantiate_anon(struct dentry *, struct inode *);
>>
>
> OK. Thanks for the patch!
>

Added your dcache patch to the series and reworked my patches
to use the new helpers.

Tested result is pushed to:
https://github.com/amir73il/linux/commits/ovl-nfs-export-v3

Prep patches changes since v2:
- Rebased over fix patch "hash all directory inodes for fsnotify"
- Rename mount/config option from "verify" to "nfs_export"
- Force r/o mount when index dir creation fails
- Allow enabling "nfs_export" for non-upper mount
- Require "redirect_dir=nofollow" for non-upper mount
- Rename dir index entries xattr from ".origin" to ".upper"
- Re-factor ovl_{get|set|verify}_origin() helpers
- Simplify test for temp index name (starts with #)
- Abandon ovl_dentry_is_renamed() test for lower st_ino
- Document overhead on mount with full index
- Document change of behavior when verifying lower origin
- Added patch to make room in ovl_entry struct

NFS export changes since v2:
- Fix exportfs ops for r/o overlay with no upperdir
- Document reason for copy up directory on encode
- Take care of racing with rename while connecting dir
- Explain the reasons for choosing the 'connected' dir approach
- Do not add dentry without ovl_entry to dcache

Optimizations TODO:
- Copy up on encode only when lower ancestor is below middle layer redirect
- Hash inode by fh to avoid origin decode of whiteout fh

As far as I know, the series is now functionally correct and all comments
so far addressed. The remaining optimizations will be done on top of this
series.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-19 20:10                       ` Amir Goldstein
@ 2018-01-24 10:34                         ` Miklos Szeredi
  2018-01-24 11:04                           ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2018-01-24 10:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs, linux-fsdevel, Al Viro

On Fri, Jan 19, 2018 at 9:10 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> [...]
> As far as I know, the series is now functionally correct and all comments
> so far addressed. The remaining optimizations will be done on top of this
> series.

Pushed to overlayfs-next with one fix (do not warn about falling back
to nfs_export=off if nfs_export is already off).
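
The fix amounts to returning early when nfs_export was never requested, so the fallback warning fires only when the option is actually being turned off. A minimal user-space sketch follows; `struct ovl_config_model` and `resolve_nfs_export()` are hypothetical, the message text is illustrative, and the two prerequisites (index=on with an upper layer, redirect_dir=nofollow without one) are taken from the descriptions in this thread rather than from the overlayfs source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the relevant overlay options, not the kernel struct. */
struct ovl_config_model {
	bool has_upper;
	bool index;
	bool nfs_export;
	const char *redirect_dir;	/* e.g. "on", "off", "follow", "nofollow" */
};

/*
 * Clear nfs_export when its prerequisites are missing. Returns true when
 * a fallback warning was printed; nothing is printed if nfs_export was
 * already off.
 */
bool resolve_nfs_export(struct ovl_config_model *cfg)
{
	const char *reason = NULL;

	if (!cfg->nfs_export)
		return false;		/* already off: nothing to fall back from */

	if (cfg->has_upper && !cfg->index)
		reason = "index=on";
	else if (!cfg->has_upper && strcmp(cfg->redirect_dir, "nofollow") != 0)
		reason = "redirect_dir=nofollow";

	if (!reason)
		return false;

	fprintf(stderr, "model: nfs_export requires %s, falling back to nfs_export=off\n",
		reason);
	cfg->nfs_export = false;
	return true;
}
```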

That spurious warning makes me wonder: how much of the option matrix is tested?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-24 10:34                         ` Miklos Szeredi
@ 2018-01-24 11:04                           ` Amir Goldstein
  2018-01-24 11:18                             ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-24 11:04 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro, Eryu Guan

On Wed, Jan 24, 2018 at 12:34 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> [...]
> Pushed to overlayfs-next with one fix (do not warn about falling back
> to nfs_export=off if nfs_export is already off).

Good fix.
That warning was a late addition to V3, after allowing nfs_export without
index for a non-upper r/o mount.

>
> That spurious warning makes me wonder: how much of the option matrix is tested?
>

Good question... (adding Eryu in CC)

I know Eryu tested with the default kernel configuration, because one
of my tests found a bug with NFS export and redirect_dir=off.
(I fixed that specific test to require redirect_dir.)

I have posted xfstests for overlay NFS export support with samefs
and non-samefs configurations; in the non-samefs configuration, there
are two lower layers that are not on the same fs. I have also posted a
test for a non-upper overlay with two lower layers, on both samefs and
non-samefs.

One of the overlay xfstests (overlay/036) explicitly turns off the index
with index=off, so we have coverage for falling back to nfs_export=off
when NFS export is enabled by default (as it usually is in my setup).

There is an overlay xfstest with multiple lower layers and no upper layer,
and that test shows the warning about falling back to nfs_export=off
due to the redirect_dir=nofollow requirement (with the default kernel
configuration).

I did not specifically test an underlying fs with no xattr support or no
file handle support, though the latter was already tested for index
with squashfs (no uuid), and the nfs_export=off code is piggybacked
in the same place as index=off.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-24 11:04                           ` Amir Goldstein
@ 2018-01-24 11:18                             ` Amir Goldstein
  2018-01-24 11:55                               ` Amir Goldstein
  0 siblings, 1 reply; 68+ messages in thread
From: Amir Goldstein @ 2018-01-24 11:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro, Eryu Guan

On Wed, Jan 24, 2018 at 1:04 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jan 24, 2018 at 12:34 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Jan 19, 2018 at 9:10 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Fri, Jan 19, 2018 at 1:07 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> On Fri, Jan 19, 2018 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>> On Fri, Jan 19, 2018 at 02:23:35AM +0200, Amir Goldstein wrote:
>>>>>> > How is this for an option?
>>>>> [...]
>>>>>> > +struct dentry *d_obtain_alias_fsdata(struct inode *inode, void **fsdata)
>>>>>> > +{
>>>>>> > +       return __d_obtain_alias(inode, 1, fsdata);
>>>>>> >  }
>>>>>> >  EXPORT_SYMBOL(d_obtain_alias);
>>>>>
>>>>> It would work, but I like this interface better:
>>>>>
>>>>> +extern struct dentry * d_alloc_anon(struct super_block *);
>>>>> +extern struct dentry * d_instantiate_anon(struct dentry *, struct inode *);
>>>>>
>>>>
>>>> OK. Thanks for the patch!
>>>>
>>>
>>> Added your dcache patch to the series and reworked my patches
>>> to use the new helpers.
>>>
>>> Tested result is pushed to:
>>> https://github.com/amir73il/linux/commits/ovl-nfs-export-v3
>>>
>>> Prep patches changes since v2:
>>> - Rebased over fix patch "hash all directory inodes for fsnotify"
>>> - Rename mount/config option from "verify" to "nfs_export"
>>> - Force r/o mount when index dir creation fails
>>> - Allow enabling "nfs_export" for non-upper mount
>>> - Require "redirect_dir=nofollow" for non-upper mount
>>> - Rename dir index entries xattr from ".origin" to ".upper"
>>> - Re-factor ovl_{get|set|verify}_origin() helpers
>>> - Simplify test for temp index name (starts with #)
>>> - Abandon ovl_dentry_is_renamed() test for lower st_ino
>>> - Document overhead on mount with full index
>>> - Document change of behavior when verifying lower origin
>>> - Added patch to make room in ovl_entry struct
>>>
>>> NFS export changes since v2:
>>> - Fix exportfs ops for r/o overlay with no upperdir
>>> - Document reason for copy up directory on encode
>>> - Take care of racing with rename while connecting dir
>>> - Explain the reasons for choosing the 'connected' dir approach
>>> - Do not add dentry without ovl_entry to dcache
>>>
>>> Optimizations TODO:
>>> - Copy up on encode only when lower ancestor is below middle layer redirect
>>> - Hash inode by fh to avoid origin decode of whiteout fh
>>>
>>> As far as I know, the series is now functionally correct and all comments
>>> so far addressed. The remaining optimizations will be done on top of this
>>> series.
>>
>> Pushed to overlayfs-next with one fix (do not warn about falling back
>> to nfs_export=off if nfs_export is already off).
>
> Good fix.
> That warning was a late addition to V3 after allowing nfs_export without
> index for on-upper r/o mount.
>
>>
>> That spurious warning makes me wonder: how much of the option matrix is tested?
>>
>
> Good question... (adding Eryu in CC)
>
> I know Eryu tested with default kernel configuration, because one
> of my tests found a bug with NFS export and redirect_dir=off.
> (I fixed the specific test to require redirect_dir)
>
> I have posted xfstests for overlay NFS export support with samefs
> and non-samefs configurations; in the non-samefs configuration, the two
> lower layers are not on the same fs. I have also posted a test for a
> non-upper overlay with two lower layers on samefs and non-samefs.
>
> One of the overlay xfstests (overlay/036) explicitly turns off index with
> index=off, so we have coverage for falling back to nfs_export=off
> when NFS export is enabled by default (as it usually is in my setup).
>
> There is an overlay xfstest with multiple lower layers and no upper layer,
> and that test shows the warning about falling back to nfs_export=off
> due to the redirect_dir=nofollow requirement (with default kernel configuration).
>
> I did not specifically test an underlying fs with no xattr support or no
> file handle support, though the latter was already tested for index
> with squashfs (no uuid), and the nfs_export=off code is piggybacked
> in the same place as index=off.
>

Clarification: when I wrote that "we have test coverage...", it may have implied
that there are automated tests to verify falling back to nfs_export=off.
This is not the case. I was only listing "falling back" cases whose warnings
I regularly see, as expected, in my routine overlay/quick xfstests runs.
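To make the fallback cases above concrete, here is a minimal C model of the decision, written for illustration only. The struct ovl_opts_model type and the ovl_model_* helpers are hypothetical names, not overlayfs code, and the conditions are simplified from this thread (index=off on an upper mount, a non-upper mount without redirect_dir=nofollow, an underlying fs without file handle support); the real checks in the overlayfs mount code may differ in ordering and detail. What it shows is the behavior of the fix mentioned above: the "falling back" warning is printed only when nfs_export was actually on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified view of the mount configuration (not struct ovl_config). */
struct ovl_opts_model {
	bool has_upper;          /* upperdir= was given */
	bool fh_supported;       /* underlying fs can encode/decode file handles */
	bool index;              /* index=on */
	bool redirect_nofollow;  /* redirect_dir=nofollow */
	bool nfs_export;         /* nfs_export=on */
};

/* Return a reason why NFS export cannot be supported, or NULL if it can. */
static const char *ovl_model_nfs_export_conflict(const struct ovl_opts_model *o)
{
	if (!o->fh_supported)
		return "underlying fs does not support file handles";
	if (o->has_upper && !o->index)
		return "index=off";
	if (!o->has_upper && !o->redirect_nofollow)
		return "non-upper mount requires redirect_dir=nofollow";
	return NULL;
}

/*
 * Turn nfs_export off when it conflicts with the other options.
 * Returns true only if a warning was printed, i.e. nfs_export was on
 * and had to be disabled. If nfs_export is already off, stay silent.
 */
static bool ovl_model_check_nfs_export(struct ovl_opts_model *o)
{
	const char *reason;

	assert(o != NULL);
	if (!o->nfs_export)
		return false;

	reason = ovl_model_nfs_export_conflict(o);
	if (!reason)
		return false;

	fprintf(stderr, "overlayfs: %s, falling back to nfs_export=off.\n", reason);
	o->nfs_export = false;
	return true;
}
```

Calling the check a second time on the same options is silent, because nfs_export is already off after the first call.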

Thanks,
Amir.

* Re: [PATCH v2 03/17] ovl: decode pure upper file handles
  2018-01-24 11:18                             ` Amir Goldstein
@ 2018-01-24 11:55                               ` Amir Goldstein
  0 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-01-24 11:55 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, linux-fsdevel, Al Viro, Eryu Guan

On Wed, Jan 24, 2018 at 1:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jan 24, 2018 at 1:04 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Wed, Jan 24, 2018 at 12:34 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Fri, Jan 19, 2018 at 9:10 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> On Fri, Jan 19, 2018 at 1:07 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>> On Fri, Jan 19, 2018 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>> On Fri, Jan 19, 2018 at 02:23:35AM +0200, Amir Goldstein wrote:
>>>>>>> > How is this for an option?
>>>>>> [...]
>>>>>>> > +struct dentry *d_obtain_alias_fsdata(struct inode *inode, void **fsdata)
>>>>>>> > +{
>>>>>>> > +       return __d_obtain_alias(inode, 1, fsdata);
>>>>>>> >  }
>>>>>>> >  EXPORT_SYMBOL(d_obtain_alias);
>>>>>>
>>>>>> It would work, but I like this interface better:
>>>>>>
>>>>>> +extern struct dentry * d_alloc_anon(struct super_block *);
>>>>>> +extern struct dentry * d_instantiate_anon(struct dentry *, struct inode *);
>>>>>>
>>>>>
>>>>> OK. Thanks for the patch!
>>>>>
>>>>
>>>> Added your dcache patch to the series and reworked my patches
>>>> to use the new helpers.
>>>>
>>>> Tested result is pushed to:
>>>> https://github.com/amir73il/linux/commits/ovl-nfs-export-v3
>>>>
>>>> Prep patches changes since v2:
>>>> - Rebased over fix patch "hash all directory inodes for fsnotify"
>>>> - Rename mount/config option from "verify" to "nfs_export"
>>>> - Force r/o mount when index dir creation fails
>>>> - Allow enabling "nfs_export" for non-upper mount
>>>> - Require "redirect_dir=nofollow" for non-upper mount
>>>> - Rename dir index entries xattr from ".origin" to ".upper"
>>>> - Re-factor ovl_{get|set|verify}_origin() helpers
>>>> - Simplify test for temp index name (starts with #)
>>>> - Abandon ovl_dentry_is_renamed() test for lower st_ino
>>>> - Document overhead on mount with full index
>>>> - Document change of behavior when verifying lower origin
>>>> - Added patch to make room in ovl_entry struct
>>>>
>>>> NFS export changes since v2:
>>>> - Fix exportfs ops for r/o overlay with no upperdir
>>>> - Document reason for copy up directory on encode
>>>> - Take care of racing with rename while connecting dir
>>>> - Explain the reasons for choosing the 'connected' dir approach
>>>> - Do not add dentry without ovl_entry to dcache
>>>>
>>>> Optimizations TODO:
>>>> - Copy up on encode only when lower ancestor is below middle layer redirect
>>>> - Hash inode by fh to avoid origin decode of whiteout fh
>>>>
>>>> As far as I know, the series is now functionally correct and all comments
>>>> so far addressed. The remaining optimizations will be done on top of this
>>>> series.
>>>
>>> Pushed to overlayfs-next with one fix (do not warn about falling back
>>> to nfs_export=off if nfs_export is already off).
>>
>> Good fix.
>> That warning was a late addition to V3 after allowing nfs_export without
>> index for non-upper r/o mount.
>>
>>>
>>> That spurious warning makes me wonder: how much of the option matrix is tested?
>>>
>>
>> Good question... (adding Eryu in CC)
>>
>> I know Eryu tested with default kernel configuration, because one
>> of my tests found a bug with NFS export and redirect_dir=off.
>> (I fixed the specific test to require redirect_dir)
>>
>> I have posted xfstests for overlay NFS export support with samefs
>> and non-samefs configurations; in the non-samefs configuration, the two
>> lower layers are not on the same fs. I have also posted a test for a
>> non-upper overlay with two lower layers on samefs and non-samefs.
>>
>> One of the overlay xfstests (overlay/036) explicitly turns off index with
>> index=off, so we have coverage for falling back to nfs_export=off
>> when NFS export is enabled by default (as it usually is in my setup).
>>
>> There is an overlay xfstest with multiple lower layers and no upper layer,
>> and that test shows the warning about falling back to nfs_export=off
>> due to the redirect_dir=nofollow requirement (with default kernel configuration).
>>
>> I did not specifically test an underlying fs with no xattr support or no
>> file handle support, though the latter was already tested for index
>> with squashfs (no uuid), and the nfs_export=off code is piggybacked
>> in the same place as index=off.
>>

Correction: the no file handle support case is covered by the nested overlay
test (overlay/029), so I do see those "falling back" warnings with v3 as well.
(I did not see the warnings in all my tests because I sometimes apply an
extra patch to support NFS export on nested overlay.)

>
> Clarification: when I wrote that "we have test coverage...", it may have implied
> that there are automated tests to verify falling back to nfs_export=off.
> This is not the case. I was only listing "falling back" cases whose warnings
> I regularly see, as expected, in my routine overlay/quick xfstests runs.
>
> Thanks,
> Amir.

* Re: [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files
  2018-01-18 14:18   ` Amir Goldstein
@ 2018-02-27 11:35     ` Amir Goldstein
  0 siblings, 0 replies; 68+ messages in thread
From: Amir Goldstein @ 2018-02-27 11:35 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jeff Layton, J . Bruce Fields, overlayfs, linux-fsdevel

On Thu, Jan 18, 2018 at 4:18 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Jan 4, 2018 at 7:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Lookup overlay inode in cache by origin inode, so we can decode a file
>> handle of an open file even if the index has a whiteout index entry to
>> mark this overlay inode was unlinked.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/export.c    | 22 ++++++++++++++++++++--
>>  fs/overlayfs/inode.c     | 16 ++++++++++++++++
>>  fs/overlayfs/overlayfs.h |  1 +
>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
>> index 602bada474ba..6ecb54d4b52c 100644
>> --- a/fs/overlayfs/export.c
>> +++ b/fs/overlayfs/export.c
>> @@ -385,13 +385,21 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
>>         struct ovl_path *stack = &origin;
>>         struct dentry *dentry = NULL;
>>         struct dentry *index = NULL;
>> +       struct inode *inode = NULL;
>> +       bool is_deleted = false;
>>         int err;
>>
>>         /* First lookup indexed upper by fh */
>>         index = ovl_get_index_fh(ofs, fh);
>>         err = PTR_ERR(index);
>> -       if (IS_ERR(index))
>> -               return ERR_PTR(err);
>> +       if (IS_ERR(index)) {
>> +               if (err != -ESTALE)
>> +                       return ERR_PTR(err);
>> +
>> +               /* Found a whiteout index - treat as deleted inode */
>> +               is_deleted = true;
>> +               index = NULL;
>
> Ouch! It seems I misled you.
> If we find a whiteout index for dir, we *do* decode+reconnect origin,
> because we want to find out if this is an unlinked but open non-dir.
> I guess there are 2 ways to avoid this unneeded decode:
> 1. mark a "directory index whiteout" differently than "non-dir index whiteout"
> 2. lookup icache by file handle
>

Getting back to this.
I have implemented icache lookup by file handle and realized that it incurs
higher CPU usage in the common case, because the hash function is more
expensive. I do not have benchmark numbers to present.

However, I also realized there is a 3rd option, which seems better:

- Call the underlying fs fh_to_dentry() operation (and not exportfs_decode_fh())
  to get a possibly disconnected origin dentry
- Look up the overlay inode by origin inode
- IF the overlay inode is not cached, look up the index by origin fh
- IF the origin dentry is a disconnected directory AND the overlay inode is not
  cached AND the index is not found, only then call exportfs_decode_fh() on the
  origin fh to reconnect the origin dir

I'll try to write this up.

Thanks,
Amir.

end of thread

Thread overview: 68+ messages
2018-01-04 17:20 [PATCH v2 00/17] Overlayfs NFS export support Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 01/17] ovl: document NFS export Amir Goldstein
2018-01-11 16:06   ` Miklos Szeredi
2018-01-11 16:26     ` Amir Goldstein
2018-01-12 15:43       ` Miklos Szeredi
2018-01-12 15:49         ` Miklos Szeredi
2018-01-12 18:50           ` Amir Goldstein
2018-01-13  8:54           ` Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 02/17] ovl: encode pure upper file handles Amir Goldstein
2018-01-18 10:31   ` Miklos Szeredi
2018-01-04 17:20 ` [PATCH v2 03/17] ovl: decode " Amir Goldstein
2018-01-18 14:09   ` Miklos Szeredi
2018-01-18 14:34     ` Amir Goldstein
2018-01-18 14:39       ` Miklos Szeredi
2018-01-18 19:49         ` Amir Goldstein
2018-01-18 20:10           ` Miklos Szeredi
2018-01-18 20:35             ` Amir Goldstein
2018-01-18 22:57               ` Amir Goldstein
2018-01-19  0:23                 ` Amir Goldstein
2018-01-19 10:39                   ` Miklos Szeredi
2018-01-19 11:07                     ` Amir Goldstein
2018-01-19 20:10                       ` Amir Goldstein
2018-01-24 10:34                         ` Miklos Szeredi
2018-01-24 11:04                           ` Amir Goldstein
2018-01-24 11:18                             ` Amir Goldstein
2018-01-24 11:55                               ` Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 04/17] ovl: decode connected upper dir " Amir Goldstein
2018-01-05 12:33   ` Amir Goldstein
2018-01-05 15:18     ` J . Bruce Fields
2018-01-05 15:34       ` Amir Goldstein
2018-01-15 11:41     ` Miklos Szeredi
2018-01-15 11:33   ` Miklos Szeredi
2018-01-15 12:20     ` Amir Goldstein
2018-01-15 14:56       ` Miklos Szeredi
2018-01-17 11:18         ` Amir Goldstein
2018-01-17 12:20           ` Amir Goldstein
2018-01-17 13:29             ` Amir Goldstein
2018-01-17 15:42           ` Miklos Szeredi
2018-01-17 16:34             ` Amir Goldstein
2018-01-17 21:36               ` Amir Goldstein
2018-01-18  8:22               ` Miklos Szeredi
2018-01-18  8:47                 ` Amir Goldstein
2018-01-18  9:12                   ` Miklos Szeredi
2018-01-18 10:28                     ` Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 05/17] ovl: encode non-indexed upper " Amir Goldstein
2018-01-15 11:58   ` Miklos Szeredi
2018-01-15 12:07     ` Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 06/17] ovl: copy up before encoding dir file handle when ofs->numlower > 1 Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 07/17] ovl: encode lower file handles Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 08/17] ovl: decode lower non-dir " Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 09/17] ovl: decode indexed " Amir Goldstein
2018-01-18 13:11   ` Miklos Szeredi
2018-01-04 17:20 ` [PATCH v2 10/17] ovl: decode lower file handles of unlinked but open files Amir Goldstein
2018-01-16  9:16   ` Miklos Szeredi
2018-01-16  9:37     ` Amir Goldstein
2018-01-16 10:10       ` Miklos Szeredi
2018-01-16 10:40         ` Amir Goldstein
2018-01-16 11:07           ` Miklos Szeredi
2018-01-17 21:05         ` Amir Goldstein
2018-01-18 14:18   ` Amir Goldstein
2018-02-27 11:35     ` Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 11/17] ovl: decode indexed dir file handles Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 12/17] ovl: decode pure lower " Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 13/17] ovl: hash directory inodes for NFS export Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 14/17] ovl: lookup connected ancestor of dir in inode cache Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 15/17] ovl: lookup indexed ancestor of lower dir Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 16/17] ovl: wire up NFS export support Amir Goldstein
2018-01-04 17:20 ` [PATCH v2 17/17] nfsd: encode stat->mtime for getattr instead of inode->i_mtime Amir Goldstein
