LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs
@ 2021-09-23 13:08 Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Current syncfs(2) syscall on overlayfs just calls sync_filesystem()
on upper_sb to synchronize whole dirty inodes in upper filesystem
regardless of the overlay ownership of the inode. In the use case of
container, when multiple containers using the same underlying upper
filesystem, it has some shortcomings as below.

(1) Performance
Synchronization is probably heavy because it actually syncs unnecessary
inodes for target overlayfs.

(2) Interference
Unplanned synchronization will probably impact IO performance of
unrelated container processes on the other overlayfs.

This series try to implement containerized syncfs for overlayfs so that
only sync target dirty upper inodes which are belong to specific overlayfs
instance. By doing this, it is able to reduce cost of synchronization and
will not seriously impact IO performance of unrelated processes.

v1->v2:
- Mark overlayfs' inode dirty itself instead of adding notification
  mechanism to vfs inode.

v2->v3:
- Introduce overlayfs' extra syncfs wait list to wait target upper inodes
in ->sync_fs.

v3->v4:
- Using wait_sb_inodes() to wait syncing upper inodes.
- Mark overlay inode dirty only when having upper inode and  VM_SHARED
flag in ovl_mmap().
- Check upper i_state after checking upper mmap state
in ovl_write_inode.

v4->v5:
- Add underlying inode dirtiness check after mnt_drop_write().
- Handle both wait/no-wait mode of syncfs(2) in overlayfs' ->sync_fs().

Chengguang Xu (10):
  ovl: setup overlayfs' private bdi
  ovl: implement ->writepages operation
  ovl: implement overlayfs' ->evict_inode operation
  ovl: mark overlayfs' inode dirty on modification
  ovl: mark overlayfs' inode dirty on shared mmap
  ovl: implement overlayfs' ->write_inode operation
  ovl: cache dirty overlayfs' inode
  fs: export wait_sb_inodes()
  fs: introduce new helper sync_fs_and_blockdev()
  ovl: implement containerized syncfs for overlayfs

 fs/fs-writeback.c         |  3 +-
 fs/overlayfs/file.c       |  6 ++++
 fs/overlayfs/inode.c      | 14 ++++++++
 fs/overlayfs/overlayfs.h  |  4 +++
 fs/overlayfs/super.c      | 69 ++++++++++++++++++++++++++++++++++-----
 fs/overlayfs/util.c       | 21 ++++++++++++
 fs/sync.c                 | 14 +++++---
 include/linux/fs.h        |  1 +
 include/linux/writeback.h |  1 +
 9 files changed, 120 insertions(+), 13 deletions(-)

-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 01/10] ovl: setup overlayfs' private bdi
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 02/10] ovl: implement ->writepages operation Chengguang Xu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Setup overlayfs' private bdi so that we can collect
overlayfs' own dirty inodes.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 178daa5e82c9..51886ba6130a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1980,6 +1980,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ofs)
 		goto out;
 
+	err = super_setup_bdi(sb);
+	if (err)
+		goto out_err;
+
 	err = -ENOMEM;
 	ofs->creator_cred = cred = prepare_creds();
 	if (!cred)
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 02/10] ovl: implement ->writepages operation
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Implement overlayfs' ->writepages operation so that
we can sync dirty data/metadata to upper filesystem.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 832b17589733..d854e59a3710 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -659,9 +659,22 @@ static const struct inode_operations ovl_special_inode_operations = {
 	.update_time	= ovl_update_time,
 };
 
+static int ovl_writepages(struct address_space *mapping,
+			  struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct inode *upper = ovl_inode_upper(inode);
+
+	if (!ovl_should_sync(ofs))
+		return 0;
+	return filemap_fdatawrite_wbc(upper->i_mapping, wbc);
+}
+
 static const struct address_space_operations ovl_aops = {
 	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
 	.direct_IO		= noop_direct_IO,
+	.writepages		= ovl_writepages,
 };
 
 /*
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 02/10] ovl: implement ->writepages operation Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-10-06 15:33   ` Miklos Szeredi
  2021-09-23 13:08 ` [RFC PATCH v5 04/10] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Implement overlayfs' ->evict_inode operation,
so that we can clear dirty flags of overlayfs inode.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 51886ba6130a..2ab77adf7256 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -406,11 +406,18 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 	return ret;
 }
 
+static void ovl_evict_inode(struct inode *inode)
+{
+	inode->i_state &= ~I_DIRTY_ALL;
+	clear_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
 	.drop_inode	= generic_delete_inode,
+	.evict_inode	= ovl_evict_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 04/10] ovl: mark overlayfs' inode dirty on modification
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (2 preceding siblings ...)
  2021-09-23 13:08 ` [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-10-07 18:43   ` Miklos Szeredi
  2021-09-23 13:08 ` [RFC PATCH v5 05/10] ovl: mark overlayfs' inode dirty on shared mmap Chengguang Xu
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Mark overlayfs' inode dirty on modification so that
we can recognize and collect target inodes for syncfs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c     |  1 +
 fs/overlayfs/overlayfs.h |  4 ++++
 fs/overlayfs/util.c      | 21 +++++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index d854e59a3710..4a03aceaeedc 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -478,6 +478,7 @@ int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags)
 		if (upperpath.dentry) {
 			touch_atime(&upperpath);
 			inode->i_atime = d_inode(upperpath.dentry)->i_atime;
+			ovl_mark_inode_dirty(inode);
 		}
 	}
 	return 0;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 3894f3347955..5a016baa06dd 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -276,6 +276,7 @@ static inline bool ovl_allow_offline_changes(struct ovl_fs *ofs)
 
 
 /* util.c */
+void ovl_mark_inode_dirty(struct inode *inode);
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
@@ -529,6 +530,9 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
 	to->i_mtime = from->i_mtime;
 	to->i_ctime = from->i_ctime;
 	i_size_write(to, i_size_read(from));
+
+	if (ovl_inode_upper(to) && from->i_state & I_DIRTY_ALL)
+		ovl_mark_inode_dirty(to);
 }
 
 /* vfs inode flags copied from real to ovl inode */
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f48284a2a896..5441eae2e345 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -25,7 +25,14 @@ int ovl_want_write(struct dentry *dentry)
 void ovl_drop_write(struct dentry *dentry)
 {
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct dentry *upper;
+
 	mnt_drop_write(ovl_upper_mnt(ofs));
+	if (d_inode(dentry)) {
+		upper = ovl_dentry_upper(dentry);
+		if (upper && d_inode(upper) && d_inode(upper)->i_state & I_DIRTY_ALL)
+			ovl_mark_inode_dirty(d_inode(dentry));
+	}
 }
 
 struct dentry *ovl_workdir(struct dentry *dentry)
@@ -1060,3 +1067,17 @@ int ovl_sync_status(struct ovl_fs *ofs)
 
 	return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq);
 }
+
+/*
+ * We intentionally add I_DIRTY_SYNC flag regardless dirty flag
+ * of upper inode so that we have chance to invoke ->write_inode
+ * to re-dirty overlayfs' inode during writeback process.
+ */
+void ovl_mark_inode_dirty(struct inode *inode)
+{
+	struct inode *upper = ovl_inode_upper(inode);
+	unsigned long iflag = I_DIRTY_SYNC;
+
+	iflag |= upper->i_state & I_DIRTY_ALL;
+	__mark_inode_dirty(inode, iflag);
+}
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 05/10] ovl: mark overlayfs' inode dirty on shared mmap
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (3 preceding siblings ...)
  2021-09-23 13:08 ` [RFC PATCH v5 04/10] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Overlayfs cannot be notified when mmapped area gets dirty,
so we need to proactively mark inode dirty in ->mmap operation.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/file.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index d081faa55e83..f9dc5249c183 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -482,6 +482,12 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 	revert_creds(old_cred);
 	ovl_file_accessed(file);
 
+	if (!ret) {
+		if (ovl_inode_upper(file_inode(file)) &&
+		    vma->vm_flags & VM_SHARED)
+			ovl_mark_inode_dirty(file_inode(file));
+	}
+
 	return ret;
 }
 
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (4 preceding siblings ...)
  2021-09-23 13:08 ` [RFC PATCH v5 05/10] ovl: mark overlayfs' inode dirty on shared mmap Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-10-07  9:01   ` Jan Kara
  2021-10-07  9:23   ` Miklos Szeredi
  2021-09-23 13:08 ` [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode Chengguang Xu
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Implement overlayfs' ->write_inode to sync dirty data
and redirty overlayfs' inode if necessary.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 2ab77adf7256..cddae3ca2fa5 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -412,12 +412,42 @@ static void ovl_evict_inode(struct inode *inode)
 	clear_inode(inode);
 }
 
+static int ovl_write_inode(struct inode *inode,
+			   struct writeback_control *wbc)
+{
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct inode *upper = ovl_inode_upper(inode);
+	unsigned long iflag = 0;
+	int ret = 0;
+
+	if (!upper)
+		return 0;
+
+	if (!ovl_should_sync(ofs))
+		return 0;
+
+	if (upper->i_sb->s_op->write_inode)
+		ret = upper->i_sb->s_op->write_inode(inode, wbc);
+
+	if (mapping_writably_mapped(upper->i_mapping) ||
+	    mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
+		iflag |= I_DIRTY_PAGES;
+
+	iflag |= upper->i_state & I_DIRTY_ALL;
+
+	if (iflag)
+		ovl_mark_inode_dirty(inode);
+
+	return ret;
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
 	.drop_inode	= generic_delete_inode,
 	.evict_inode	= ovl_evict_inode,
+	.write_inode	= ovl_write_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (5 preceding siblings ...)
  2021-09-23 13:08 ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-10-07 11:09   ` Miklos Szeredi
  2021-09-23 13:08 ` [RFC PATCH v5 08/10] fs: export wait_sb_inodes() Chengguang Xu
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Now drop overlayfs' inode will sync dirty data,
so we change to only drop clean inode.

The purpose of doing this is to keep compatible
behavior with before because without this change
dropping overlayfs inode will not trigger syncing
of underlying dirty inode.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index cddae3ca2fa5..bf4000eb9be8 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -441,11 +441,25 @@ static int ovl_write_inode(struct inode *inode,
 	return ret;
 }
 
+/*
+ * In iput_final(), clean inode will drop directly and dirty inode will
+ * keep in the cache until write back to sync dirty data then add to lru
+ * list to wait reclaim.
+ */
+static int ovl_drop_inode(struct inode *inode)
+{
+	struct inode *upper = ovl_inode_upper(inode);
+
+	if (!upper || !(inode->i_state & I_DIRTY_ALL))
+		return 1;
+	return generic_drop_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
-	.drop_inode	= generic_delete_inode,
+	.drop_inode	= ovl_drop_inode,
 	.evict_inode	= ovl_evict_inode,
 	.write_inode	= ovl_write_inode,
 	.put_super	= ovl_put_super,
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 08/10] fs: export wait_sb_inodes()
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (6 preceding siblings ...)
  2021-09-23 13:08 ` [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 09/10] fs: introduce new helper sync_fs_and_blockdev() Chengguang Xu
  2021-09-23 13:08 ` [RFC PATCH v5 10/10] ovl: implement containerized syncfs for overlayfs Chengguang Xu
  9 siblings, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

In order to wait syncing upper inodes we need to
call wait_sb_inodes() in overlayfs' ->sync_fs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/fs-writeback.c         | 3 ++-
 include/linux/writeback.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 81ec192ce067..0438c911241e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2505,7 +2505,7 @@ EXPORT_SYMBOL(__mark_inode_dirty);
  * completed by the time we have gained the lock and waited for all IO that is
  * in progress regardless of the order callers are granted the lock.
  */
-static void wait_sb_inodes(struct super_block *sb)
+void wait_sb_inodes(struct super_block *sb)
 {
 	LIST_HEAD(sync_list);
 
@@ -2589,6 +2589,7 @@ static void wait_sb_inodes(struct super_block *sb)
 	rcu_read_unlock();
 	mutex_unlock(&sb->s_sync_lock);
 }
+EXPORT_SYMBOL(wait_sb_inodes);
 
 static void __writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr,
 				     enum wb_reason reason, bool skip_if_busy)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index d1f65adf6a26..d7aacd0434cf 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -198,6 +198,7 @@ void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi,
 				enum wb_reason reason);
 void inode_wait_for_writeback(struct inode *inode);
 void inode_io_list_del(struct inode *inode);
+void wait_sb_inodes(struct super_block *sb);
 
 /* writeback.h requires fs.h; it, too, is not included from here. */
 static inline void wait_on_inode(struct inode *inode)
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 09/10] fs: introduce new helper sync_fs_and_blockdev()
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (7 preceding siblings ...)
  2021-09-23 13:08 ` [RFC PATCH v5 08/10] fs: export wait_sb_inodes() Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  2021-10-19  7:15   ` Amir Goldstein
  2021-09-23 13:08 ` [RFC PATCH v5 10/10] ovl: implement containerized syncfs for overlayfs Chengguang Xu
  9 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Overlayfs needs to call upper layer's ->sync_fs
and __sync_blockdev() to sync metadata during syncfs(2).

Currently, __sync_blockdev() does not export to module
so introduce new helper sync_fs_and_blockdev() to wrap
those operations.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/sync.c          | 14 ++++++++++----
 include/linux/fs.h |  1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 1373a610dc78..36c755e6568a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -21,6 +21,15 @@
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
 			SYNC_FILE_RANGE_WAIT_AFTER)
 
+
+int sync_fs_and_blockdev(struct super_block *sb, int wait)
+{
+	if (sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, wait);
+	return __sync_blockdev(sb->s_bdev, wait);
+}
+EXPORT_SYMBOL(sync_fs_and_blockdev);
+
 /*
  * Do the filesystem syncing work. For simple filesystems
  * writeback_inodes_sb(sb) just dirties buffers with inodes so we have to
@@ -34,10 +43,7 @@ static int __sync_filesystem(struct super_block *sb, int wait)
 		sync_inodes_sb(sb);
 	else
 		writeback_inodes_sb(sb, WB_REASON_SYNC);
-
-	if (sb->s_op->sync_fs)
-		sb->s_op->sync_fs(sb, wait);
-	return __sync_blockdev(sb->s_bdev, wait);
+	return sync_fs_and_blockdev(sb, wait);
 }
 
 /*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e7a633353fd2..e5ebf126281c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2777,6 +2777,7 @@ static inline bool sb_is_blkdev_sb(struct super_block *sb)
 
 void emergency_thaw_all(void);
 extern int sync_filesystem(struct super_block *);
+extern int sync_fs_and_blockdev(struct super_block *sb, int wait);
 extern const struct file_operations def_blk_fops;
 extern const struct file_operations def_chr_fops;
 
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC PATCH v5 10/10] ovl: implement containerized syncfs for overlayfs
  2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (8 preceding siblings ...)
  2021-09-23 13:08 ` [RFC PATCH v5 09/10] fs: introduce new helper sync_fs_and_blockdev() Chengguang Xu
@ 2021-09-23 13:08 ` Chengguang Xu
  9 siblings, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-09-23 13:08 UTC (permalink / raw)
  To: miklos, jack, amir73il
  Cc: linux-fsdevel, linux-unionfs, linux-kernel, Chengguang Xu

Now overlayfs can sync proper dirty inodes during syncfs,
so remove unnecessary sync_filesystem() on upper file
system.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index bf4000eb9be8..ef998ada6cb9 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -15,6 +15,7 @@
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
 #include <linux/exportfs.h>
+#include <linux/writeback.h>
 #include "overlayfs.h"
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
@@ -281,18 +282,15 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 	/*
 	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 	 * All the super blocks will be iterated, including upper_sb.
-	 *
-	 * If this is a syncfs(2) call, then we do need to call
-	 * sync_filesystem() on upper_sb, but enough if we do it when being
-	 * called with wait == 1.
 	 */
-	if (!wait)
-		return 0;
 
 	upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
 
 	down_read(&upper_sb->s_umount);
-	ret = sync_filesystem(upper_sb);
+	if (wait)
+		wait_sb_inodes(upper_sb);
+
+	ret = sync_fs_and_blockdev(upper_sb, wait);
 	up_read(&upper_sb->s_umount);
 
 	return ret;
-- 
2.27.0



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation
  2021-09-23 13:08 ` [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
@ 2021-10-06 15:33   ` Miklos Szeredi
  2021-10-07  6:08     ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-06 15:33 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> Implement overlayfs' ->evict_inode operation,
> so that we can clear dirty flags of overlayfs inode.
>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> ---
>  fs/overlayfs/super.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 51886ba6130a..2ab77adf7256 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -406,11 +406,18 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
>         return ret;
>  }
>
> +static void ovl_evict_inode(struct inode *inode)
> +{
> +       inode->i_state &= ~I_DIRTY_ALL;
> +       clear_inode(inode);

clear_inode() should already clear the dirty flags; the default
eviction should work fine without having to define an ->evict_inode.
What am I missing?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation
  2021-10-06 15:33   ` Miklos Szeredi
@ 2021-10-07  6:08     ` Chengguang Xu
  2021-10-07  7:43       ` Miklos Szeredi
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-10-07  6:08 UTC (permalink / raw)
  To: Miklos Szeredi, Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

在 2021/10/6 23:33, Miklos Szeredi 写道:
> On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
>> Implement overlayfs' ->evict_inode operation,
>> so that we can clear dirty flags of overlayfs inode.
>>
>> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>> ---
>>   fs/overlayfs/super.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> index 51886ba6130a..2ab77adf7256 100644
>> --- a/fs/overlayfs/super.c
>> +++ b/fs/overlayfs/super.c
>> @@ -406,11 +406,18 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
>>          return ret;
>>   }
>>
>> +static void ovl_evict_inode(struct inode *inode)
>> +{
>> +       inode->i_state &= ~I_DIRTY_ALL;
>> +       clear_inode(inode);
> clear_inode() should already clear the dirty flags; the default
> eviction should work fine without having to define an ->evict_inode.
> What am I missing?

Yeah, you are right, we don't need overlayfs' own ->evict_inode anymore

because we wait all writeback upper inodes in overlayfs' ->sync_fs.


Thanks,

Chengguang



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation
  2021-10-07  6:08     ` Chengguang Xu
@ 2021-10-07  7:43       ` Miklos Szeredi
  0 siblings, 0 replies; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07  7:43 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Chengguang Xu, Jan Kara, Amir Goldstein, linux-fsdevel,
	overlayfs, linux-kernel

On Thu, 7 Oct 2021 at 08:08, Chengguang Xu <cgxu519@139.com> wrote:
>
> 在 2021/10/6 23:33, Miklos Szeredi 写道:
> > On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
> >> Implement overlayfs' ->evict_inode operation,
> >> so that we can clear dirty flags of overlayfs inode.
> >>
> >> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> >> ---
> >>   fs/overlayfs/super.c | 7 +++++++
> >>   1 file changed, 7 insertions(+)
> >>
> >> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> >> index 51886ba6130a..2ab77adf7256 100644
> >> --- a/fs/overlayfs/super.c
> >> +++ b/fs/overlayfs/super.c
> >> @@ -406,11 +406,18 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
> >>          return ret;
> >>   }
> >>
> >> +static void ovl_evict_inode(struct inode *inode)
> >> +{
> >> +       inode->i_state &= ~I_DIRTY_ALL;
> >> +       clear_inode(inode);
> > clear_inode() should already clear the dirty flags; the default
> > eviction should work fine without having to define an ->evict_inode.
> > What am I missing?
>
> Yeah, you are right, we don't need overlayfs' own ->evict_inode anymore
>
> because we wait all writeback upper inodes in overlayfs' ->sync_fs.

Okay, I dropped this patch then.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-09-23 13:08 ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
@ 2021-10-07  9:01   ` Jan Kara
  2021-10-07 12:26     ` Chengguang Xu
  2021-10-07  9:23   ` Miklos Szeredi
  1 sibling, 1 reply; 61+ messages in thread
From: Jan Kara @ 2021-10-07  9:01 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: miklos, jack, amir73il, linux-fsdevel, linux-unionfs, linux-kernel

On Thu 23-09-21 21:08:10, Chengguang Xu wrote:
> Implement overlayfs' ->write_inode to sync dirty data
> and redirty overlayfs' inode if necessary.
> 
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>

...

> +static int ovl_write_inode(struct inode *inode,
> +			   struct writeback_control *wbc)
> +{
> +	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
> +	struct inode *upper = ovl_inode_upper(inode);
> +	unsigned long iflag = 0;
> +	int ret = 0;
> +
> +	if (!upper)
> +		return 0;
> +
> +	if (!ovl_should_sync(ofs))
> +		return 0;
> +
> +	if (upper->i_sb->s_op->write_inode)
> +		ret = upper->i_sb->s_op->write_inode(inode, wbc);
> +

I'm somewhat confused here. 'inode' is overlayfs inode AFAIU, so how is it
correct to pass it to ->write_inode function of upper filesystem? Shouldn't
you pass 'upper' there instead?

> +	if (mapping_writably_mapped(upper->i_mapping) ||
> +	    mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
> +		iflag |= I_DIRTY_PAGES;
> +
> +	iflag |= upper->i_state & I_DIRTY_ALL;

Also since you call ->write_inode directly upper->i_state won't be updated
to reflect that inode has been written out (I_DIRTY flags get cleared in
__writeback_single_inode()). So it seems to me overlayfs will keep writing
out upper inode until flush worker on upper filesystem also writes the
inode and clears the dirty flags? So you rather need to call something like
write_inode_now() that will handle the flag clearing and do writeback list
handling for you?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-09-23 13:08 ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
  2021-10-07  9:01   ` Jan Kara
@ 2021-10-07  9:23   ` Miklos Szeredi
  2021-10-07 12:28     ` Chengguang Xu
  1 sibling, 1 reply; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07  9:23 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> Implement overlayfs' ->write_inode to sync dirty data
> and redirty overlayfs' inode if necessary.
>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> ---
>  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 2ab77adf7256..cddae3ca2fa5 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -412,12 +412,42 @@ static void ovl_evict_inode(struct inode *inode)
>         clear_inode(inode);
>  }
>
> +static int ovl_write_inode(struct inode *inode,
> +                          struct writeback_control *wbc)
> +{
> +       struct ovl_fs *ofs = inode->i_sb->s_fs_info;
> +       struct inode *upper = ovl_inode_upper(inode);
> +       unsigned long iflag = 0;
> +       int ret = 0;
> +
> +       if (!upper)
> +               return 0;
> +
> +       if (!ovl_should_sync(ofs))
> +               return 0;
> +
> +       if (upper->i_sb->s_op->write_inode)
> +               ret = upper->i_sb->s_op->write_inode(inode, wbc);

Where is page writeback on upper inode triggered?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode
  2021-09-23 13:08 ` [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode Chengguang Xu
@ 2021-10-07 11:09   ` Miklos Szeredi
  2021-10-07 12:04     ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07 11:09 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> Now drop overlayfs' inode will sync dirty data,
> so we change to only drop clean inode.
>
> The purpose of doing this is to keep compatible
> behavior with before because without this change
> dropping overlayfs inode will not trigger syncing
> of underlying dirty inode.
>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> ---
>  fs/overlayfs/super.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index cddae3ca2fa5..bf4000eb9be8 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -441,11 +441,25 @@ static int ovl_write_inode(struct inode *inode,
>         return ret;
>  }
>
> +/*
> + * In iput_final(), clean inode will drop directly and dirty inode will
> + * keep in the cache until write back to sync dirty data then add to lru
> + * list to wait reclaim.
> + */
> +static int ovl_drop_inode(struct inode *inode)
> +{
> +       struct inode *upper = ovl_inode_upper(inode);
> +
> +       if (!upper || !(inode->i_state & I_DIRTY_ALL))

Could we check upper dirtyness here? That would give a more precise result.

Alternatively don't set .drop_inode (i.e. use generic_drop_inode())
and set I_DONTCACHE on overlay inodes.  That would cause the upper
inode to be always written back before eviction.

The latter would result in simpler logic, and I think performance-wise
it wouldn't matter.   But I may be missing something.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode
  2021-10-07 11:09   ` Miklos Szeredi
@ 2021-10-07 12:04     ` Chengguang Xu
  2021-10-07 12:27       ` Miklos Szeredi
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-10-07 12:04 UTC (permalink / raw)
  To: Miklos Szeredi, Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

在 2021/10/7 19:09, Miklos Szeredi 写道:
> On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
>> Now drop overlayfs' inode will sync dirty data,
>> so we change to only drop clean inode.
>>
>> The purpose of doing this is to keep compatible
>> behavior with before because without this change
>> dropping overlayfs inode will not trigger syncing
>> of underlying dirty inode.
>>
>> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>> ---
>>   fs/overlayfs/super.c | 16 +++++++++++++++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> index cddae3ca2fa5..bf4000eb9be8 100644
>> --- a/fs/overlayfs/super.c
>> +++ b/fs/overlayfs/super.c
>> @@ -441,11 +441,25 @@ static int ovl_write_inode(struct inode *inode,
>>          return ret;
>>   }
>>
>> +/*
>> + * In iput_final(), clean inode will drop directly and dirty inode will
>> + * keep in the cache until write back to sync dirty data then add to lru
>> + * list to wait reclaim.
>> + */
>> +static int ovl_drop_inode(struct inode *inode)
>> +{
>> +       struct inode *upper = ovl_inode_upper(inode);
>> +
>> +       if (!upper || !(inode->i_state & I_DIRTY_ALL))
> Could we check upper dirtyness here? That would give a more precise result.

We keep tracking mmapped-file(shared mode) by explicitely marking 
overlay inode dirty,

so if we drop overlay inode by checking upper dirtyness, we may lose 
control on those mmapped upper inodes.

>
> Alternatively don't set .drop_inode (i.e. use generic_drop_inode())
> and set I_DONTCACHE on overlay inodes.  That would cause the upper
> inode to be always written back before eviction.
>
> The latter would result in simpler logic, and I think performance-wise
> it wouldn't matter.  But I may be missing something.

I think we may seperate mmapped-file(shared) inode and other inode by

clear/set I_DONTCACHE flag on overlay inode if you prefer this approach.


Thanks,

Chengguang






^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07  9:01   ` Jan Kara
@ 2021-10-07 12:26     ` Chengguang Xu
  2021-10-07 14:41       ` Jan Kara
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-10-07 12:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: miklos, amir73il, linux-fsdevel, linux-unionfs, linux-kernel

 ---- 在 星期四, 2021-10-07 17:01:57 Jan Kara <jack@suse.cz> 撰写 ----
 > On Thu 23-09-21 21:08:10, Chengguang Xu wrote:
 > > Implement overlayfs' ->write_inode to sync dirty data
 > > and redirty overlayfs' inode if necessary.
 > > 
 > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > 
 > ...
 > 
 > > +static int ovl_write_inode(struct inode *inode,
 > > +               struct writeback_control *wbc)
 > > +{
 > > +    struct ovl_fs *ofs = inode->i_sb->s_fs_info;
 > > +    struct inode *upper = ovl_inode_upper(inode);
 > > +    unsigned long iflag = 0;
 > > +    int ret = 0;
 > > +
 > > +    if (!upper)
 > > +        return 0;
 > > +
 > > +    if (!ovl_should_sync(ofs))
 > > +        return 0;
 > > +
 > > +    if (upper->i_sb->s_op->write_inode)
 > > +        ret = upper->i_sb->s_op->write_inode(inode, wbc);
 > > +
 > 
 > I'm somewhat confused here. 'inode' is overlayfs inode AFAIU, so how is it
 > correct to pass it to ->write_inode function of upper filesystem? Shouldn't
 > you pass 'upper' there instead?

That's right!

 > 
 > > +    if (mapping_writably_mapped(upper->i_mapping) ||
 > > +        mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
 > > +        iflag |= I_DIRTY_PAGES;
 > > +
 > > +    iflag |= upper->i_state & I_DIRTY_ALL;
 > 
 > Also since you call ->write_inode directly upper->i_state won't be updated
 > to reflect that inode has been written out (I_DIRTY flags get cleared in
 > __writeback_single_inode()). So it seems to me overlayfs will keep writing
 > out upper inode until flush worker on upper filesystem also writes the
 > inode and clears the dirty flags? So you rather need to call something like
 > write_inode_now() that will handle the flag clearing and do writeback list
 > handling for you?
 > 

Calling ->write_inode directly upper->i_state won't be updated, 
however, I don't think overlayfs will keep writing out upper inode since ->write_inode
will be called when only overlay inode itself marked dirty.  Am I missing something?


Thanks,
Chengguang



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode
  2021-10-07 12:04     ` Chengguang Xu
@ 2021-10-07 12:27       ` Miklos Szeredi
  0 siblings, 0 replies; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07 12:27 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Chengguang Xu, Jan Kara, Amir Goldstein, linux-fsdevel,
	overlayfs, linux-kernel

On Thu, 7 Oct 2021 at 14:04, Chengguang Xu <cgxu519@139.com> wrote:
>
> 在 2021/10/7 19:09, Miklos Szeredi 写道:
> > On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
> >> Now drop overlayfs' inode will sync dirty data,
> >> so we change to only drop clean inode.
> >>
> >> The purpose of doing this is to keep compatible
> >> behavior with before because without this change
> >> dropping overlayfs inode will not trigger syncing
> >> of underlying dirty inode.
> >>
> >> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> >> ---
> >>   fs/overlayfs/super.c | 16 +++++++++++++++-
> >>   1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> >> index cddae3ca2fa5..bf4000eb9be8 100644
> >> --- a/fs/overlayfs/super.c
> >> +++ b/fs/overlayfs/super.c
> >> @@ -441,11 +441,25 @@ static int ovl_write_inode(struct inode *inode,
> >>          return ret;
> >>   }
> >>
> >> +/*
> >> + * In iput_final(), clean inode will drop directly and dirty inode will
> >> + * keep in the cache until write back to sync dirty data then add to lru
> >> + * list to wait reclaim.
> >> + */
> >> +static int ovl_drop_inode(struct inode *inode)
> >> +{
> >> +       struct inode *upper = ovl_inode_upper(inode);
> >> +
> >> +       if (!upper || !(inode->i_state & I_DIRTY_ALL))
> > Could we check upper dirtyness here? That would give a more precise result.
>
> We keep tracking mmapped-file(shared mode) by explicitely marking
> overlay inode dirty,
>
> so if we drop overlay inode by checking upper dirtyness, we may lose
> control on those mmapped upper inodes.

That's fine, since there are no more mmaps at this point.

> >
> > Alternatively don't set .drop_inode (i.e. use generic_drop_inode())
> > and set I_DONTCACHE on overlay inodes.  That would cause the upper
> > inode to be always written back before eviction.
> >
> > The latter would result in simpler logic, and I think performance-wise
> > it wouldn't matter.  But I may be missing something.
>
> I think we may seperate mmapped-file(shared) inode and other inode by
>
> clear/set I_DONTCACHE flag on overlay inode if you prefer this approach.

Same reasoning here: after upper inode is written out, the dirtyness
in the overlay inode doesn't matter since there cannot be any active
mmaps.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07  9:23   ` Miklos Szeredi
@ 2021-10-07 12:28     ` Chengguang Xu
  2021-10-07 12:45       ` Miklos Szeredi
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-10-07 12:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

 ---- 在 星期四, 2021-10-07 17:23:06 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > > Implement overlayfs' ->write_inode to sync dirty data
 > > and redirty overlayfs' inode if necessary.
 > >
 > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > > ---
 > >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 > >  1 file changed, 30 insertions(+)
 > >
 > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > > index 2ab77adf7256..cddae3ca2fa5 100644
 > > --- a/fs/overlayfs/super.c
 > > +++ b/fs/overlayfs/super.c
 > > @@ -412,12 +412,42 @@ static void ovl_evict_inode(struct inode *inode)
 > >         clear_inode(inode);
 > >  }
 > >
 > > +static int ovl_write_inode(struct inode *inode,
 > > +                          struct writeback_control *wbc)
 > > +{
 > > +       struct ovl_fs *ofs = inode->i_sb->s_fs_info;
 > > +       struct inode *upper = ovl_inode_upper(inode);
 > > +       unsigned long iflag = 0;
 > > +       int ret = 0;
 > > +
 > > +       if (!upper)
 > > +               return 0;
 > > +
 > > +       if (!ovl_should_sync(ofs))
 > > +               return 0;
 > > +
 > > +       if (upper->i_sb->s_op->write_inode)
 > > +               ret = upper->i_sb->s_op->write_inode(inode, wbc);
 > 
 > Where is page writeback on upper inode triggered?
 > 

Should pass upper inode instead of overlay inode here.

Thanks,
Chengguang



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 12:28     ` Chengguang Xu
@ 2021-10-07 12:45       ` Miklos Szeredi
  2021-10-07 13:09         ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07 12:45 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Thu, 7 Oct 2021 at 14:28, Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期四, 2021-10-07 17:23:06 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  > On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >
>  > > Implement overlayfs' ->write_inode to sync dirty data
>  > > and redirty overlayfs' inode if necessary.
>  > >
>  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  > > ---
>  > >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
>  > >  1 file changed, 30 insertions(+)
>  > >
>  > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>  > > index 2ab77adf7256..cddae3ca2fa5 100644
>  > > --- a/fs/overlayfs/super.c
>  > > +++ b/fs/overlayfs/super.c
>  > > @@ -412,12 +412,42 @@ static void ovl_evict_inode(struct inode *inode)
>  > >         clear_inode(inode);
>  > >  }
>  > >
>  > > +static int ovl_write_inode(struct inode *inode,
>  > > +                          struct writeback_control *wbc)
>  > > +{
>  > > +       struct ovl_fs *ofs = inode->i_sb->s_fs_info;
>  > > +       struct inode *upper = ovl_inode_upper(inode);
>  > > +       unsigned long iflag = 0;
>  > > +       int ret = 0;
>  > > +
>  > > +       if (!upper)
>  > > +               return 0;
>  > > +
>  > > +       if (!ovl_should_sync(ofs))
>  > > +               return 0;
>  > > +
>  > > +       if (upper->i_sb->s_op->write_inode)
>  > > +               ret = upper->i_sb->s_op->write_inode(inode, wbc);
>  >
>  > Where is page writeback on upper inode triggered?
>  >
>
> Should pass upper inode instead of overlay inode here.

That's true and it does seem to indicate lack of thorough testing.

However that wasn't what I was asking about.  AFAICS ->write_inode()
won't start write back for dirty pages.   Maybe I'm missing something,
but there it looks as if nothing will actually trigger writeback for
dirty pages in upper inode.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 12:45       ` Miklos Szeredi
@ 2021-10-07 13:09         ` Chengguang Xu
  2021-10-07 13:34           ` Miklos Szeredi
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-10-07 13:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

 ---- 在 星期四, 2021-10-07 20:45:20 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > On Thu, 7 Oct 2021 at 14:28, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期四, 2021-10-07 17:23:06 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  > On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >
 > >  > > Implement overlayfs' ->write_inode to sync dirty data
 > >  > > and redirty overlayfs' inode if necessary.
 > >  > >
 > >  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  > > ---
 > >  > >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 > >  > >  1 file changed, 30 insertions(+)
 > >  > >
 > >  > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > >  > > index 2ab77adf7256..cddae3ca2fa5 100644
 > >  > > --- a/fs/overlayfs/super.c
 > >  > > +++ b/fs/overlayfs/super.c
 > >  > > @@ -412,12 +412,42 @@ static void ovl_evict_inode(struct inode *inode)
 > >  > >         clear_inode(inode);
 > >  > >  }
 > >  > >
 > >  > > +static int ovl_write_inode(struct inode *inode,
 > >  > > +                          struct writeback_control *wbc)
 > >  > > +{
 > >  > > +       struct ovl_fs *ofs = inode->i_sb->s_fs_info;
 > >  > > +       struct inode *upper = ovl_inode_upper(inode);
 > >  > > +       unsigned long iflag = 0;
 > >  > > +       int ret = 0;
 > >  > > +
 > >  > > +       if (!upper)
 > >  > > +               return 0;
 > >  > > +
 > >  > > +       if (!ovl_should_sync(ofs))
 > >  > > +               return 0;
 > >  > > +
 > >  > > +       if (upper->i_sb->s_op->write_inode)
 > >  > > +               ret = upper->i_sb->s_op->write_inode(inode, wbc);
 > >  >
 > >  > Where is page writeback on upper inode triggered?
 > >  >
 > >
 > > Should pass upper inode instead of overlay inode here.
 > 
 > That's true and it does seem to indicate lack of thorough testing.

It's a little bit weird this passed all overlay cases and generic/474(syncfs) without errors in xfstests.
Please let me do more diagnosis on this and strengthen the test case.


 > 
 > However that wasn't what I was asking about.  AFAICS ->write_inode()
 > won't start write back for dirty pages.   Maybe I'm missing something,
 > but there it looks as if nothing will actually trigger writeback for
 > dirty pages in upper inode.
 > 

Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).

Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 13:09         ` Chengguang Xu
@ 2021-10-07 13:34           ` Miklos Szeredi
  2021-10-07 14:46             ` Jan Kara
  2021-11-16  2:20             ` Chengguang Xu
  0 siblings, 2 replies; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07 13:34 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > However that wasn't what I was asking about.  AFAICS ->write_inode()
>  > won't start write back for dirty pages.   Maybe I'm missing something,
>  > but there it looks as if nothing will actually trigger writeback for
>  > dirty pages in upper inode.
>  >
>
> Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
> overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).

Right.

But wouldn't it be simpler to do this from ->write_inode()?

I.e. call write_inode_now() as suggested by Jan.

Also could just call mark_inode_dirty() on the overlay inode
regardless of the dirty flags on the upper inode since it shouldn't
matter and results in simpler logic.

Thanks,
Miklos


>
> Thanks,
> Chengguang
>
>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 12:26     ` Chengguang Xu
@ 2021-10-07 14:41       ` Jan Kara
  2021-10-07 14:54         ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Kara @ 2021-10-07 14:41 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, miklos, amir73il, linux-fsdevel, linux-unionfs, linux-kernel

On Thu 07-10-21 20:26:36, Chengguang Xu wrote:
>  ---- 在 星期四, 2021-10-07 17:01:57 Jan Kara <jack@suse.cz> 撰写 ----
>  > 
>  > > +    if (mapping_writably_mapped(upper->i_mapping) ||
>  > > +        mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
>  > > +        iflag |= I_DIRTY_PAGES;
>  > > +
>  > > +    iflag |= upper->i_state & I_DIRTY_ALL;
>  > 
>  > Also since you call ->write_inode directly upper->i_state won't be updated
>  > to reflect that inode has been written out (I_DIRTY flags get cleared in
>  > __writeback_single_inode()). So it seems to me overlayfs will keep writing
>  > out upper inode until flush worker on upper filesystem also writes the
>  > inode and clears the dirty flags? So you rather need to call something like
>  > write_inode_now() that will handle the flag clearing and do writeback list
>  > handling for you?
>  > 
> 
> Calling ->write_inode directly upper->i_state won't be updated, however,
> I don't think overlayfs will keep writing out upper inode since
> ->write_inode will be called when only overlay inode itself marked dirty.
> Am I missing something?

Well, if upper->i_state is not updated, you are more or less guaranteed
upper->i_state & I_DIRTY_ALL != 0 and thus even overlay inode stays dirty.
And thus next time writeback runs you will see dirty overlay inode and
writeback the upper inode again although it is not necessary.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 13:34           ` Miklos Szeredi
@ 2021-10-07 14:46             ` Jan Kara
  2021-10-07 14:53               ` Chengguang Xu
  2021-11-16  2:20             ` Chengguang Xu
  1 sibling, 1 reply; 61+ messages in thread
From: Jan Kara @ 2021-10-07 14:46 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Chengguang Xu, Jan Kara, Amir Goldstein, linux-fsdevel,
	overlayfs, linux-kernel

On Thu 07-10-21 15:34:19, Miklos Szeredi wrote:
> On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
> >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
> >  > won't start write back for dirty pages.   Maybe I'm missing something,
> >  > but there it looks as if nothing will actually trigger writeback for
> >  > dirty pages in upper inode.
> >  >
> >
> > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
> > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
> 
> Right.
> 
> But wouldn't it be simpler to do this from ->write_inode()?

You could but then you'd have to make sure you have I_DIRTY_SYNC always set
when I_DIRTY_PAGES is set on the upper inode so that your ->write_inode()
callback gets called. Overall I agree the logic would be probably simpler.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 14:46             ` Jan Kara
@ 2021-10-07 14:53               ` Chengguang Xu
  2021-10-07 18:51                 ` Miklos Szeredi
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-10-07 14:53 UTC (permalink / raw)
  To: Jan Kara, Miklos Szeredi
  Cc: Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

 ---- 在 星期四, 2021-10-07 22:46:46 Jan Kara <jack@suse.cz> 撰写 ----
 > On Thu 07-10-21 15:34:19, Miklos Szeredi wrote:
 > > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
 > > >  > won't start write back for dirty pages.   Maybe I'm missing something,
 > > >  > but there it looks as if nothing will actually trigger writeback for
 > > >  > dirty pages in upper inode.
 > > >  >
 > > >
 > > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
 > > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
 > > 
 > > Right.
 > > 
 > > But wouldn't it be simpler to do this from ->write_inode()?
 > 
 > You could but then you'd have to make sure you have I_DIRTY_SYNC always set
 > when I_DIRTY_PAGES is set on the upper inode so that your ->write_inode()
 > callback gets called. Overall I agree the logic would be probably simpler.
 > 

Hi Jan, Miklos

Thnaks for your suggestions. Let me have a try in next version.


Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 14:41       ` Jan Kara
@ 2021-10-07 14:54         ` Chengguang Xu
  0 siblings, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-10-07 14:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: miklos, amir73il, linux-fsdevel, linux-unionfs, linux-kernel


 ---- 在 星期四, 2021-10-07 22:41:56 Jan Kara <jack@suse.cz> 撰写 ----
 > On Thu 07-10-21 20:26:36, Chengguang Xu wrote:
 > >  ---- 在 星期四, 2021-10-07 17:01:57 Jan Kara <jack@suse.cz> 撰写 ----
 > >  > 
 > >  > > +    if (mapping_writably_mapped(upper->i_mapping) ||
 > >  > > +        mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
 > >  > > +        iflag |= I_DIRTY_PAGES;
 > >  > > +
 > >  > > +    iflag |= upper->i_state & I_DIRTY_ALL;
 > >  > 
 > >  > Also since you call ->write_inode directly upper->i_state won't be updated
 > >  > to reflect that inode has been written out (I_DIRTY flags get cleared in
 > >  > __writeback_single_inode()). So it seems to me overlayfs will keep writing
 > >  > out upper inode until flush worker on upper filesystem also writes the
 > >  > inode and clears the dirty flags? So you rather need to call something like
 > >  > write_inode_now() that will handle the flag clearing and do writeback list
 > >  > handling for you?
 > >  > 
 > > 
 > > Calling ->write_inode directly upper->i_state won't be updated, however,
 > > I don't think overlayfs will keep writing out upper inode since
 > > ->write_inode will be called when only overlay inode itself marked dirty.
 > > Am I missing something?
 > 
 > Well, if upper->i_state is not updated, you are more or less guaranteed
 > upper->i_state & I_DIRTY_ALL != 0 and thus even overlay inode stays dirty.
 > And thus next time writeback runs you will see dirty overlay inode and
 > writeback the upper inode again although it is not necessary.
 > 

Hi Jan,

Yes, I get the point now. Thanks for the explanation.


Thanks,
Chengguang





^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 04/10] ovl: mark overlayfs' inode dirty on modification
  2021-09-23 13:08 ` [RFC PATCH v5 04/10] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
@ 2021-10-07 18:43   ` Miklos Szeredi
  0 siblings, 0 replies; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07 18:43 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Thu, 23 Sept 2021 at 15:08, Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> Mark overlayfs' inode dirty on modification so that
> we can recognize and collect target inodes for syncfs.
>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> ---
>  fs/overlayfs/inode.c     |  1 +
>  fs/overlayfs/overlayfs.h |  4 ++++
>  fs/overlayfs/util.c      | 21 +++++++++++++++++++++
>  3 files changed, 26 insertions(+)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index d854e59a3710..4a03aceaeedc 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -478,6 +478,7 @@ int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags)
>                 if (upperpath.dentry) {
>                         touch_atime(&upperpath);
>                         inode->i_atime = d_inode(upperpath.dentry)->i_atime;
> +                       ovl_mark_inode_dirty(inode);
>                 }
>         }
>         return 0;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 3894f3347955..5a016baa06dd 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -276,6 +276,7 @@ static inline bool ovl_allow_offline_changes(struct ovl_fs *ofs)
>
>
>  /* util.c */
> +void ovl_mark_inode_dirty(struct inode *inode);
>  int ovl_want_write(struct dentry *dentry);
>  void ovl_drop_write(struct dentry *dentry);
>  struct dentry *ovl_workdir(struct dentry *dentry);
> @@ -529,6 +530,9 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
>         to->i_mtime = from->i_mtime;
>         to->i_ctime = from->i_ctime;
>         i_size_write(to, i_size_read(from));
> +
> +       if (ovl_inode_upper(to) && from->i_state & I_DIRTY_ALL)
> +               ovl_mark_inode_dirty(to);

I'd be more comfortable with calling ovl_mark_inode_dirty() unconditionally.

Checking if there's an upper seems to make no sense, since we should
only be copying the attributes if something was changed, and then it
is an upper inode.

Checking dirty flags on upper inode actually makes this racy:

  - upper inode dirtied through overlayfs
  - inode writeback starts (e.g. background writeback) on upper inode
  - dirty flags are cleared
  - check for dirty flags in upper inode above indicates not dirty,
ovl inode not dirtied
  - syncfs called, misses this inode
  - inode writeback completed after syncfs

>  }
>
>  /* vfs inode flags copied from real to ovl inode */
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index f48284a2a896..5441eae2e345 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -25,7 +25,14 @@ int ovl_want_write(struct dentry *dentry)
>  void ovl_drop_write(struct dentry *dentry)
>  {
>         struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
> +       struct dentry *upper;
> +
>         mnt_drop_write(ovl_upper_mnt(ofs));
> +       if (d_inode(dentry)) {
> +               upper = ovl_dentry_upper(dentry);
> +               if (upper && d_inode(upper) && d_inode(upper)->i_state & I_DIRTY_ALL)
> +                       ovl_mark_inode_dirty(d_inode(dentry));

ovl_want_write/ovl_drop_write means modification of the upper
filesystem.  It may or may not be the given dentry, so this is not the
right place to clall ovl_mark_inode_dirty IMO.  Better check all
instances of these and see if there are cases where ovl_copyattr()
doesn't handle inode dirtying, and do it explicitly there.


> +       }
>  }
>
>  struct dentry *ovl_workdir(struct dentry *dentry)
> @@ -1060,3 +1067,17 @@ int ovl_sync_status(struct ovl_fs *ofs)
>
>         return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq);
>  }
> +
> +/*
> + * We intentionally add I_DIRTY_SYNC flag regardless dirty flag
> + * of upper inode so that we have chance to invoke ->write_inode
> + * to re-dirty overlayfs' inode during writeback process.
> + */
> +void ovl_mark_inode_dirty(struct inode *inode)
> +{
> +       struct inode *upper = ovl_inode_upper(inode);
> +       unsigned long iflag = I_DIRTY_SYNC;
> +
> +       iflag |= upper->i_state & I_DIRTY_ALL;
> +       __mark_inode_dirty(inode, iflag);
> +}

I think ovl_mark_inode_dirty()  can just call mark_inode_dirty().
And so that can go in "overlayfs.h" file as static inline.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 14:53               ` Chengguang Xu
@ 2021-10-07 18:51                 ` Miklos Szeredi
  2021-10-08 13:13                   ` Jan Kara
  0 siblings, 1 reply; 61+ messages in thread
From: Miklos Szeredi @ 2021-10-07 18:51 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Thu, 7 Oct 2021 at 16:53, Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期四, 2021-10-07 22:46:46 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Thu 07-10-21 15:34:19, Miklos Szeredi wrote:
>  > > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
>  > > >  > won't start write back for dirty pages.   Maybe I'm missing something,
>  > > >  > but there it looks as if nothing will actually trigger writeback for
>  > > >  > dirty pages in upper inode.
>  > > >  >
>  > > >
>  > > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
>  > > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
>  > >
>  > > Right.
>  > >
>  > > But wouldn't it be simpler to do this from ->write_inode()?
>  >
>  > You could but then you'd have to make sure you have I_DIRTY_SYNC always set
>  > when I_DIRTY_PAGES is set on the upper inode so that your ->write_inode()
>  > callback gets called. Overall I agree the logic would be probably simpler.
>  >
>

And it's not just for simplicity.  The I_SYNC logic in
writeback_single_inode() is actually necessary to prevent races
between instances on a specific inode.  I.e. if inode writeback is
started by background wb then syncfs needs to synchronize with that
otherwise it will miss the inode, or worse, mess things up by calling
->write_inode() multiple times in parallel.  So going throught
writeback_single_inode() is actually a must AFAICS.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 18:51                 ` Miklos Szeredi
@ 2021-10-08 13:13                   ` Jan Kara
  0 siblings, 0 replies; 61+ messages in thread
From: Jan Kara @ 2021-10-08 13:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Chengguang Xu, Jan Kara, Amir Goldstein, linux-fsdevel,
	overlayfs, linux-kernel

On Thu 07-10-21 20:51:47, Miklos Szeredi wrote:
> On Thu, 7 Oct 2021 at 16:53, Chengguang Xu <cgxu519@mykernel.net> wrote:
> >
> >  ---- 在 星期四, 2021-10-07 22:46:46 Jan Kara <jack@suse.cz> 撰写 ----
> >  > On Thu 07-10-21 15:34:19, Miklos Szeredi wrote:
> >  > > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
> >  > > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
> >  > > >  > won't start write back for dirty pages.   Maybe I'm missing something,
> >  > > >  > but there it looks as if nothing will actually trigger writeback for
> >  > > >  > dirty pages in upper inode.
> >  > > >  >
> >  > > >
> >  > > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
> >  > > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
> >  > >
> >  > > Right.
> >  > >
> >  > > But wouldn't it be simpler to do this from ->write_inode()?
> >  >
> >  > You could but then you'd have to make sure you have I_DIRTY_SYNC always set
> >  > when I_DIRTY_PAGES is set on the upper inode so that your ->write_inode()
> >  > callback gets called. Overall I agree the logic would be probably simpler.
> >  >
> >
> 
> And it's not just for simplicity.  The I_SYNC logic in
> writeback_single_inode() is actually necessary to prevent races
> between instances on a specific inode.  I.e. if inode writeback is
> started by background wb then syncfs needs to synchronize with that
> otherwise it will miss the inode, or worse, mess things up by calling
> ->write_inode() multiple times in parallel.  So going throught
> writeback_single_inode() is actually a must AFAICS.

Yes, you are correct.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 09/10] fs: introduce new helper sync_fs_and_blockdev()
  2021-09-23 13:08 ` [RFC PATCH v5 09/10] fs: introduce new helper sync_fs_and_blockdev() Chengguang Xu
@ 2021-10-19  7:15   ` Amir Goldstein
  2021-11-15 11:39     ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Amir Goldstein @ 2021-10-19  7:15 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Miklos Szeredi, Jan Kara, linux-fsdevel, overlayfs, linux-kernel

On Thu, Sep 23, 2021 at 4:08 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> Overlayfs needs to call upper layer's ->sync_fs
> and __sync_blockdev() to sync metadata during syncfs(2).
>
> Currently, __sync_blockdev() does not export to module
> so introduce new helper sync_fs_and_blockdev() to wrap
> those operations.

Heads up: looks like __sync_blockdev() will be gone soon,
but you will have other exported symbols that overlayfs can use

https://lore.kernel.org/linux-fsdevel/20211019062530.2174626-1-hch@lst.de/T/

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 09/10] fs: introduce new helper sync_fs_and_blockdev()
  2021-10-19  7:15   ` Amir Goldstein
@ 2021-11-15 11:39     ` Chengguang Xu
  0 siblings, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-11-15 11:39 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Jan Kara, linux-fsdevel, overlayfs, linux-kernel


 ---- 在 星期二, 2021-10-19 15:15:28 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Thu, Sep 23, 2021 at 4:08 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > > Overlayfs needs to call upper layer's ->sync_fs
 > > and __sync_blockdev() to sync metadata during syncfs(2).
 > >
 > > Currently, __sync_blockdev() does not export to module
 > > so introduce new helper sync_fs_and_blockdev() to wrap
 > > those operations.
 > 
 > Heads up: looks like __sync_blockdev() will be gone soon,
 > but you will have other exported symbols that overlayfs can use
 > 
 > https://lore.kernel.org/linux-fsdevel/20211019062530.2174626-1-hch@lst.de/T/
 > 

Hi Amir,

Thanks for the information.

Chengguang,

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-10-07 13:34           ` Miklos Szeredi
  2021-10-07 14:46             ` Jan Kara
@ 2021-11-16  2:20             ` Chengguang Xu
  2021-11-16 12:35               ` Miklos Szeredi
  1 sibling, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-11-16  2:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

 ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
 > >  > won't start write back for dirty pages.   Maybe I'm missing something,
 > >  > but there it looks as if nothing will actually trigger writeback for
 > >  > dirty pages in upper inode.
 > >  >
 > >
 > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
 > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
 > 
 > Right.
 > 
 > But wouldn't it be simpler to do this from ->write_inode()?
 > 
 > I.e. call write_inode_now() as suggested by Jan.
 > 
 > Also could just call mark_inode_dirty() on the overlay inode
 > regardless of the dirty flags on the upper inode since it shouldn't
 > matter and results in simpler logic.
 > 

Hi Miklos,

Sorry for delayed response for this, I've been busy with another project.

I agree with your suggesion above and further more how about just mark overlay inode dirty
when it has upper inode? This approach will make marking dirtiness simple enough.

Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-16  2:20             ` Chengguang Xu
@ 2021-11-16 12:35               ` Miklos Szeredi
  2021-11-17  6:11                 ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Miklos Szeredi @ 2021-11-16 12:35 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
>  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
>  > >  > but there it looks as if nothing will actually trigger writeback for
>  > >  > dirty pages in upper inode.
>  > >  >
>  > >
>  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
>  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
>  >
>  > Right.
>  >
>  > But wouldn't it be simpler to do this from ->write_inode()?
>  >
>  > I.e. call write_inode_now() as suggested by Jan.
>  >
>  > Also could just call mark_inode_dirty() on the overlay inode
>  > regardless of the dirty flags on the upper inode since it shouldn't
>  > matter and results in simpler logic.
>  >
>
> Hi Miklos,
>
> Sorry for delayed response for this, I've been busy with another project.
>
> I agree with your suggesion above and further more how about just mark overlay inode dirty
> when it has upper inode? This approach will make marking dirtiness simple enough.

Are you suggesting that all non-lower overlay inodes should always be dirty?

The logic would be simple, no doubt, but there's the cost to walking
those overlay inodes which don't have a dirty upper inode, right?  Can
you quantify this cost with a benchmark?  Can be totally synthetic,
e.g. lookup a million upper files without modifying them, then call
syncfs.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-16 12:35               ` Miklos Szeredi
@ 2021-11-17  6:11                 ` Chengguang Xu
  2021-11-18  6:32                   ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-11-17  6:11 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

 ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
 > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
 > >  > >  > but there it looks as if nothing will actually trigger writeback for
 > >  > >  > dirty pages in upper inode.
 > >  > >  >
 > >  > >
 > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
 > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
 > >  >
 > >  > Right.
 > >  >
 > >  > But wouldn't it be simpler to do this from ->write_inode()?
 > >  >
 > >  > I.e. call write_inode_now() as suggested by Jan.
 > >  >
 > >  > Also could just call mark_inode_dirty() on the overlay inode
 > >  > regardless of the dirty flags on the upper inode since it shouldn't
 > >  > matter and results in simpler logic.
 > >  >
 > >
 > > Hi Miklos,
 > >
 > > Sorry for delayed response for this, I've been busy with another project.
 > >
 > > I agree with your suggesion above and further more how about just mark overlay inode dirty
 > > when it has upper inode? This approach will make marking dirtiness simple enough.
 > 
 > Are you suggesting that all non-lower overlay inodes should always be dirty?
 > 
 > The logic would be simple, no doubt, but there's the cost to walking
 > those overlay inodes which don't have a dirty upper inode, right?  

That's true.

 > Can you quantify this cost with a benchmark?  Can be totally synthetic,
 > e.g. lookup a million upper files without modifying them, then call
 > syncfs.
 > 

No problem, I'll do some tests for the performance.

Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-17  6:11                 ` Chengguang Xu
@ 2021-11-18  6:32                   ` Chengguang Xu
  2021-11-18 11:23                     ` Jan Kara
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-11-18  6:32 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel


 ---- 在 星期三, 2021-11-17 14:11:29 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 >  ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 >  > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
 >  > >
 >  > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 >  > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
 >  > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
 >  > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
 >  > >  > >  > but there it looks as if nothing will actually trigger writeback for
 >  > >  > >  > dirty pages in upper inode.
 >  > >  > >  >
 >  > >  > >
 >  > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
 >  > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
 >  > >  >
 >  > >  > Right.
 >  > >  >
 >  > >  > But wouldn't it be simpler to do this from ->write_inode()?
 >  > >  >
 >  > >  > I.e. call write_inode_now() as suggested by Jan.
 >  > >  >
 >  > >  > Also could just call mark_inode_dirty() on the overlay inode
 >  > >  > regardless of the dirty flags on the upper inode since it shouldn't
 >  > >  > matter and results in simpler logic.
 >  > >  >
 >  > >
 >  > > Hi Miklos,
 >  > >
 >  > > Sorry for delayed response for this, I've been busy with another project.
 >  > >
 >  > > I agree with your suggesion above and further more how about just mark overlay inode dirty
 >  > > when it has upper inode? This approach will make marking dirtiness simple enough.
 >  > 
 >  > Are you suggesting that all non-lower overlay inodes should always be dirty?
 >  > 
 >  > The logic would be simple, no doubt, but there's the cost to walking
 >  > those overlay inodes which don't have a dirty upper inode, right?  
 > 
 > That's true.
 > 
 >  > Can you quantify this cost with a benchmark?  Can be totally synthetic,
 >  > e.g. lookup a million upper files without modifying them, then call
 >  > syncfs.
 >  > 
 > 
 > No problem, I'll do some tests for the performance.
 > 

Hi Miklos,

I did some rough tests and the results like below.
In practice,  I don't think that 1.3s extra time of syncfs will cause significant problem.
What do you think?



Test bed: kvm vm 
2.50GHz cpu 32core
64GB mem
vm kernel  5.15.0-rc1+ (with ovl syncfs patch V6)

one millon files spread to 2 level of dir hierarchy.
test step:
1) create testfiles in ovl upper dir
2) mount overlayfs
3) excute ls -lR to lookup all file in overlay merge dir
4) excute slabtop to make sure overlay inode number
5) call syncfs to the file in merge dir

Tested five times and the reusults are in 1.310s ~ 1.326s

root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
syncfs success

real    0m1.310s
user    0m0.000s
sys     0m0.001s
[root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
syncfs success

real    0m1.326s
user    0m0.001s
sys     0m0.000s
[root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
syncfs success

real    0m1.321s
user    0m0.000s
sys     0m0.001s
[root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
syncfs success

real    0m1.316s
user    0m0.000s
sys     0m0.001s
[root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
syncfs success

real    0m1.314s
user    0m0.001s
sys     0m0.001s


Directly run syncfs to the file in ovl-upper dir.
Tested five times and the reusults are in 0.001s ~ 0.003s

[root@VM-144-4-centos test]# time ./syncfs a
syncfs success

real    0m0.002s
user    0m0.001s
sys     0m0.000s
[root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
syncfs success

real    0m0.003s
user    0m0.001s
sys     0m0.000s
[root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
syncfs success

real    0m0.001s
user    0m0.000s
sys     0m0.001s
[root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
syncfs success

real    0m0.001s
user    0m0.000s
sys     0m0.001s
[root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
syncfs success

real    0m0.001s
user    0m0.000s
sys     0m0.001s
[root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
syncfs success

real    0m0.001s
user    0m0.000s
sys     0m0.001







^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-18  6:32                   ` Chengguang Xu
@ 2021-11-18 11:23                     ` Jan Kara
  2021-11-18 12:02                       ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Kara @ 2021-11-18 11:23 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Miklos Szeredi, Jan Kara, Amir Goldstein, linux-fsdevel,
	overlayfs, linux-kernel

On Thu 18-11-21 14:32:36, Chengguang Xu wrote:
> 
>  ---- 在 星期三, 2021-11-17 14:11:29 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  >  ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  >  > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  >  > >
>  >  > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  >  > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  >  > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
>  >  > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
>  >  > >  > >  > but there it looks as if nothing will actually trigger writeback for
>  >  > >  > >  > dirty pages in upper inode.
>  >  > >  > >  >
>  >  > >  > >
>  >  > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
>  >  > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
>  >  > >  >
>  >  > >  > Right.
>  >  > >  >
>  >  > >  > But wouldn't it be simpler to do this from ->write_inode()?
>  >  > >  >
>  >  > >  > I.e. call write_inode_now() as suggested by Jan.
>  >  > >  >
>  >  > >  > Also could just call mark_inode_dirty() on the overlay inode
>  >  > >  > regardless of the dirty flags on the upper inode since it shouldn't
>  >  > >  > matter and results in simpler logic.
>  >  > >  >
>  >  > >
>  >  > > Hi Miklos,
>  >  > >
>  >  > > Sorry for delayed response for this, I've been busy with another project.
>  >  > >
>  >  > > I agree with your suggesion above and further more how about just mark overlay inode dirty
>  >  > > when it has upper inode? This approach will make marking dirtiness simple enough.
>  >  > 
>  >  > Are you suggesting that all non-lower overlay inodes should always be dirty?
>  >  > 
>  >  > The logic would be simple, no doubt, but there's the cost to walking
>  >  > those overlay inodes which don't have a dirty upper inode, right?  
>  > 
>  > That's true.
>  > 
>  >  > Can you quantify this cost with a benchmark?  Can be totally synthetic,
>  >  > e.g. lookup a million upper files without modifying them, then call
>  >  > syncfs.
>  >  > 
>  > 
>  > No problem, I'll do some tests for the performance.
>  > 
> 
> Hi Miklos,
> 
> I did some rough tests and the results like below.  In practice,  I don't
> think that 1.3s extra time of syncfs will cause significant problem.
> What do you think?

Well, burning 1.3s worth of CPU time for doing nothing seems like quite a
bit to me. I understand this is with 1000000 inodes but although that is
quite a few it is not unheard of. If there would be several containers
calling sync_fs(2) on the machine they could easily hog the machine... That
is why I was originally against keeping overlay inodes always dirty and
wanted their dirtiness to at least roughly track the real need to do
writeback.

								Honza

> Test bed: kvm vm 
> 2.50GHz cpu 32core
> 64GB mem
> vm kernel  5.15.0-rc1+ (with ovl syncfs patch V6)
> 
> one millon files spread to 2 level of dir hierarchy.
> test step:
> 1) create testfiles in ovl upper dir
> 2) mount overlayfs
> 3) excute ls -lR to lookup all file in overlay merge dir
> 4) excute slabtop to make sure overlay inode number
> 5) call syncfs to the file in merge dir
> 
> Tested five times and the reusults are in 1.310s ~ 1.326s
> 
> root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
> syncfs success
> 
> real    0m1.310s
> user    0m0.000s
> sys     0m0.001s
> [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
> syncfs success
> 
> real    0m1.326s
> user    0m0.001s
> sys     0m0.000s
> [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
> syncfs success
> 
> real    0m1.321s
> user    0m0.000s
> sys     0m0.001s
> [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
> syncfs success
> 
> real    0m1.316s
> user    0m0.000s
> sys     0m0.001s
> [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
> syncfs success
> 
> real    0m1.314s
> user    0m0.001s
> sys     0m0.001s
> 
> 
> Directly run syncfs to the file in ovl-upper dir.
> Tested five times and the reusults are in 0.001s ~ 0.003s
> 
> [root@VM-144-4-centos test]# time ./syncfs a
> syncfs success
> 
> real    0m0.002s
> user    0m0.001s
> sys     0m0.000s
> [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
> syncfs success
> 
> real    0m0.003s
> user    0m0.001s
> sys     0m0.000s
> [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
> syncfs success
> 
> real    0m0.001s
> user    0m0.000s
> sys     0m0.001s
> [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
> syncfs success
> 
> real    0m0.001s
> user    0m0.000s
> sys     0m0.001s
> [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
> syncfs success
> 
> real    0m0.001s
> user    0m0.000s
> sys     0m0.001s
> [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
> syncfs success
> 
> real    0m0.001s
> user    0m0.000s
> sys     0m0.001
> 
> 
> 
> 
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-18 11:23                     ` Jan Kara
@ 2021-11-18 12:02                       ` Chengguang Xu
  2021-11-18 16:43                         ` Jan Kara
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-11-18 12:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: Miklos Szeredi, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

 ---- 在 星期四, 2021-11-18 19:23:15 Jan Kara <jack@suse.cz> 撰写 ----
 > On Thu 18-11-21 14:32:36, Chengguang Xu wrote:
 > > 
 > >  ---- 在 星期三, 2021-11-17 14:11:29 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  >  ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  >  > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  >  > >
 > >  >  > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  >  > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  >  > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
 > >  >  > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
 > >  >  > >  > >  > but there it looks as if nothing will actually trigger writeback for
 > >  >  > >  > >  > dirty pages in upper inode.
 > >  >  > >  > >  >
 > >  >  > >  > >
 > >  >  > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
 > >  >  > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
 > >  >  > >  >
 > >  >  > >  > Right.
 > >  >  > >  >
 > >  >  > >  > But wouldn't it be simpler to do this from ->write_inode()?
 > >  >  > >  >
 > >  >  > >  > I.e. call write_inode_now() as suggested by Jan.
 > >  >  > >  >
 > >  >  > >  > Also could just call mark_inode_dirty() on the overlay inode
 > >  >  > >  > regardless of the dirty flags on the upper inode since it shouldn't
 > >  >  > >  > matter and results in simpler logic.
 > >  >  > >  >
 > >  >  > >
 > >  >  > > Hi Miklos,
 > >  >  > >
 > >  >  > > Sorry for delayed response for this, I've been busy with another project.
 > >  >  > >
 > >  >  > > I agree with your suggesion above and further more how about just mark overlay inode dirty
 > >  >  > > when it has upper inode? This approach will make marking dirtiness simple enough.
 > >  >  > 
 > >  >  > Are you suggesting that all non-lower overlay inodes should always be dirty?
 > >  >  > 
 > >  >  > The logic would be simple, no doubt, but there's the cost to walking
 > >  >  > those overlay inodes which don't have a dirty upper inode, right?  
 > >  > 
 > >  > That's true.
 > >  > 
 > >  >  > Can you quantify this cost with a benchmark?  Can be totally synthetic,
 > >  >  > e.g. lookup a million upper files without modifying them, then call
 > >  >  > syncfs.
 > >  >  > 
 > >  > 
 > >  > No problem, I'll do some tests for the performance.
 > >  > 
 > > 
 > > Hi Miklos,
 > > 
 > > I did some rough tests and the results like below.  In practice,  I don't
 > > think that 1.3s extra time of syncfs will cause significant problem.
 > > What do you think?
 > 
 > Well, burning 1.3s worth of CPU time for doing nothing seems like quite a
 > bit to me. I understand this is with 1000000 inodes but although that is
 > quite a few it is not unheard of. If there would be several containers
 > calling sync_fs(2) on the machine they could easily hog the machine... That
 > is why I was originally against keeping overlay inodes always dirty and
 > wanted their dirtiness to at least roughly track the real need to do
 > writeback.
 > 

Hi Jan,

Actually, the time on user and sys are almost same with directly excute syncfs on underlying fs.
IMO, it only extends syncfs(2) waiting time for perticular container but not burning cpu.
What am I missing?


Thanks,
Chengguang


 > 
 > > Test bed: kvm vm 
 > > 2.50GHz cpu 32core
 > > 64GB mem
 > > vm kernel  5.15.0-rc1+ (with ovl syncfs patch V6)
 > > 
 > > one millon files spread to 2 level of dir hierarchy.
 > > test step:
 > > 1) create testfiles in ovl upper dir
 > > 2) mount overlayfs
 > > 3) excute ls -lR to lookup all file in overlay merge dir
 > > 4) excute slabtop to make sure overlay inode number
 > > 5) call syncfs to the file in merge dir
 > > 
 > > Tested five times and the reusults are in 1.310s ~ 1.326s
 > > 
 > > root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m1.310s
 > > user    0m0.000s
 > > sys     0m0.001s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m1.326s
 > > user    0m0.001s
 > > sys     0m0.000s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m1.321s
 > > user    0m0.000s
 > > sys     0m0.001s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m1.316s
 > > user    0m0.000s
 > > sys     0m0.001s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m1.314s
 > > user    0m0.001s
 > > sys     0m0.001s
 > > 
 > > 
 > > Directly run syncfs to the file in ovl-upper dir.
 > > Tested five times and the reusults are in 0.001s ~ 0.003s
 > > 
 > > [root@VM-144-4-centos test]# time ./syncfs a
 > > syncfs success
 > > 
 > > real    0m0.002s
 > > user    0m0.001s
 > > sys     0m0.000s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m0.003s
 > > user    0m0.001s
 > > sys     0m0.000s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m0.001s
 > > user    0m0.000s
 > > sys     0m0.001s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m0.001s
 > > user    0m0.000s
 > > sys     0m0.001s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m0.001s
 > > user    0m0.000s
 > > sys     0m0.001s
 > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
 > > syncfs success
 > > 
 > > real    0m0.001s
 > > user    0m0.000s
 > > sys     0m0.001
 > > 
 > > 
 > > 
 > > 
 > > 
 > > 
 > -- 
 > Jan Kara <jack@suse.com>
 > SUSE Labs, CR
 > 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-18 12:02                       ` Chengguang Xu
@ 2021-11-18 16:43                         ` Jan Kara
  2021-11-19  6:12                           ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Kara @ 2021-11-18 16:43 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Miklos Szeredi, Amir Goldstein, linux-fsdevel,
	overlayfs, linux-kernel

On Thu 18-11-21 20:02:09, Chengguang Xu wrote:
>  ---- 在 星期四, 2021-11-18 19:23:15 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Thu 18-11-21 14:32:36, Chengguang Xu wrote:
>  > > 
>  > >  ---- 在 星期三, 2021-11-17 14:11:29 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  > >  >  ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  > >  >  > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >  >  > >
>  > >  >  > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  > >  >  > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >  >  > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
>  > >  >  > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
>  > >  >  > >  > >  > but there it looks as if nothing will actually trigger writeback for
>  > >  >  > >  > >  > dirty pages in upper inode.
>  > >  >  > >  > >  >
>  > >  >  > >  > >
>  > >  >  > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
>  > >  >  > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
>  > >  >  > >  >
>  > >  >  > >  > Right.
>  > >  >  > >  >
>  > >  >  > >  > But wouldn't it be simpler to do this from ->write_inode()?
>  > >  >  > >  >
>  > >  >  > >  > I.e. call write_inode_now() as suggested by Jan.
>  > >  >  > >  >
>  > >  >  > >  > Also could just call mark_inode_dirty() on the overlay inode
>  > >  >  > >  > regardless of the dirty flags on the upper inode since it shouldn't
>  > >  >  > >  > matter and results in simpler logic.
>  > >  >  > >  >
>  > >  >  > >
>  > >  >  > > Hi Miklos,
>  > >  >  > >
>  > >  >  > > Sorry for delayed response for this, I've been busy with another project.
>  > >  >  > >
>  > >  >  > > I agree with your suggesion above and further more how about just mark overlay inode dirty
>  > >  >  > > when it has upper inode? This approach will make marking dirtiness simple enough.
>  > >  >  > 
>  > >  >  > Are you suggesting that all non-lower overlay inodes should always be dirty?
>  > >  >  > 
>  > >  >  > The logic would be simple, no doubt, but there's the cost to walking
>  > >  >  > those overlay inodes which don't have a dirty upper inode, right?  
>  > >  > 
>  > >  > That's true.
>  > >  > 
>  > >  >  > Can you quantify this cost with a benchmark?  Can be totally synthetic,
>  > >  >  > e.g. lookup a million upper files without modifying them, then call
>  > >  >  > syncfs.
>  > >  >  > 
>  > >  > 
>  > >  > No problem, I'll do some tests for the performance.
>  > >  > 
>  > > 
>  > > Hi Miklos,
>  > > 
>  > > I did some rough tests and the results like below.  In practice,  I don't
>  > > think that 1.3s extra time of syncfs will cause significant problem.
>  > > What do you think?
>  > 
>  > Well, burning 1.3s worth of CPU time for doing nothing seems like quite a
>  > bit to me. I understand this is with 1000000 inodes but although that is
>  > quite a few it is not unheard of. If there would be several containers
>  > calling sync_fs(2) on the machine they could easily hog the machine... That
>  > is why I was originally against keeping overlay inodes always dirty and
>  > wanted their dirtiness to at least roughly track the real need to do
>  > writeback.
>  > 
> 
> Hi Jan,
> 
> Actually, the time on user and sys are almost same with directly excute syncfs on underlying fs.
> IMO, it only extends syncfs(2) waiting time for perticular container but not burning cpu.
> What am I missing?

Ah, right, I've missed that only realtime changed, not systime. I'm sorry
for confusion. But why did the realtime increase so much? Are we waiting
for some IO?

								Honza

>  > > Test bed: kvm vm 
>  > > 2.50GHz cpu 32core
>  > > 64GB mem
>  > > vm kernel  5.15.0-rc1+ (with ovl syncfs patch V6)
>  > > 
>  > > one millon files spread to 2 level of dir hierarchy.
>  > > test step:
>  > > 1) create testfiles in ovl upper dir
>  > > 2) mount overlayfs
>  > > 3) excute ls -lR to lookup all file in overlay merge dir
>  > > 4) excute slabtop to make sure overlay inode number
>  > > 5) call syncfs to the file in merge dir
>  > > 
>  > > Tested five times and the reusults are in 1.310s ~ 1.326s
>  > > 
>  > > root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m1.310s
>  > > user    0m0.000s
>  > > sys     0m0.001s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m1.326s
>  > > user    0m0.001s
>  > > sys     0m0.000s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m1.321s
>  > > user    0m0.000s
>  > > sys     0m0.001s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m1.316s
>  > > user    0m0.000s
>  > > sys     0m0.001s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m1.314s
>  > > user    0m0.001s
>  > > sys     0m0.001s
>  > > 
>  > > 
>  > > Directly run syncfs to the file in ovl-upper dir.
>  > > Tested five times and the reusults are in 0.001s ~ 0.003s
>  > > 
>  > > [root@VM-144-4-centos test]# time ./syncfs a
>  > > syncfs success
>  > > 
>  > > real    0m0.002s
>  > > user    0m0.001s
>  > > sys     0m0.000s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m0.003s
>  > > user    0m0.001s
>  > > sys     0m0.000s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m0.001s
>  > > user    0m0.000s
>  > > sys     0m0.001s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m0.001s
>  > > user    0m0.000s
>  > > sys     0m0.001s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m0.001s
>  > > user    0m0.000s
>  > > sys     0m0.001s
>  > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh 
>  > > syncfs success
>  > > 
>  > > real    0m0.001s
>  > > user    0m0.000s
>  > > sys     0m0.001
>  > > 
>  > > 
>  > > 
>  > > 
>  > > 
>  > > 
>  > -- 
>  > Jan Kara <jack@suse.com>
>  > SUSE Labs, CR
>  > 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-18 16:43                         ` Jan Kara
@ 2021-11-19  6:12                           ` Chengguang Xu
  2021-11-30 11:22                             ` Jan Kara
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-11-19  6:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Miklos Szeredi, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel

 ---- 在 星期五, 2021-11-19 00:43:49 Jan Kara <jack@suse.cz> 撰写 ----
 > On Thu 18-11-21 20:02:09, Chengguang Xu wrote:
 > >  ---- 在 星期四, 2021-11-18 19:23:15 Jan Kara <jack@suse.cz> 撰写 ----
 > >  > On Thu 18-11-21 14:32:36, Chengguang Xu wrote:
 > >  > > 
 > >  > >  ---- 在 星期三, 2021-11-17 14:11:29 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  > >  >  ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  > >  >  > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >  >  > >
 > >  > >  >  > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  > >  >  > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >  >  > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
 > >  > >  >  > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
 > >  > >  >  > >  > >  > but there it looks as if nothing will actually trigger writeback for
 > >  > >  >  > >  > >  > dirty pages in upper inode.
 > >  > >  >  > >  > >  >
 > >  > >  >  > >  > >
 > >  > >  >  > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
 > >  > >  >  > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
 > >  > >  >  > >  >
 > >  > >  >  > >  > Right.
 > >  > >  >  > >  >
 > >  > >  >  > >  > But wouldn't it be simpler to do this from ->write_inode()?
 > >  > >  >  > >  >
 > >  > >  >  > >  > I.e. call write_inode_now() as suggested by Jan.
 > >  > >  >  > >  >
 > >  > >  >  > >  > Also could just call mark_inode_dirty() on the overlay inode
 > >  > >  >  > >  > regardless of the dirty flags on the upper inode since it shouldn't
 > >  > >  >  > >  > matter and results in simpler logic.
 > >  > >  >  > >  >
 > >  > >  >  > >
 > >  > >  >  > > Hi Miklos,
 > >  > >  >  > >
 > >  > >  >  > > Sorry for delayed response for this, I've been busy with another project.
 > >  > >  >  > >
 > >  > >  >  > > I agree with your suggesion above and further more how about just mark overlay inode dirty
 > >  > >  >  > > when it has upper inode? This approach will make marking dirtiness simple enough.
 > >  > >  >  > 
 > >  > >  >  > Are you suggesting that all non-lower overlay inodes should always be dirty?
 > >  > >  >  > 
 > >  > >  >  > The logic would be simple, no doubt, but there's the cost to walking
 > >  > >  >  > those overlay inodes which don't have a dirty upper inode, right?  
 > >  > >  > 
 > >  > >  > That's true.
 > >  > >  > 
 > >  > >  >  > Can you quantify this cost with a benchmark?  Can be totally synthetic,
 > >  > >  >  > e.g. lookup a million upper files without modifying them, then call
 > >  > >  >  > syncfs.
 > >  > >  >  > 
 > >  > >  > 
 > >  > >  > No problem, I'll do some tests for the performance.
 > >  > >  > 
 > >  > > 
 > >  > > Hi Miklos,
 > >  > > 
 > >  > > I did some rough tests and the results like below.  In practice,  I don't
 > >  > > think that 1.3s extra time of syncfs will cause significant problem.
 > >  > > What do you think?
 > >  > 
 > >  > Well, burning 1.3s worth of CPU time for doing nothing seems like quite a
 > >  > bit to me. I understand this is with 1000000 inodes but although that is
 > >  > quite a few it is not unheard of. If there would be several containers
 > >  > calling sync_fs(2) on the machine they could easily hog the machine... That
 > >  > is why I was originally against keeping overlay inodes always dirty and
 > >  > wanted their dirtiness to at least roughly track the real need to do
 > >  > writeback.
 > >  > 
 > > 
 > > Hi Jan,
 > > 
 > > Actually, the time on user and sys are almost same with directly excute syncfs on underlying fs.
 > > IMO, it only extends syncfs(2) waiting time for perticular container but not burning cpu.
 > > What am I missing?
 > 
 > Ah, right, I've missed that only realtime changed, not systime. I'm sorry
 > for confusion. But why did the realtime increase so much? Are we waiting
 > for some IO?
 > 

There are many places to call cond_resched() in writeback process,
so sycnfs process was scheduled several times.

Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-19  6:12                           ` Chengguang Xu
@ 2021-11-30 11:22                             ` Jan Kara
  2021-11-30 16:09                               ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Kara @ 2021-11-30 11:22 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Miklos Szeredi, Amir Goldstein, linux-fsdevel,
	overlayfs, linux-kernel

On Fri 19-11-21 14:12:46, Chengguang Xu wrote:
>  ---- 在 星期五, 2021-11-19 00:43:49 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Thu 18-11-21 20:02:09, Chengguang Xu wrote:
>  > >  ---- 在 星期四, 2021-11-18 19:23:15 Jan Kara <jack@suse.cz> 撰写 ----
>  > >  > On Thu 18-11-21 14:32:36, Chengguang Xu wrote:
>  > >  > > 
>  > >  > >  ---- 在 星期三, 2021-11-17 14:11:29 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  > >  > >  >  ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  > >  > >  >  > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >  > >  >  > >
>  > >  > >  >  > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
>  > >  > >  >  > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >  > >  >  > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
>  > >  > >  >  > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
>  > >  > >  >  > >  > >  > but there it looks as if nothing will actually trigger writeback for
>  > >  > >  >  > >  > >  > dirty pages in upper inode.
>  > >  > >  >  > >  > >  >
>  > >  > >  >  > >  > >
>  > >  > >  >  > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
>  > >  > >  >  > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
>  > >  > >  >  > >  >
>  > >  > >  >  > >  > Right.
>  > >  > >  >  > >  >
>  > >  > >  >  > >  > But wouldn't it be simpler to do this from ->write_inode()?
>  > >  > >  >  > >  >
>  > >  > >  >  > >  > I.e. call write_inode_now() as suggested by Jan.
>  > >  > >  >  > >  >
>  > >  > >  >  > >  > Also could just call mark_inode_dirty() on the overlay inode
>  > >  > >  >  > >  > regardless of the dirty flags on the upper inode since it shouldn't
>  > >  > >  >  > >  > matter and results in simpler logic.
>  > >  > >  >  > >  >
>  > >  > >  >  > >
>  > >  > >  >  > > Hi Miklos,
>  > >  > >  >  > >
>  > >  > >  >  > > Sorry for delayed response for this, I've been busy with another project.
>  > >  > >  >  > >
>  > >  > >  >  > > I agree with your suggesion above and further more how about just mark overlay inode dirty
>  > >  > >  >  > > when it has upper inode? This approach will make marking dirtiness simple enough.
>  > >  > >  >  > 
>  > >  > >  >  > Are you suggesting that all non-lower overlay inodes should always be dirty?
>  > >  > >  >  > 
>  > >  > >  >  > The logic would be simple, no doubt, but there's the cost to walking
>  > >  > >  >  > those overlay inodes which don't have a dirty upper inode, right?  
>  > >  > >  > 
>  > >  > >  > That's true.
>  > >  > >  > 
>  > >  > >  >  > Can you quantify this cost with a benchmark?  Can be totally synthetic,
>  > >  > >  >  > e.g. lookup a million upper files without modifying them, then call
>  > >  > >  >  > syncfs.
>  > >  > >  >  > 
>  > >  > >  > 
>  > >  > >  > No problem, I'll do some tests for the performance.
>  > >  > >  > 
>  > >  > > 
>  > >  > > Hi Miklos,
>  > >  > > 
>  > >  > > I did some rough tests and the results like below.  In practice,  I don't
>  > >  > > think that 1.3s extra time of syncfs will cause significant problem.
>  > >  > > What do you think?
>  > >  > 
>  > >  > Well, burning 1.3s worth of CPU time for doing nothing seems like quite a
>  > >  > bit to me. I understand this is with 1000000 inodes but although that is
>  > >  > quite a few it is not unheard of. If there would be several containers
>  > >  > calling sync_fs(2) on the machine they could easily hog the machine... That
>  > >  > is why I was originally against keeping overlay inodes always dirty and
>  > >  > wanted their dirtiness to at least roughly track the real need to do
>  > >  > writeback.
>  > >  > 
>  > > 
>  > > Hi Jan,
>  > > 
>  > > Actually, the time on user and sys are almost same with directly excute syncfs on underlying fs.
>  > > IMO, it only extends syncfs(2) waiting time for perticular container but not burning cpu.
>  > > What am I missing?
>  > 
>  > Ah, right, I've missed that only realtime changed, not systime. I'm sorry
>  > for confusion. But why did the realtime increase so much? Are we waiting
>  > for some IO?
>  > 
> 
> There are many places to call cond_resched() in writeback process,
> so sycnfs process was scheduled several times.

I was thinking about this a bit more and I don't think I buy this
explanation. What I rather think is happening is that real work for syncfs
(writeback_inodes_sb() and sync_inodes_sb() calls) gets offloaded to a flush
worker. E.g. writeback_inodes_sb() ends up calling
__writeback_inodes_sb_nr() which does:

bdi_split_work_to_wbs()
wb_wait_for_completion()

So you don't see the work done in the times accounted to your test
program. But in practice the flush worker is indeed burning 1.3s worth of
CPU to scan the 1 million inode list and do nothing.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-30 11:22                             ` Jan Kara
@ 2021-11-30 16:09                               ` Chengguang Xu
  2021-11-30 19:04                                 ` Amir Goldstein
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-11-30 16:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: Miklos Szeredi, Amir Goldstein, linux-fsdevel, overlayfs,
	linux-kernel, ronyjin, charliecgxu


 ---- 在 星期二, 2021-11-30 19:22:06 Jan Kara <jack@suse.cz> 撰写 ----
 > On Fri 19-11-21 14:12:46, Chengguang Xu wrote:
 > >  ---- 在 星期五, 2021-11-19 00:43:49 Jan Kara <jack@suse.cz> 撰写 ----
 > >  > On Thu 18-11-21 20:02:09, Chengguang Xu wrote:
 > >  > >  ---- 在 星期四, 2021-11-18 19:23:15 Jan Kara <jack@suse.cz> 撰写 ----
 > >  > >  > On Thu 18-11-21 14:32:36, Chengguang Xu wrote:
 > >  > >  > > 
 > >  > >  > >  ---- 在 星期三, 2021-11-17 14:11:29 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  > >  > >  >  ---- 在 星期二, 2021-11-16 20:35:55 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  > >  > >  >  > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >  > >  >  > >
 > >  > >  > >  >  > >  ---- 在 星期四, 2021-10-07 21:34:19 Miklos Szeredi <miklos@szeredi.hu> 撰写 ----
 > >  > >  > >  >  > >  > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >  > >  >  > >  > >  > However that wasn't what I was asking about.  AFAICS ->write_inode()
 > >  > >  > >  >  > >  > >  > won't start write back for dirty pages.   Maybe I'm missing something,
 > >  > >  > >  >  > >  > >  > but there it looks as if nothing will actually trigger writeback for
 > >  > >  > >  >  > >  > >  > dirty pages in upper inode.
 > >  > >  > >  >  > >  > >  >
 > >  > >  > >  >  > >  > >
 > >  > >  > >  >  > >  > > Actually, page writeback on upper inode will be triggered by overlayfs ->writepages and
 > >  > >  > >  >  > >  > > overlayfs' ->writepages will be called by vfs writeback function (i.e writeback_sb_inodes).
 > >  > >  > >  >  > >  >
 > >  > >  > >  >  > >  > Right.
 > >  > >  > >  >  > >  >
 > >  > >  > >  >  > >  > But wouldn't it be simpler to do this from ->write_inode()?
 > >  > >  > >  >  > >  >
 > >  > >  > >  >  > >  > I.e. call write_inode_now() as suggested by Jan.
 > >  > >  > >  >  > >  >
 > >  > >  > >  >  > >  > Also could just call mark_inode_dirty() on the overlay inode
 > >  > >  > >  >  > >  > regardless of the dirty flags on the upper inode since it shouldn't
 > >  > >  > >  >  > >  > matter and results in simpler logic.
 > >  > >  > >  >  > >  >
 > >  > >  > >  >  > >
 > >  > >  > >  >  > > Hi Miklos,
 > >  > >  > >  >  > >
 > >  > >  > >  >  > > Sorry for delayed response for this, I've been busy with another project.
 > >  > >  > >  >  > >
 > >  > >  > >  >  > > I agree with your suggesion above and further more how about just mark overlay inode dirty
 > >  > >  > >  >  > > when it has upper inode? This approach will make marking dirtiness simple enough.
 > >  > >  > >  >  > 
 > >  > >  > >  >  > Are you suggesting that all non-lower overlay inodes should always be dirty?
 > >  > >  > >  >  > 
 > >  > >  > >  >  > The logic would be simple, no doubt, but there's the cost to walking
 > >  > >  > >  >  > those overlay inodes which don't have a dirty upper inode, right?  
 > >  > >  > >  > 
 > >  > >  > >  > That's true.
 > >  > >  > >  > 
 > >  > >  > >  >  > Can you quantify this cost with a benchmark?  Can be totally synthetic,
 > >  > >  > >  >  > e.g. lookup a million upper files without modifying them, then call
 > >  > >  > >  >  > syncfs.
 > >  > >  > >  >  > 
 > >  > >  > >  > 
 > >  > >  > >  > No problem, I'll do some tests for the performance.
 > >  > >  > >  > 
 > >  > >  > > 
 > >  > >  > > Hi Miklos,
 > >  > >  > > 
 > >  > >  > > I did some rough tests and the results like below.  In practice,  I don't
 > >  > >  > > think that 1.3s extra time of syncfs will cause significant problem.
 > >  > >  > > What do you think?
 > >  > >  > 
 > >  > >  > Well, burning 1.3s worth of CPU time for doing nothing seems like quite a
 > >  > >  > bit to me. I understand this is with 1000000 inodes but although that is
 > >  > >  > quite a few it is not unheard of. If there would be several containers
 > >  > >  > calling sync_fs(2) on the machine they could easily hog the machine... That
 > >  > >  > is why I was originally against keeping overlay inodes always dirty and
 > >  > >  > wanted their dirtiness to at least roughly track the real need to do
 > >  > >  > writeback.
 > >  > >  > 
 > >  > > 
 > >  > > Hi Jan,
 > >  > > 
 > >  > > Actually, the time on user and sys are almost same with directly excute syncfs on underlying fs.
 > >  > > IMO, it only extends syncfs(2) waiting time for perticular container but not burning cpu.
 > >  > > What am I missing?
 > >  > 
 > >  > Ah, right, I've missed that only realtime changed, not systime. I'm sorry
 > >  > for confusion. But why did the realtime increase so much? Are we waiting
 > >  > for some IO?
 > >  > 
 > > 
 > > There are many places to call cond_resched() in writeback process,
 > > so sycnfs process was scheduled several times.
 > 
 > I was thinking about this a bit more and I don't think I buy this
 > explanation. What I rather think is happening is that real work for syncfs
 > (writeback_inodes_sb() and sync_inodes_sb() calls) gets offloaded to a flush
 > worker. E.g. writeback_inodes_sb() ends up calling
 > __writeback_inodes_sb_nr() which does:
 > 
 > bdi_split_work_to_wbs()
 > wb_wait_for_completion()
 > 
 > So you don't see the work done in the times accounted to your test
 > program. But in practice the flush worker is indeed burning 1.3s worth of
 > CPU to scan the 1 million inode list and do nothing.
 > 

That makes sense. However, in real container use case,  the upper dir is always empty,
so I don't think there is meaningful difference compare to accurately marking overlay
inode dirty.  

I'm not very familiar with other use cases of overlayfs except container, should we consider
other use cases? Maybe we can also ignore the cpu burden because those use cases don't
have density deployment like container.



Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-30 16:09                               ` Chengguang Xu
@ 2021-11-30 19:04                                 ` Amir Goldstein
  2021-12-01  2:37                                   ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Amir Goldstein @ 2021-11-30 19:04 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Miklos Szeredi, linux-fsdevel, overlayfs, linux-kernel,
	ronyjin, charliecgxu, Vivek Goyal

>  > I was thinking about this a bit more and I don't think I buy this
>  > explanation. What I rather think is happening is that real work for syncfs
>  > (writeback_inodes_sb() and sync_inodes_sb() calls) gets offloaded to a flush
>  > worker. E.g. writeback_inodes_sb() ends up calling
>  > __writeback_inodes_sb_nr() which does:
>  >
>  > bdi_split_work_to_wbs()
>  > wb_wait_for_completion()
>  >
>  > So you don't see the work done in the times accounted to your test
>  > program. But in practice the flush worker is indeed burning 1.3s worth of
>  > CPU to scan the 1 million inode list and do nothing.
>  >
>
> That makes sense. However, in real container use case,  the upper dir is always empty,
> so I don't think there is meaningful difference compare to accurately marking overlay
> inode dirty.
>

It's true the that is a very common case, but...

> I'm not very familiar with other use cases of overlayfs except container, should we consider
> other use cases? Maybe we can also ignore the cpu burden because those use cases don't
> have density deployment like container.
>

metacopy feature was developed for the use case of a container
that chowns all the files in the lower image.

In that case, which is now also quite common, all the overlay inodes are
upper inodes.

What about only re-mark overlay inode dirty if upper inode is dirty or is
writeably mmapped.
For other cases, it is easy to know when overlay inode becomes dirty?
Didn't you already try this?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-11-30 19:04                                 ` Amir Goldstein
@ 2021-12-01  2:37                                   ` Chengguang Xu
  2021-12-01  6:31                                     ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-12-01  2:37 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Miklos Szeredi, linux-fsdevel, overlayfs, linux-kernel,
	ronyjin, charliecgxu, Vivek Goyal


 ---- 在 星期三, 2021-12-01 03:04:59 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > I was thinking about this a bit more and I don't think I buy this
 > >  > explanation. What I rather think is happening is that real work for syncfs
 > >  > (writeback_inodes_sb() and sync_inodes_sb() calls) gets offloaded to a flush
 > >  > worker. E.g. writeback_inodes_sb() ends up calling
 > >  > __writeback_inodes_sb_nr() which does:
 > >  >
 > >  > bdi_split_work_to_wbs()
 > >  > wb_wait_for_completion()
 > >  >
 > >  > So you don't see the work done in the times accounted to your test
 > >  > program. But in practice the flush worker is indeed burning 1.3s worth of
 > >  > CPU to scan the 1 million inode list and do nothing.
 > >  >
 > >
 > > That makes sense. However, in real container use case,  the upper dir is always empty,
 > > so I don't think there is meaningful difference compare to accurately marking overlay
 > > inode dirty.
 > >
 > 
 > It's true the that is a very common case, but...
 > 
 > > I'm not very familiar with other use cases of overlayfs except container, should we consider
 > > other use cases? Maybe we can also ignore the cpu burden because those use cases don't
 > > have density deployment like container.
 > >
 > 
 > metacopy feature was developed for the use case of a container
 > that chowns all the files in the lower image.
 > 
 > In that case, which is now also quite common, all the overlay inodes are
 > upper inodes.
 > 

Regardless of metacopy or datacopy, that copy-up has already modified overlay inode
so initialy marking dirty to all overlay inodes which have upper inode will not be a serious
problem in this case too, right?

I guess maybe you more concern about the re-mark dirtiness on above use case.



 > What about only re-mark overlay inode dirty if upper inode is dirty or is
 > writeably mmapped.
 > For other cases, it is easy to know when overlay inode becomes dirty?
 > Didn't you already try this?
 > 

Yes, I've tried that approach in previous version but as Miklos pointed out in the
feedback there are a few of racy conditions.



Thanks,
Chengguang





^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-01  2:37                                   ` Chengguang Xu
@ 2021-12-01  6:31                                     ` Chengguang Xu
  2021-12-01  7:19                                       ` Amir Goldstein
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-12-01  6:31 UTC (permalink / raw)
  To: Amir Goldstein, Jan Kara, Miklos Szeredi
  Cc: linux-fsdevel, overlayfs, linux-kernel, ronyjin, charliecgxu,
	Vivek Goyal


 ---- 在 星期三, 2021-12-01 10:37:15 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > 
 >  ---- 在 星期三, 2021-12-01 03:04:59 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 >  > >  > I was thinking about this a bit more and I don't think I buy this
 >  > >  > explanation. What I rather think is happening is that real work for syncfs
 >  > >  > (writeback_inodes_sb() and sync_inodes_sb() calls) gets offloaded to a flush
 >  > >  > worker. E.g. writeback_inodes_sb() ends up calling
 >  > >  > __writeback_inodes_sb_nr() which does:
 >  > >  >
 >  > >  > bdi_split_work_to_wbs()
 >  > >  > wb_wait_for_completion()
 >  > >  >
 >  > >  > So you don't see the work done in the times accounted to your test
 >  > >  > program. But in practice the flush worker is indeed burning 1.3s worth of
 >  > >  > CPU to scan the 1 million inode list and do nothing.
 >  > >  >
 >  > >
 >  > > That makes sense. However, in real container use case,  the upper dir is always empty,
 >  > > so I don't think there is meaningful difference compare to accurately marking overlay
 >  > > inode dirty.
 >  > >
 >  > 
 >  > It's true the that is a very common case, but...
 >  > 
 >  > > I'm not very familiar with other use cases of overlayfs except container, should we consider
 >  > > other use cases? Maybe we can also ignore the cpu burden because those use cases don't
 >  > > have density deployment like container.
 >  > >
 >  > 
 >  > metacopy feature was developed for the use case of a container
 >  > that chowns all the files in the lower image.
 >  > 
 >  > In that case, which is now also quite common, all the overlay inodes are
 >  > upper inodes.
 >  > 
 > 
 > Regardless of metacopy or datacopy, that copy-up has already modified overlay inode
 > so initialy marking dirty to all overlay inodes which have upper inode will not be a serious
 > problem in this case too, right?
 > 
 > I guess maybe you more concern about the re-mark dirtiness on above use case.
 > 
 > 
 > 
 >  > What about only re-mark overlay inode dirty if upper inode is dirty or is
 >  > writeably mmapped.
 >  > For other cases, it is easy to know when overlay inode becomes dirty?
 >  > Didn't you already try this?
 >  > 
 > 
 > Yes, I've tried that approach in previous version but as Miklos pointed out in the
 > feedback there are a few of racy conditions.
 > 

So the final solution to handle all the concerns looks like accurately mark overlay inode
diry on modification and re-mark dirty only for mmaped file in ->write_inode().

Hi Miklos, Jan

Will you agree with new proposal above?



Thanks,
Chengguang
































^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-01  6:31                                     ` Chengguang Xu
@ 2021-12-01  7:19                                       ` Amir Goldstein
  2021-12-01 13:46                                         ` Jan Kara
  0 siblings, 1 reply; 61+ messages in thread
From: Amir Goldstein @ 2021-12-01  7:19 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Miklos Szeredi, linux-fsdevel, overlayfs, linux-kernel,
	ronyjin, charliecgxu, Vivek Goyal

On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>
>  ---- 在 星期三, 2021-12-01 10:37:15 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  >
>  >  ---- 在 星期三, 2021-12-01 03:04:59 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  >  > >  > I was thinking about this a bit more and I don't think I buy this
>  >  > >  > explanation. What I rather think is happening is that real work for syncfs
>  >  > >  > (writeback_inodes_sb() and sync_inodes_sb() calls) gets offloaded to a flush
>  >  > >  > worker. E.g. writeback_inodes_sb() ends up calling
>  >  > >  > __writeback_inodes_sb_nr() which does:
>  >  > >  >
>  >  > >  > bdi_split_work_to_wbs()
>  >  > >  > wb_wait_for_completion()
>  >  > >  >
>  >  > >  > So you don't see the work done in the times accounted to your test
>  >  > >  > program. But in practice the flush worker is indeed burning 1.3s worth of
>  >  > >  > CPU to scan the 1 million inode list and do nothing.
>  >  > >  >
>  >  > >
>  >  > > That makes sense. However, in real container use case,  the upper dir is always empty,
>  >  > > so I don't think there is meaningful difference compare to accurately marking overlay
>  >  > > inode dirty.
>  >  > >
>  >  >
>  >  > It's true the that is a very common case, but...
>  >  >
>  >  > > I'm not very familiar with other use cases of overlayfs except container, should we consider
>  >  > > other use cases? Maybe we can also ignore the cpu burden because those use cases don't
>  >  > > have density deployment like container.
>  >  > >
>  >  >
>  >  > metacopy feature was developed for the use case of a container
>  >  > that chowns all the files in the lower image.
>  >  >
>  >  > In that case, which is now also quite common, all the overlay inodes are
>  >  > upper inodes.
>  >  >
>  >
>  > Regardless of metacopy or datacopy, that copy-up has already modified overlay inode
>  > so initialy marking dirty to all overlay inodes which have upper inode will not be a serious
>  > problem in this case too, right?
>  >
>  > I guess maybe you more concern about the re-mark dirtiness on above use case.
>  >
>  >
>  >
>  >  > What about only re-mark overlay inode dirty if upper inode is dirty or is
>  >  > writeably mmapped.
>  >  > For other cases, it is easy to know when overlay inode becomes dirty?
>  >  > Didn't you already try this?
>  >  >
>  >
>  > Yes, I've tried that approach in previous version but as Miklos pointed out in the
>  > feedback there are a few of racy conditions.
>  >

Right..

>
> So the final solution to handle all the concerns looks like accurately mark overlay inode
> diry on modification and re-mark dirty only for mmaped file in ->write_inode().
>
> Hi Miklos, Jan
>
> Will you agree with new proposal above?
>

Maybe you can still pull off a simpler version by remarking dirty only
writably mmapped upper AND inode_is_open_for_write(upper)?

If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
of FMODE_WRITE file, there is nothing that can make upper inode dirty
after last close (if upper is not mmaped), so one more inode sync should
be enough. No?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-01  7:19                                       ` Amir Goldstein
@ 2021-12-01 13:46                                         ` Jan Kara
  2021-12-01 14:59                                           ` Chengguang Xu
  2021-12-01 16:24                                           ` Chengguang Xu
  0 siblings, 2 replies; 61+ messages in thread
From: Jan Kara @ 2021-12-01 13:46 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Chengguang Xu, Jan Kara, Miklos Szeredi, linux-fsdevel,
	overlayfs, linux-kernel, ronyjin, charliecgxu, Vivek Goyal

On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
> On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
> > So the final solution to handle all the concerns looks like accurately
> > mark overlay inode diry on modification and re-mark dirty only for
> > mmaped file in ->write_inode().
> >
> > Hi Miklos, Jan
> >
> > Will you agree with new proposal above?
> >
> 
> Maybe you can still pull off a simpler version by remarking dirty only
> writably mmapped upper AND inode_is_open_for_write(upper)?

Well, if inode is writeably mapped, it must be also open for write, doesn't
it? The VMA of the mapping will hold file open. So remarking overlay inode
dirty during writeback while inode_is_open_for_write(upper) looks like
reasonably easy and presumably there won't be that many inodes open for
writing for this to become big overhead?

> If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
> of FMODE_WRITE file, there is nothing that can make upper inode dirty
> after last close (if upper is not mmaped), so one more inode sync should
> be enough. No?

But we still need to catch other dirtying events like timestamp updates,
truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
can be done...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-01 13:46                                         ` Jan Kara
@ 2021-12-01 14:59                                           ` Chengguang Xu
  2021-12-01 16:24                                           ` Chengguang Xu
  1 sibling, 0 replies; 61+ messages in thread
From: Chengguang Xu @ 2021-12-01 14:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, Miklos Szeredi, linux-fsdevel, overlayfs,
	linux-kernel, ronyjin, charliecgxu, Vivek Goyal

 ---- 在 星期三, 2021-12-01 21:46:10 Jan Kara <jack@suse.cz> 撰写 ----
 > On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
 > > On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > > > So the final solution to handle all the concerns looks like accurately
 > > > mark overlay inode diry on modification and re-mark dirty only for
 > > > mmaped file in ->write_inode().
 > > >
 > > > Hi Miklos, Jan
 > > >
 > > > Will you agree with new proposal above?
 > > >
 > > 
 > > Maybe you can still pull off a simpler version by remarking dirty only
 > > writably mmapped upper AND inode_is_open_for_write(upper)?
 > 
 > Well, if inode is writeably mapped, it must be also open for write, doesn't
 > it? 

That's right.


 > The VMA of the mapping will hold file open. 

It's a bit tricky but currently ovl_mmap() will replace file to realfile in upper layer
and release overlayfs file. So overlayfs file itself will not have any relationship with
the VMA anymore after mmap().


Thanks,
Chengguang


 > So remarking overlay inode
 > dirty during writeback while inode_is_open_for_write(upper) looks like
 > reasonably easy and presumably there won't be that many inodes open for
 > writing for this to become big overhead?
 > 
 > > If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
 > > of FMODE_WRITE file, there is nothing that can make upper inode dirty
 > > after last close (if upper is not mmaped), so one more inode sync should
 > > be enough. No?
 > 
 > But we still need to catch other dirtying events like timestamp updates,
 > truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
 > can be done...
 > 
 >                                 Honza
 > -- 
 > Jan Kara <jack@suse.com>
 > SUSE Labs, CR
 > 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-01 13:46                                         ` Jan Kara
  2021-12-01 14:59                                           ` Chengguang Xu
@ 2021-12-01 16:24                                           ` Chengguang Xu
  2021-12-01 22:47                                             ` Amir Goldstein
  1 sibling, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-12-01 16:24 UTC (permalink / raw)
  To: Jan Kara, Amir Goldstein, Miklos Szeredi
  Cc: linux-fsdevel, overlayfs, linux-kernel, ronyjin, charliecgxu,
	Vivek Goyal

 ---- 在 星期三, 2021-12-01 21:46:10 Jan Kara <jack@suse.cz> 撰写 ----
 > On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
 > > On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > > > So the final solution to handle all the concerns looks like accurately
 > > > mark overlay inode diry on modification and re-mark dirty only for
 > > > mmaped file in ->write_inode().
 > > >
 > > > Hi Miklos, Jan
 > > >
 > > > Will you agree with new proposal above?
 > > >
 > > 
 > > Maybe you can still pull off a simpler version by remarking dirty only
 > > writably mmapped upper AND inode_is_open_for_write(upper)?
 > 
 > Well, if inode is writeably mapped, it must be also open for write, doesn't
 > it? The VMA of the mapping will hold file open. So remarking overlay inode
 > dirty during writeback while inode_is_open_for_write(upper) looks like
 > reasonably easy and presumably there won't be that many inodes open for
 > writing for this to become big overhead?
 > 
 > > If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
 > > of FMODE_WRITE file, there is nothing that can make upper inode dirty
 > > after last close (if upper is not mmaped), so one more inode sync should
 > > be enough. No?
 > 
 > But we still need to catch other dirtying events like timestamp updates,
 > truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
 > can be done...
 > 

To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
Why should we open new underlying file when calling ->flush()?
Is it still correct in the case of opening lower layer first then copy-uped case? 


Thanks,
Chengguang






  

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-01 16:24                                           ` Chengguang Xu
@ 2021-12-01 22:47                                             ` Amir Goldstein
  2021-12-01 23:23                                               ` ovl_flush() behavior Amir Goldstein
  2021-12-05 14:06                                               ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
  0 siblings, 2 replies; 61+ messages in thread
From: Amir Goldstein @ 2021-12-01 22:47 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Miklos Szeredi, linux-fsdevel, overlayfs, linux-kernel,
	ronyjin, charliecgxu, Vivek Goyal

On Wed, Dec 1, 2021 at 6:24 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期三, 2021-12-01 21:46:10 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
>  > > On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > > > So the final solution to handle all the concerns looks like accurately
>  > > > mark overlay inode diry on modification and re-mark dirty only for
>  > > > mmaped file in ->write_inode().
>  > > >
>  > > > Hi Miklos, Jan
>  > > >
>  > > > Will you agree with new proposal above?
>  > > >
>  > >
>  > > Maybe you can still pull off a simpler version by remarking dirty only
>  > > writably mmapped upper AND inode_is_open_for_write(upper)?
>  >
>  > Well, if inode is writeably mapped, it must be also open for write, doesn't
>  > it? The VMA of the mapping will hold file open. So remarking overlay inode
>  > dirty during writeback while inode_is_open_for_write(upper) looks like
>  > reasonably easy and presumably there won't be that many inodes open for
>  > writing for this to become big overhead?

I think it should be ok and a good tradeoff of complexity vs. performance.

>  >
>  > > If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
>  > > of FMODE_WRITE file, there is nothing that can make upper inode dirty
>  > > after last close (if upper is not mmaped), so one more inode sync should
>  > > be enough. No?
>  >
>  > But we still need to catch other dirtying events like timestamp updates,
>  > truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
>  > can be done...
>  >

Oh yeh, we have those as well :)
All those cases should be covered by ovl_copyattr() that updates the
ovl inode ctime/mtime, so always dirty in ovl_copyattr() should be good.
I *think* the only case of ovl_copyattr() that should not dirty is in
ovl_inode_init(), so need some special helper there.

>
> To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
> Why should we open new underlying file when calling ->flush()?
> Is it still correct in the case of opening lower layer first then copy-uped case?
>

The semantics of flush() are far from being uniform across filesystems.
most local filesystems do nothing on close.
most network fs only flush dirty data when a writer closes a file
but not when a reader closes a file.
It is hard to imagine that applications rely on flush-on-close of
rdonly fd behavior and I agree that flushing only if original fd was upper
makes more sense, so I am not sure if it is really essential for
overlayfs to open an upper rdonly fd just to do whatever the upper fs
would have done on close of rdonly fd, but maybe there is no good
reason to change this behavior either.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: ovl_flush() behavior
  2021-12-01 22:47                                             ` Amir Goldstein
@ 2021-12-01 23:23                                               ` Amir Goldstein
  2021-12-02  2:11                                                 ` Chengguang Xu
  2021-12-02 15:14                                                 ` Vivek Goyal
  2021-12-05 14:06                                               ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
  1 sibling, 2 replies; 61+ messages in thread
From: Amir Goldstein @ 2021-12-01 23:23 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, Miklos Szeredi, linux-fsdevel, overlayfs, linux-kernel,
	ronyjin, charliecgxu, Vivek Goyal

> >
> > To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
> > Why should we open new underlying file when calling ->flush()?
> > Is it still correct in the case of opening lower layer first then copy-uped case?
> >
>
> The semantics of flush() are far from being uniform across filesystems.
> most local filesystems do nothing on close.
> most network fs only flush dirty data when a writer closes a file
> but not when a reader closes a file.
> It is hard to imagine that applications rely on flush-on-close of
> rdonly fd behavior and I agree that flushing only if original fd was upper
> makes more sense, so I am not sure if it is really essential for
> overlayfs to open an upper rdonly fd just to do whatever the upper fs
> would have done on close of rdonly fd, but maybe there is no good
> reason to change this behavior either.
>

On second thought, I think there may be a good reason to change
ovl_flush() otherwise I wouldn't have submitted commit
a390ccb316be ("fuse: add FOPEN_NOFLUSH") - I did observe
applications that frequently open short lived rdonly fds and suffered
undesired latencies on close().

As for "changing existing behavior", I think that most fs used as
upper do not implement flush at all.
Using fuse/virtiofs as overlayfs upper is quite new, so maybe that
is not a problem and maybe the new behavior would be preferred
for those users?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: ovl_flush() behavior
  2021-12-01 23:23                                               ` ovl_flush() behavior Amir Goldstein
@ 2021-12-02  2:11                                                 ` Chengguang Xu
  2021-12-02 15:20                                                   ` Vivek Goyal
  2021-12-02 15:14                                                 ` Vivek Goyal
  1 sibling, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-12-02  2:11 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Miklos Szeredi, linux-fsdevel, overlayfs, linux-kernel,
	ronyjin, charliecgxu, Vivek Goyal


 ---- 在 星期四, 2021-12-02 07:23:17 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > > >
 > > > To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
 > > > Why should we open new underlying file when calling ->flush()?
 > > > Is it still correct in the case of opening lower layer first then copy-uped case?
 > > >
 > >
 > > The semantics of flush() are far from being uniform across filesystems.
 > > most local filesystems do nothing on close.
 > > most network fs only flush dirty data when a writer closes a file
 > > but not when a reader closes a file.
 > > It is hard to imagine that applications rely on flush-on-close of
 > > rdonly fd behavior and I agree that flushing only if original fd was upper
 > > makes more sense, so I am not sure if it is really essential for
 > > overlayfs to open an upper rdonly fd just to do whatever the upper fs
 > > would have done on close of rdonly fd, but maybe there is no good
 > > reason to change this behavior either.
 > >
 > 
 > On second thought, I think there may be a good reason to change
 > ovl_flush() otherwise I wouldn't have submitted commit
 > a390ccb316be ("fuse: add FOPEN_NOFLUSH") - I did observe
 > applications that frequently open short lived rdonly fds and suffered
 > undesired latencies on close().
 > 
 > As for "changing existing behavior", I think that most fs used as
 > upper do not implement flush at all.
 > Using fuse/virtiofs as overlayfs upper is quite new, so maybe that
 > is not a problem and maybe the new behavior would be preferred
 > for those users?
 > 

So is that mean simply redirect the ->flush request to original underlying realfile?


Thanks,
Chengguang


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: ovl_flush() behavior
  2021-12-01 23:23                                               ` ovl_flush() behavior Amir Goldstein
  2021-12-02  2:11                                                 ` Chengguang Xu
@ 2021-12-02 15:14                                                 ` Vivek Goyal
  1 sibling, 0 replies; 61+ messages in thread
From: Vivek Goyal @ 2021-12-02 15:14 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Chengguang Xu, Jan Kara, Miklos Szeredi, linux-fsdevel,
	overlayfs, linux-kernel, ronyjin, charliecgxu

On Thu, Dec 02, 2021 at 01:23:17AM +0200, Amir Goldstein wrote:
> > >
> > > To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
> > > Why should we open new underlying file when calling ->flush()?
> > > Is it still correct in the case of opening lower layer first then copy-uped case?
> > >
> >
> > The semantics of flush() are far from being uniform across filesystems.
> > most local filesystems do nothing on close.
> > most network fs only flush dirty data when a writer closes a file
> > but not when a reader closes a file.
> > It is hard to imagine that applications rely on flush-on-close of
> > rdonly fd behavior and I agree that flushing only if original fd was upper
> > makes more sense, so I am not sure if it is really essential for
> > overlayfs to open an upper rdonly fd just to do whatever the upper fs
> > would have done on close of rdonly fd, but maybe there is no good
> > reason to change this behavior either.
> >
> 
> On second thought, I think there may be a good reason to change
> ovl_flush() otherwise I wouldn't have submitted commit
> a390ccb316be ("fuse: add FOPEN_NOFLUSH") - I did observe
> applications that frequently open short lived rdonly fds and suffered
> undesired latencies on close().
> 
> As for "changing existing behavior", I think that most fs used as
> upper do not implement flush at all.
> Using fuse/virtiofs as overlayfs upper is quite new, so maybe that
> is not a problem and maybe the new behavior would be preferred
> for those users?

It probably will be nice not to send flush to fuse server when it is not
required.

Right now in virtiofsd, I see that we are depending on flush being sent
as we are dealing with remote posix lock magic. I am supporting remotme
posix locks in virtiofs and virtiofsd is building these on top of open
file description locks on host. (Can't use posix locks on host as these
locks are per process and virtiofsd is single process working on behalf
of all the guest processes, and unexpected things happen).

When an fd is being closed, flush request is sent and along with it we
also send "lock_owner".

inarg.lock_owner = fuse_lock_owner_id(fm->fc, id);

We basically use this to keep track which process is closing the fd and
release associated OFD locks on host. /me needs to dive into details
to explain it better. Will do that if need be.

Bottom line is that as of now virtiofsd seems to be relying on receiving
FLUSH requests when remote posix locks are enabled. Maybe we can set
FOPEN_NOFLUSH when remote posix locks are not enabled.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: ovl_flush() behavior
  2021-12-02  2:11                                                 ` Chengguang Xu
@ 2021-12-02 15:20                                                   ` Vivek Goyal
  2021-12-02 15:59                                                     ` Amir Goldstein
  0 siblings, 1 reply; 61+ messages in thread
From: Vivek Goyal @ 2021-12-02 15:20 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Amir Goldstein, Jan Kara, Miklos Szeredi, linux-fsdevel,
	overlayfs, linux-kernel, ronyjin, charliecgxu

On Thu, Dec 02, 2021 at 10:11:39AM +0800, Chengguang Xu wrote:
> 
>  ---- 在 星期四, 2021-12-02 07:23:17 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  > > >
>  > > > To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
>  > > > Why should we open new underlying file when calling ->flush()?
>  > > > Is it still correct in the case of opening lower layer first then copy-uped case?
>  > > >
>  > >
>  > > The semantics of flush() are far from being uniform across filesystems.
>  > > most local filesystems do nothing on close.
>  > > most network fs only flush dirty data when a writer closes a file
>  > > but not when a reader closes a file.
>  > > It is hard to imagine that applications rely on flush-on-close of
>  > > rdonly fd behavior and I agree that flushing only if original fd was upper
>  > > makes more sense, so I am not sure if it is really essential for
>  > > overlayfs to open an upper rdonly fd just to do whatever the upper fs
>  > > would have done on close of rdonly fd, but maybe there is no good
>  > > reason to change this behavior either.
>  > >
>  > 
>  > On second thought, I think there may be a good reason to change
>  > ovl_flush() otherwise I wouldn't have submitted commit
>  > a390ccb316be ("fuse: add FOPEN_NOFLUSH") - I did observe
>  > applications that frequently open short lived rdonly fds and suffered
>  > undesired latencies on close().
>  > 
>  > As for "changing existing behavior", I think that most fs used as
>  > upper do not implement flush at all.
>  > Using fuse/virtiofs as overlayfs upper is quite new, so maybe that
>  > is not a problem and maybe the new behavior would be preferred
>  > for those users?
>  > 
> 
> So is that mean simply redirect the ->flush request to original underlying realfile?

If the file has been copied up since open(), then flush should go on upper
file, right?

I think Amir is talking about that can we optimize flush in overlay and
not call ->flush at all if file was opened read-only, IIUC.

In case of fuse he left it to server. If that's the case, then in case
of overlayfs, it should be left to underlyng filesystem as well?
Otherwise, it might happen underlying filesystem (like virtiofs) might
be expecting ->flush() and overlayfs decided not to call it because
file was read only.

So I will lean towards continue to call ->flush in overlay and try to
optimize virtiofsd to set FOPEN_NOFLUSH when not required.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: ovl_flush() behavior
  2021-12-02 15:20                                                   ` Vivek Goyal
@ 2021-12-02 15:59                                                     ` Amir Goldstein
  2021-12-02 22:00                                                       ` Vivek Goyal
  0 siblings, 1 reply; 61+ messages in thread
From: Amir Goldstein @ 2021-12-02 15:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Chengguang Xu, Jan Kara, Miklos Szeredi, linux-fsdevel,
	overlayfs, linux-kernel, ronyjin, charliecgxu

On Thu, Dec 2, 2021 at 5:20 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Thu, Dec 02, 2021 at 10:11:39AM +0800, Chengguang Xu wrote:
> >
> >  ---- 在 星期四, 2021-12-02 07:23:17 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> >  > > >
> >  > > > To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
> >  > > > Why should we open new underlying file when calling ->flush()?
> >  > > > Is it still correct in the case of opening lower layer first then copy-uped case?
> >  > > >
> >  > >
> >  > > The semantics of flush() are far from being uniform across filesystems.
> >  > > most local filesystems do nothing on close.
> >  > > most network fs only flush dirty data when a writer closes a file
> >  > > but not when a reader closes a file.
> >  > > It is hard to imagine that applications rely on flush-on-close of
> >  > > rdonly fd behavior and I agree that flushing only if original fd was upper
> >  > > makes more sense, so I am not sure if it is really essential for
> >  > > overlayfs to open an upper rdonly fd just to do whatever the upper fs
> >  > > would have done on close of rdonly fd, but maybe there is no good
> >  > > reason to change this behavior either.
> >  > >
> >  >
> >  > On second thought, I think there may be a good reason to change
> >  > ovl_flush() otherwise I wouldn't have submitted commit
> >  > a390ccb316be ("fuse: add FOPEN_NOFLUSH") - I did observe
> >  > applications that frequently open short lived rdonly fds and suffered
> >  > undesired latencies on close().
> >  >
> >  > As for "changing existing behavior", I think that most fs used as
> >  > upper do not implement flush at all.
> >  > Using fuse/virtiofs as overlayfs upper is quite new, so maybe that
> >  > is not a problem and maybe the new behavior would be preferred
> >  > for those users?
> >  >
> >
> > So is that mean simply redirect the ->flush request to original underlying realfile?
>
> If the file has been copied up since open(), then flush should go on upper
> file, right?
>
> I think Amir is talking about that can we optimize flush in overlay and
> not call ->flush at all if file was opened read-only, IIUC.
>

Maybe that's what I wrote, but what I meant was if file was open as
lower read-only and later copied up, not sure we should bother with
ovl_open_realfile() for flushing upper.

> In case of fuse he left it to server. If that's the case, then in case
> of overlayfs, it should be left to underlyng filesystem as well?
> Otherwise, it might happen underlying filesystem (like virtiofs) might
> be expecting ->flush() and overlayfs decided not to call it because
> file was read only.
>

Certainly, if upper wants flush on rdonly file we must call flush on
close of rdonly file *that was opened on upper*.

But if we opened rdonly file on lower, even if lower is virtiofs, does
virtiosfd needs this flush? that same file on the server was not supposed
to be written by anyone.
If virtiofsd really needs this flush then it is a problem already today,
because if lower file was copied up since open rdonly, virtiofsd
is not going to get the flushes for the lower file (only the release).

However, I now realize that if we opened file rdonly on lower,
we may have later opened a short lived realfile on upper for read post
copy up and we never issued a flush for this short live rdonly upper fd.
So actually, unless we store a flag or something that says that
we never opened upper realfile, we should keep current behavior.

> So I will lean towards continue to call ->flush in overlay and try to
> optimize virtiofsd to set FOPEN_NOFLUSH when not required.
>

Makes sense.
Calling flush() on fs that does nothing with it doesn't hurt.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: ovl_flush() behavior
  2021-12-02 15:59                                                     ` Amir Goldstein
@ 2021-12-02 22:00                                                       ` Vivek Goyal
  0 siblings, 0 replies; 61+ messages in thread
From: Vivek Goyal @ 2021-12-02 22:00 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Chengguang Xu, Jan Kara, Miklos Szeredi, linux-fsdevel,
	overlayfs, linux-kernel, ronyjin, charliecgxu

On Thu, Dec 02, 2021 at 05:59:56PM +0200, Amir Goldstein wrote:
> On Thu, Dec 2, 2021 at 5:20 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Thu, Dec 02, 2021 at 10:11:39AM +0800, Chengguang Xu wrote:
> > >
> > >  ---- 在 星期四, 2021-12-02 07:23:17 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> > >  > > >
> > >  > > > To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
> > >  > > > Why should we open new underlying file when calling ->flush()?
> > >  > > > Is it still correct in the case of opening lower layer first then copy-uped case?
> > >  > > >
> > >  > >
> > >  > > The semantics of flush() are far from being uniform across filesystems.
> > >  > > most local filesystems do nothing on close.
> > >  > > most network fs only flush dirty data when a writer closes a file
> > >  > > but not when a reader closes a file.
> > >  > > It is hard to imagine that applications rely on flush-on-close of
> > >  > > rdonly fd behavior and I agree that flushing only if original fd was upper
> > >  > > makes more sense, so I am not sure if it is really essential for
> > >  > > overlayfs to open an upper rdonly fd just to do whatever the upper fs
> > >  > > would have done on close of rdonly fd, but maybe there is no good
> > >  > > reason to change this behavior either.
> > >  > >
> > >  >
> > >  > On second thought, I think there may be a good reason to change
> > >  > ovl_flush() otherwise I wouldn't have submitted commit
> > >  > a390ccb316be ("fuse: add FOPEN_NOFLUSH") - I did observe
> > >  > applications that frequently open short lived rdonly fds and suffered
> > >  > undesired latencies on close().
> > >  >
> > >  > As for "changing existing behavior", I think that most fs used as
> > >  > upper do not implement flush at all.
> > >  > Using fuse/virtiofs as overlayfs upper is quite new, so maybe that
> > >  > is not a problem and maybe the new behavior would be preferred
> > >  > for those users?
> > >  >
> > >
> > > So is that mean simply redirect the ->flush request to original underlying realfile?
> >
> > If the file has been copied up since open(), then flush should go on upper
> > file, right?
> >
> > I think Amir is talking about that can we optimize flush in overlay and
> > not call ->flush at all if file was opened read-only, IIUC.
> >
> 
> Maybe that's what I wrote, but what I meant was if file was open as
> lower read-only and later copied up, not sure we should bother with
> ovl_open_realfile() for flushing upper.

I am not sure either. Miklos might have thoughts on this.

> 
> > In case of fuse he left it to server. If that's the case, then in case
> > of overlayfs, it should be left to underlyng filesystem as well?
> > Otherwise, it might happen underlying filesystem (like virtiofs) might
> > be expecting ->flush() and overlayfs decided not to call it because
> > file was read only.
> >
> 
> Certainly, if upper wants flush on rdonly file we must call flush on
> close of rdonly file *that was opened on upper*.
> 
> But if we opened rdonly file on lower, even if lower is virtiofs, does
> virtiosfd needs this flush?

Right now virtiofsd seems to care about flush for read only files for
remote posix locks.  Given overlayfs is not registering f_op->lock() handler,
my understaning is that all locking will be done by VFS on overlayfs inode
and it will never reach to the level of virtiofs (lower or upper). If that's
the case, then atleast from locking perspective we don't need flush on
lower for virtiofs.

> that same file on the server was not supposed
> to be written by anyone.
> If virtiofsd really needs this flush then it is a problem already today,
> because if lower file was copied up since open rdonly, virtiofsd
> is not going to get the flushes for the lower file (only the release).

So virtiofs will not get flushes on lower only if file got copied-up
after opening on lower, right? So far nobody has complained. But if
there is a use case somewhere, we might have to issue a flush on lower
as well.

> 
> However, I now realize that if we opened file rdonly on lower,
> we may have later opened a short lived realfile on upper for read post
> copy up and we never issued a flush for this short live rdonly upper fd.
> So actually, unless we store a flag or something that says that
> we never opened upper realfile, we should keep current behavior.

I am assuming you are referring to ovl_read_iter() where it will open
upper file for read and then call fdput(real). So why should we issue
flush on upper in this case?

I thought flush will need to be issued only when overlay "fd" as seen
by user space is closed (and not when inernal fds opened by overlay are
closed).

So real question is, do we need to issue flush when fd is opened read
only. If yes, then we probably need to issue flush both on lower and
upper (and not only on upper). But if flush is to be issued only for
the case of fd which was writable, then issuing it on upper makes
sense.

> 
> > So I will lean towards continue to call ->flush in overlay and try to
> > optimize virtiofsd to set FOPEN_NOFLUSH when not required.
> >
> 
> Makes sense.
> Calling flush() on fs that does nothing with it doesn't hurt.

This probably is safest right now given virtiofs expects flush to be
issued even for read only fd (in case of remote posix locks). Though
that's a different thing that when overlayfs is sitting on top, we
will not be using remote posix lock functionality of virtiofs, IIUC.

Vivek


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-01 22:47                                             ` Amir Goldstein
  2021-12-01 23:23                                               ` ovl_flush() behavior Amir Goldstein
@ 2021-12-05 14:06                                               ` Chengguang Xu
  2021-12-07  5:33                                                 ` Amir Goldstein
  1 sibling, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2021-12-05 14:06 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, overlayfs, linux-kernel, ronyjin,
	charliecgxu, Vivek Goyal, Miklos Szeredi

 ---- 在 星期四, 2021-12-02 06:47:25 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Wed, Dec 1, 2021 at 6:24 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期三, 2021-12-01 21:46:10 Jan Kara <jack@suse.cz> 撰写 ----
 > >  > On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
 > >  > > On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > > > So the final solution to handle all the concerns looks like accurately
 > >  > > > mark overlay inode diry on modification and re-mark dirty only for
 > >  > > > mmaped file in ->write_inode().
 > >  > > >
 > >  > > > Hi Miklos, Jan
 > >  > > >
 > >  > > > Will you agree with new proposal above?
 > >  > > >
 > >  > >
 > >  > > Maybe you can still pull off a simpler version by remarking dirty only
 > >  > > writably mmapped upper AND inode_is_open_for_write(upper)?
 > >  >
 > >  > Well, if inode is writeably mapped, it must be also open for write, doesn't
 > >  > it? The VMA of the mapping will hold file open. So remarking overlay inode
 > >  > dirty during writeback while inode_is_open_for_write(upper) looks like
 > >  > reasonably easy and presumably there won't be that many inodes open for
 > >  > writing for this to become big overhead?
 > 
 > I think it should be ok and a good tradeoff of complexity vs. performance.

IMO, mark dirtiness on write is relatively simple, so I think we can mark the 
overlayfs inode dirty during real write behavior and only remark writable mmap
unconditionally in ->write_inode().


 > 
 > >  >
 > >  > > If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
 > >  > > of FMODE_WRITE file, there is nothing that can make upper inode dirty
 > >  > > after last close (if upper is not mmaped), so one more inode sync should
 > >  > > be enough. No?
 > >  >
 > >  > But we still need to catch other dirtying events like timestamp updates,
 > >  > truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
 > >  > can be done...
 > >  >
 > 
 > Oh yeh, we have those as well :)
 > All those cases should be covered by ovl_copyattr() that updates the
 > ovl inode ctime/mtime, so always dirty in ovl_copyattr() should be good.

Currently ovl_copyattr() does not cover all the cases, so I think we still need to carefully
check all the places of calling mnt_want_write().


Thanks,
Chengguang



 > I *think* the only case of ovl_copyattr() that should not dirty is in
 > ovl_inode_init(), so need some special helper there.
 > 
 > >
 > > To be honest I even don't fully understand what's the ->flush() logic in overlayfs.
 > > Why should we open new underlying file when calling ->flush()?
 > > Is it still correct in the case of opening lower layer first then copy-uped case?
 > >
 > 
 > The semantics of flush() are far from being uniform across filesystems.
 > most local filesystems do nothing on close.
 > most network fs only flush dirty data when a writer closes a file
 > but not when a reader closes a file.
 > It is hard to imagine that applications rely on flush-on-close of
 > rdonly fd behavior and I agree that flushing only if original fd was upper
 > makes more sense, so I am not sure if it is really essential for
 > overlayfs to open an upper rdonly fd just to do whatever the upper fs
 > would have done on close of rdonly fd, but maybe there is no good
 > reason to change this behavior either.
 > 
 > Thanks,
 > Amir.
 > 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-05 14:06                                               ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
@ 2021-12-07  5:33                                                 ` Amir Goldstein
  2022-02-05 16:09                                                   ` Chengguang Xu
  0 siblings, 1 reply; 61+ messages in thread
From: Amir Goldstein @ 2021-12-07  5:33 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, linux-fsdevel, overlayfs, linux-kernel, ronyjin,
	charliecgxu, Vivek Goyal, Miklos Szeredi

On Sun, Dec 5, 2021 at 4:07 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期四, 2021-12-02 06:47:25 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  > On Wed, Dec 1, 2021 at 6:24 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >
>  > >  ---- 在 星期三, 2021-12-01 21:46:10 Jan Kara <jack@suse.cz> 撰写 ----
>  > >  > On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
>  > >  > > On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >  > > > So the final solution to handle all the concerns looks like accurately
>  > >  > > > mark overlay inode diry on modification and re-mark dirty only for
>  > >  > > > mmaped file in ->write_inode().
>  > >  > > >
>  > >  > > > Hi Miklos, Jan
>  > >  > > >
>  > >  > > > Will you agree with new proposal above?
>  > >  > > >
>  > >  > >
>  > >  > > Maybe you can still pull off a simpler version by remarking dirty only
>  > >  > > writably mmapped upper AND inode_is_open_for_write(upper)?
>  > >  >
>  > >  > Well, if inode is writeably mapped, it must be also open for write, doesn't
>  > >  > it? The VMA of the mapping will hold file open. So remarking overlay inode
>  > >  > dirty during writeback while inode_is_open_for_write(upper) looks like
>  > >  > reasonably easy and presumably there won't be that many inodes open for
>  > >  > writing for this to become big overhead?
>  >
>  > I think it should be ok and a good tradeoff of complexity vs. performance.
>
> IMO, mark dirtiness on write is relatively simple, so I think we can mark the
> overlayfs inode dirty during real write behavior and only remark writable mmap
> unconditionally in ->write_inode().
>

If by "on write" you mean on write/copy_file_range/splice_write/...
then yes I agree
since we have to cover all other mnt_want_write() cases anyway.

>
>  >
>  > >  >
>  > >  > > If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
>  > >  > > of FMODE_WRITE file, there is nothing that can make upper inode dirty
>  > >  > > after last close (if upper is not mmaped), so one more inode sync should
>  > >  > > be enough. No?
>  > >  >
>  > >  > But we still need to catch other dirtying events like timestamp updates,
>  > >  > truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
>  > >  > can be done...
>  > >  >
>  >
>  > Oh yeh, we have those as well :)
>  > All those cases should be covered by ovl_copyattr() that updates the
>  > ovl inode ctime/mtime, so always dirty in ovl_copyattr() should be good.
>
> Currently ovl_copyattr() does not cover all the cases, so I think we still need to carefully
> check all the places of calling mnt_want_write().
>

Careful audit is always good, but if we do not have ovl_copyattr() in
a call site
that should mark inode dirty, then it sounds like a bug, because ovl inode ctime
will not get updated. Do you know of any such cases?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2021-12-07  5:33                                                 ` Amir Goldstein
@ 2022-02-05 16:09                                                   ` Chengguang Xu
  2022-02-05 16:23                                                     ` Amir Goldstein
  0 siblings, 1 reply; 61+ messages in thread
From: Chengguang Xu @ 2022-02-05 16:09 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, overlayfs, linux-kernel, ronyjin,
	charliecgxu, Vivek Goyal, Miklos Szeredi

在 2021/12/7 13:33, Amir Goldstein 写道:
> On Sun, Dec 5, 2021 at 4:07 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>>   ---- 在 星期四, 2021-12-02 06:47:25 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>>   > On Wed, Dec 1, 2021 at 6:24 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>>   > >
>>   > >  ---- 在 星期三, 2021-12-01 21:46:10 Jan Kara <jack@suse.cz> 撰写 ----
>>   > >  > On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
>>   > >  > > On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>>   > >  > > > So the final solution to handle all the concerns looks like accurately
>>   > >  > > > mark overlay inode diry on modification and re-mark dirty only for
>>   > >  > > > mmaped file in ->write_inode().
>>   > >  > > >
>>   > >  > > > Hi Miklos, Jan
>>   > >  > > >
>>   > >  > > > Will you agree with new proposal above?
>>   > >  > > >
>>   > >  > >
>>   > >  > > Maybe you can still pull off a simpler version by remarking dirty only
>>   > >  > > writably mmapped upper AND inode_is_open_for_write(upper)?
>>   > >  >
>>   > >  > Well, if inode is writeably mapped, it must be also open for write, doesn't
>>   > >  > it? The VMA of the mapping will hold file open. So remarking overlay inode
>>   > >  > dirty during writeback while inode_is_open_for_write(upper) looks like
>>   > >  > reasonably easy and presumably there won't be that many inodes open for
>>   > >  > writing for this to become big overhead?
>>   >
>>   > I think it should be ok and a good tradeoff of complexity vs. performance.
>>
>> IMO, mark dirtiness on write is relatively simple, so I think we can mark the
>> overlayfs inode dirty during real write behavior and only remark writable mmap
>> unconditionally in ->write_inode().
>>
> If by "on write" you mean on write/copy_file_range/splice_write/...
> then yes I agree
> since we have to cover all other mnt_want_write() cases anyway.
>
>>   >
>>   > >  >
>>   > >  > > If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
>>   > >  > > of FMODE_WRITE file, there is nothing that can make upper inode dirty
>>   > >  > > after last close (if upper is not mmaped), so one more inode sync should
>>   > >  > > be enough. No?
>>   > >  >
>>   > >  > But we still need to catch other dirtying events like timestamp updates,
>>   > >  > truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
>>   > >  > can be done...
>>   > >  >
>>   >
>>   > Oh yeh, we have those as well :)
>>   > All those cases should be covered by ovl_copyattr() that updates the
>>   > ovl inode ctime/mtime, so always dirty in ovl_copyattr() should be good.
>>
>> Currently ovl_copyattr() does not cover all the cases, so I think we still need to carefully
>> check all the places of calling mnt_want_write().
>>
> Careful audit is always good, but if we do not have ovl_copyattr() in
> a call site
> that should mark inode dirty, then it sounds like a bug, because ovl inode ctime
> will not get updated. Do you know of any such cases?

Sorry for my late response, I've been very busy lately.
For your question, for example, there is a case of calling 
ovl_want_write() in ovl_cache_get_impure() and caller does not call 
ovl_copyattr()
so I think we should explicitly mark ovl inode dirty in that case. Is 
that probably a bug?


Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
  2022-02-05 16:09                                                   ` Chengguang Xu
@ 2022-02-05 16:23                                                     ` Amir Goldstein
  0 siblings, 0 replies; 61+ messages in thread
From: Amir Goldstein @ 2022-02-05 16:23 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Jan Kara, linux-fsdevel, overlayfs, linux-kernel, ronyjin,
	charliecgxu, Vivek Goyal, Miklos Szeredi

On Sat, Feb 5, 2022 at 6:10 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> 在 2021/12/7 13:33, Amir Goldstein 写道:
> > On Sun, Dec 5, 2021 at 4:07 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >>   ---- 在 星期四, 2021-12-02 06:47:25 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> >>   > On Wed, Dec 1, 2021 at 6:24 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >>   > >
> >>   > >  ---- 在 星期三, 2021-12-01 21:46:10 Jan Kara <jack@suse.cz> 撰写 ----
> >>   > >  > On Wed 01-12-21 09:19:17, Amir Goldstein wrote:
> >>   > >  > > On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >>   > >  > > > So the final solution to handle all the concerns looks like accurately
> >>   > >  > > > mark overlay inode diry on modification and re-mark dirty only for
> >>   > >  > > > mmaped file in ->write_inode().
> >>   > >  > > >
> >>   > >  > > > Hi Miklos, Jan
> >>   > >  > > >
> >>   > >  > > > Will you agree with new proposal above?
> >>   > >  > > >
> >>   > >  > >
> >>   > >  > > Maybe you can still pull off a simpler version by remarking dirty only
> >>   > >  > > writably mmapped upper AND inode_is_open_for_write(upper)?
> >>   > >  >
> >>   > >  > Well, if inode is writeably mapped, it must be also open for write, doesn't
> >>   > >  > it? The VMA of the mapping will hold file open. So remarking overlay inode
> >>   > >  > dirty during writeback while inode_is_open_for_write(upper) looks like
> >>   > >  > reasonably easy and presumably there won't be that many inodes open for
> >>   > >  > writing for this to become big overhead?
> >>   >
> >>   > I think it should be ok and a good tradeoff of complexity vs. performance.
> >>
> >> IMO, mark dirtiness on write is relatively simple, so I think we can mark the
> >> overlayfs inode dirty during real write behavior and only remark writable mmap
> >> unconditionally in ->write_inode().
> >>
> > If by "on write" you mean on write/copy_file_range/splice_write/...
> > then yes I agree
> > since we have to cover all other mnt_want_write() cases anyway.
> >
> >>   >
> >>   > >  >
> >>   > >  > > If I am not mistaken, if you always mark overlay inode dirty on ovl_flush()
> >>   > >  > > of FMODE_WRITE file, there is nothing that can make upper inode dirty
> >>   > >  > > after last close (if upper is not mmaped), so one more inode sync should
> >>   > >  > > be enough. No?
> >>   > >  >
> >>   > >  > But we still need to catch other dirtying events like timestamp updates,
> >>   > >  > truncate(2) etc. to mark overlay inode dirty. Not sure how reliably that
> >>   > >  > can be done...
> >>   > >  >
> >>   >
> >>   > Oh yeh, we have those as well :)
> >>   > All those cases should be covered by ovl_copyattr() that updates the
> >>   > ovl inode ctime/mtime, so always dirty in ovl_copyattr() should be good.
> >>
> >> Currently ovl_copyattr() does not cover all the cases, so I think we still need to carefully
> >> check all the places of calling mnt_want_write().
> >>
> > Careful audit is always good, but if we do not have ovl_copyattr() in
> > a call site
> > that should mark inode dirty, then it sounds like a bug, because ovl inode ctime
> > will not get updated. Do you know of any such cases?
>
> Sorry for my late response, I've been very busy lately.
> For your question, for example, there is a case of calling
> ovl_want_write() in ovl_cache_get_impure() and caller does not call
> ovl_copyattr()
> so I think we should explicitly mark ovl inode dirty in that case. Is
> that probably a bug?
>
>

The correct behavior would be similar to that of setting impure xattr
in ovl_link_up().
We would want to snapshot the upperdir attrs before removing xattr
and restore them after (best effort).
Not that this case is so important, but if you have an opportunity
to mark inode dirty in ovl_copyattr() I think that would be the best
way to go.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2022-02-05 16:23 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-23 13:08 [RFC PATCH v5 00/10] implement containerized syncfs for overlayfs Chengguang Xu
2021-09-23 13:08 ` [RFC PATCH v5 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
2021-09-23 13:08 ` [RFC PATCH v5 02/10] ovl: implement ->writepages operation Chengguang Xu
2021-09-23 13:08 ` [RFC PATCH v5 03/10] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
2021-10-06 15:33   ` Miklos Szeredi
2021-10-07  6:08     ` Chengguang Xu
2021-10-07  7:43       ` Miklos Szeredi
2021-09-23 13:08 ` [RFC PATCH v5 04/10] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
2021-10-07 18:43   ` Miklos Szeredi
2021-09-23 13:08 ` [RFC PATCH v5 05/10] ovl: mark overlayfs' inode dirty on shared mmap Chengguang Xu
2021-09-23 13:08 ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
2021-10-07  9:01   ` Jan Kara
2021-10-07 12:26     ` Chengguang Xu
2021-10-07 14:41       ` Jan Kara
2021-10-07 14:54         ` Chengguang Xu
2021-10-07  9:23   ` Miklos Szeredi
2021-10-07 12:28     ` Chengguang Xu
2021-10-07 12:45       ` Miklos Szeredi
2021-10-07 13:09         ` Chengguang Xu
2021-10-07 13:34           ` Miklos Szeredi
2021-10-07 14:46             ` Jan Kara
2021-10-07 14:53               ` Chengguang Xu
2021-10-07 18:51                 ` Miklos Szeredi
2021-10-08 13:13                   ` Jan Kara
2021-11-16  2:20             ` Chengguang Xu
2021-11-16 12:35               ` Miklos Szeredi
2021-11-17  6:11                 ` Chengguang Xu
2021-11-18  6:32                   ` Chengguang Xu
2021-11-18 11:23                     ` Jan Kara
2021-11-18 12:02                       ` Chengguang Xu
2021-11-18 16:43                         ` Jan Kara
2021-11-19  6:12                           ` Chengguang Xu
2021-11-30 11:22                             ` Jan Kara
2021-11-30 16:09                               ` Chengguang Xu
2021-11-30 19:04                                 ` Amir Goldstein
2021-12-01  2:37                                   ` Chengguang Xu
2021-12-01  6:31                                     ` Chengguang Xu
2021-12-01  7:19                                       ` Amir Goldstein
2021-12-01 13:46                                         ` Jan Kara
2021-12-01 14:59                                           ` Chengguang Xu
2021-12-01 16:24                                           ` Chengguang Xu
2021-12-01 22:47                                             ` Amir Goldstein
2021-12-01 23:23                                               ` ovl_flush() behavior Amir Goldstein
2021-12-02  2:11                                                 ` Chengguang Xu
2021-12-02 15:20                                                   ` Vivek Goyal
2021-12-02 15:59                                                     ` Amir Goldstein
2021-12-02 22:00                                                       ` Vivek Goyal
2021-12-02 15:14                                                 ` Vivek Goyal
2021-12-05 14:06                                               ` [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
2021-12-07  5:33                                                 ` Amir Goldstein
2022-02-05 16:09                                                   ` Chengguang Xu
2022-02-05 16:23                                                     ` Amir Goldstein
2021-09-23 13:08 ` [RFC PATCH v5 07/10] ovl: cache dirty overlayfs' inode Chengguang Xu
2021-10-07 11:09   ` Miklos Szeredi
2021-10-07 12:04     ` Chengguang Xu
2021-10-07 12:27       ` Miklos Szeredi
2021-09-23 13:08 ` [RFC PATCH v5 08/10] fs: export wait_sb_inodes() Chengguang Xu
2021-09-23 13:08 ` [RFC PATCH v5 09/10] fs: introduce new helper sync_fs_and_blockdev() Chengguang Xu
2021-10-19  7:15   ` Amir Goldstein
2021-11-15 11:39     ` Chengguang Xu
2021-09-23 13:08 ` [RFC PATCH v5 10/10] ovl: implement containerized syncfs for overlayfs Chengguang Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).