LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Greg Kurz <groug@kaod.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: virtualization@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtio-fs@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>,
	Max Reitz <mreitz@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	Greg Kurz <groug@kaod.org>, Robert Krawitz <rlk@redhat.com>
Subject: [PATCH v4 5/5] virtiofs: propagate sync() to file server
Date: Thu, 20 May 2021 17:46:54 +0200	[thread overview]
Message-ID: <20210520154654.1791183-6-groug@kaod.org> (raw)
In-Reply-To: <20210520154654.1791183-1-groug@kaod.org>

Even if POSIX doesn't mandate it, linux users legitimately expect
sync() to flush all data and metadata to physical storage when it
is located on the same system. This isn't happening with virtiofs
though : sync() inside the guest returns right away even though
data still needs to be flushed from the host page cache.

This is easily demonstrated by doing the following in the guest:

$ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
sync()                                  = 0 <0.024068>
+++ exited with 0 +++

and start the following in the host when the 'dd' command completes
in the guest:

$ strace -T -e fsync /usr/bin/sync virtiofs/foo
fsync(3)                                = 0 <10.371640>
+++ exited with 0 +++

There are no good reasons not to honor the expected behavior of
sync() actually : it gives an unrealistic impression that virtiofs
is super fast and that data has safely landed on HW, which isn't
the case obviously.

Implement a ->sync_fs() superblock operation that sends a new
FUSE_SYNCFS request type for this purpose. Provision a 64-bit
placeholder for possible future extensions. Since the file
server cannot handle the wait == 0 case, we skip it to avoid a
gratuitous roundtrip. Note that this is per-superblock : a
FUSE_SYNCFS is send for the root mount and for each submount.

Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
FUSE_SYNCFS in the file server is treated as permanent success.
This ensures compatibility with older file servers : the client
will get the current behavior of sync() not being propagated to
the file server.

Note that such an operation allows the file server to DoS sync().
Since a typical FUSE file server is an untrusted piece of software
running in userspace, this is disabled by default.  Only enable it
with virtiofs for now since virtiofsd is supposedly trusted by the
guest kernel.

Reported-by: Robert Krawitz <rlk@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 fs/fuse/fuse_i.h          |  3 +++
 fs/fuse/inode.c           | 40 +++++++++++++++++++++++++++++++++++++++
 fs/fuse/virtio_fs.c       |  1 +
 include/uapi/linux/fuse.h | 10 +++++++++-
 4 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e2f5c8617e0d..01d9283261af 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -761,6 +761,9 @@ struct fuse_conn {
 	/* Auto-mount submounts announced by the server */
 	unsigned int auto_submounts:1;
 
+	/* Propagate syncfs() to server */
+	unsigned int sync_fs:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 123b53d1c3c6..96b00253f766 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -506,6 +506,45 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return err;
 }
 
+static int fuse_sync_fs(struct super_block *sb, int wait)
+{
+	struct fuse_mount *fm = get_fuse_mount_super(sb);
+	struct fuse_conn *fc = fm->fc;
+	struct fuse_syncfs_in inarg;
+	FUSE_ARGS(args);
+	int err;
+
+	/*
+	 * Userspace cannot handle the wait == 0 case. Avoid a
+	 * gratuitous roundtrip.
+	 */
+	if (!wait)
+		return 0;
+
+	/* The filesystem is being unmounted. Nothing to do. */
+	if (!sb->s_root)
+		return 0;
+
+	if (!fc->sync_fs)
+		return 0;
+
+	memset(&inarg, 0, sizeof(inarg));
+	args.in_numargs = 1;
+	args.in_args[0].size = sizeof(inarg);
+	args.in_args[0].value = &inarg;
+	args.opcode = FUSE_SYNCFS;
+	args.nodeid = get_node_id(sb->s_root->d_inode);
+	args.out_numargs = 0;
+
+	err = fuse_simple_request(fm, &args);
+	if (err == -ENOSYS) {
+		fc->sync_fs = 0;
+		err = 0;
+	}
+
+	return err;
+}
+
 enum {
 	OPT_SOURCE,
 	OPT_SUBTYPE,
@@ -909,6 +948,7 @@ static const struct super_operations fuse_super_operations = {
 	.put_super	= fuse_put_super,
 	.umount_begin	= fuse_umount_begin,
 	.statfs		= fuse_statfs,
+	.sync_fs	= fuse_sync_fs,
 	.show_options	= fuse_show_options,
 };
 
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8962cd033016..f649a47efb68 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1455,6 +1455,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
 	fc->release = fuse_free_conn;
 	fc->delete_stale = true;
 	fc->auto_submounts = true;
+	fc->sync_fs = true;
 
 	/* Tell FUSE to split requests that exceed the virtqueue's size */
 	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 271ae90a9bb7..36ed092227fa 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -181,6 +181,9 @@
  *  - add FUSE_OPEN_KILL_SUIDGID
  *  - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
  *  - add FUSE_SETXATTR_ACL_KILL_SGID
+ *
+ *  7.34
+ *  - add FUSE_SYNCFS
  */
 
 #ifndef _LINUX_FUSE_H
@@ -216,7 +219,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 33
+#define FUSE_KERNEL_MINOR_VERSION 34
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -509,6 +512,7 @@ enum fuse_opcode {
 	FUSE_COPY_FILE_RANGE	= 47,
 	FUSE_SETUPMAPPING	= 48,
 	FUSE_REMOVEMAPPING	= 49,
+	FUSE_SYNCFS		= 50,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
@@ -971,4 +975,8 @@ struct fuse_removemapping_one {
 #define FUSE_REMOVEMAPPING_MAX_ENTRY   \
 		(PAGE_SIZE / sizeof(struct fuse_removemapping_one))
 
+struct fuse_syncfs_in {
+	uint64_t	padding;
+};
+
 #endif /* _LINUX_FUSE_H */
-- 
2.26.3


  parent reply	other threads:[~2021-05-20 15:47 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-20 15:46 [PATCH v4 0/5] " Greg Kurz
2021-05-20 15:46 ` [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path Greg Kurz
2021-05-20 19:45   ` Al Viro
2021-05-21  7:54     ` Miklos Szeredi
2021-05-21  8:15       ` Greg Kurz
2021-05-21  8:23         ` Miklos Szeredi
2021-05-21  8:08     ` Greg Kurz
2021-05-20 15:46 ` [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts Greg Kurz
2021-05-21  8:19   ` Miklos Szeredi
2021-05-21  8:28     ` Greg Kurz
2021-05-22 17:50   ` kernel test robot
2021-05-22 20:12   ` kernel test robot
2021-05-20 15:46 ` [PATCH v4 3/5] fuse: Make fuse_fill_super_submount() static Greg Kurz
2021-05-20 15:46 ` [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc() Greg Kurz
2021-05-21  8:26   ` Miklos Szeredi
2021-05-21  8:39     ` Greg Kurz
2021-05-21  8:50       ` Miklos Szeredi
2021-05-21 10:06         ` Greg Kurz
2021-05-21 12:37           ` Miklos Szeredi
2021-05-21 13:36             ` Greg Kurz
2021-05-20 15:46 ` Greg Kurz [this message]
2021-05-21 10:08   ` [PATCH v4 5/5] virtiofs: propagate sync() to file server Greg Kurz
2021-05-21 12:51     ` Miklos Szeredi
2021-08-15 14:14   ` Amir Goldstein
2021-08-16 15:29     ` Vivek Goyal
2021-08-16 18:57       ` Amir Goldstein
2021-08-16 19:11         ` Vivek Goyal
2021-08-16 19:46           ` Amir Goldstein
2021-08-28 15:21       ` Miklos Szeredi
2021-08-30 17:01         ` Vivek Goyal
2021-08-30 17:36           ` Miklos Szeredi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210520154654.1791183-6-groug@kaod.org \
    --to=groug@kaod.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=mreitz@redhat.com \
    --cc=rlk@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=vgoyal@redhat.com \
    --cc=virtio-fs@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --subject='Re: [PATCH v4 5/5] virtiofs: propagate sync() to file server' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).