LKML Archive on lore.kernel.org help / color / mirror / Atom feed
From: Greg Kurz <groug@kaod.org> To: Miklos Szeredi <miklos@szeredi.hu> Cc: <virtualization@lists.linux-foundation.org>, <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <virtio-fs@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, Max Reitz <mreitz@redhat.com>, Vivek Goyal <vgoyal@redhat.com>, Robert Krawitz <rlk@redhat.com> Subject: Re: [PATCH v4 5/5] virtiofs: propagate sync() to file server Date: Fri, 21 May 2021 12:08:40 +0200 [thread overview] Message-ID: <20210521120840.4658d42c@bahia.lan> (raw) In-Reply-To: <20210520154654.1791183-6-groug@kaod.org> On Thu, 20 May 2021 17:46:54 +0200 Greg Kurz <groug@kaod.org> wrote: > Even if POSIX doesn't mandate it, linux users legitimately expect > sync() to flush all data and metadata to physical storage when it > is located on the same system. This isn't happening with virtiofs > though : sync() inside the guest returns right away even though > data still needs to be flushed from the host page cache. > > This is easily demonstrated by doing the following in the guest: > > $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync > 5120+0 records in > 5120+0 records out > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s > sync() = 0 <0.024068> > +++ exited with 0 +++ > > and start the following in the host when the 'dd' command completes > in the guest: > > $ strace -T -e fsync /usr/bin/sync virtiofs/foo > fsync(3) = 0 <10.371640> > +++ exited with 0 +++ > > There are no good reasons not to honor the expected behavior of > sync() actually : it gives an unrealistic impression that virtiofs > is super fast and that data has safely landed on HW, which isn't > the case obviously. > > Implement a ->sync_fs() superblock operation that sends a new > FUSE_SYNCFS request type for this purpose. Provision a 64-bit > placeholder for possible future extensions. Since the file > server cannot handle the wait == 0 case, we skip it to avoid a > gratuitous roundtrip. Note that this is per-superblock : a > FUSE_SYNCFS is send for the root mount and for each submount. > s/send/sent Miklos, Great thanks for the quick feedback on these patches ! :) Apart from the fact that nothing is sent for submounts as long as we don't set SB_BORN on them, this patch doesn't really depends on the previous ones. If it looks good to you, maybe you can just merge it and I'll re-post the fixes separately ? Cheers, -- Greg > Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for > FUSE_SYNCFS in the file server is treated as permanent success. > This ensures compatibility with older file servers : the client > will get the current behavior of sync() not being propagated to > the file server. > > Note that such an operation allows the file server to DoS sync(). > Since a typical FUSE file server is an untrusted piece of software > running in userspace, this is disabled by default. Only enable it > with virtiofs for now since virtiofsd is supposedly trusted by the > guest kernel. > > Reported-by: Robert Krawitz <rlk@redhat.com> > Signed-off-by: Greg Kurz <groug@kaod.org> > --- > fs/fuse/fuse_i.h | 3 +++ > fs/fuse/inode.c | 40 +++++++++++++++++++++++++++++++++++++++ > fs/fuse/virtio_fs.c | 1 + > include/uapi/linux/fuse.h | 10 +++++++++- > 4 files changed, 53 insertions(+), 1 deletion(-) > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index e2f5c8617e0d..01d9283261af 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -761,6 +761,9 @@ struct fuse_conn { > /* Auto-mount submounts announced by the server */ > unsigned int auto_submounts:1; > > + /* Propagate syncfs() to server */ > + unsigned int sync_fs:1; > + > /** The number of requests waiting for completion */ > atomic_t num_waiting; > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index 123b53d1c3c6..96b00253f766 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -506,6 +506,45 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf) > return err; > } > > +static int fuse_sync_fs(struct super_block *sb, int wait) > +{ > + struct fuse_mount *fm = get_fuse_mount_super(sb); > + struct fuse_conn *fc = fm->fc; > + struct fuse_syncfs_in inarg; > + FUSE_ARGS(args); > + int err; > + > + /* > + * Userspace cannot handle the wait == 0 case. Avoid a > + * gratuitous roundtrip. > + */ > + if (!wait) > + return 0; > + > + /* The filesystem is being unmounted. Nothing to do. */ > + if (!sb->s_root) > + return 0; > + > + if (!fc->sync_fs) > + return 0; > + > + memset(&inarg, 0, sizeof(inarg)); > + args.in_numargs = 1; > + args.in_args[0].size = sizeof(inarg); > + args.in_args[0].value = &inarg; > + args.opcode = FUSE_SYNCFS; > + args.nodeid = get_node_id(sb->s_root->d_inode); > + args.out_numargs = 0; > + > + err = fuse_simple_request(fm, &args); > + if (err == -ENOSYS) { > + fc->sync_fs = 0; > + err = 0; > + } > + > + return err; > +} > + > enum { > OPT_SOURCE, > OPT_SUBTYPE, > @@ -909,6 +948,7 @@ static const struct super_operations fuse_super_operations = { > .put_super = fuse_put_super, > .umount_begin = fuse_umount_begin, > .statfs = fuse_statfs, > + .sync_fs = fuse_sync_fs, > .show_options = fuse_show_options, > }; > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c > index 8962cd033016..f649a47efb68 100644 > --- a/fs/fuse/virtio_fs.c > +++ b/fs/fuse/virtio_fs.c > @@ -1455,6 +1455,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc) > fc->release = fuse_free_conn; > fc->delete_stale = true; > fc->auto_submounts = true; > + fc->sync_fs = true; > > /* Tell FUSE to split requests that exceed the virtqueue's size */ > fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit, > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h > index 271ae90a9bb7..36ed092227fa 100644 > --- a/include/uapi/linux/fuse.h > +++ b/include/uapi/linux/fuse.h > @@ -181,6 +181,9 @@ > * - add FUSE_OPEN_KILL_SUIDGID > * - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT > * - add FUSE_SETXATTR_ACL_KILL_SGID > + * > + * 7.34 > + * - add FUSE_SYNCFS > */ > > #ifndef _LINUX_FUSE_H > @@ -216,7 +219,7 @@ > #define FUSE_KERNEL_VERSION 7 > > /** Minor version number of this interface */ > -#define FUSE_KERNEL_MINOR_VERSION 33 > +#define FUSE_KERNEL_MINOR_VERSION 34 > > /** The node ID of the root inode */ > #define FUSE_ROOT_ID 1 > @@ -509,6 +512,7 @@ enum fuse_opcode { > FUSE_COPY_FILE_RANGE = 47, > FUSE_SETUPMAPPING = 48, > FUSE_REMOVEMAPPING = 49, > + FUSE_SYNCFS = 50, > > /* CUSE specific operations */ > CUSE_INIT = 4096, > @@ -971,4 +975,8 @@ struct fuse_removemapping_one { > #define FUSE_REMOVEMAPPING_MAX_ENTRY \ > (PAGE_SIZE / sizeof(struct fuse_removemapping_one)) > > +struct fuse_syncfs_in { > + uint64_t padding; > +}; > + > #endif /* _LINUX_FUSE_H */
next prev parent reply other threads:[~2021-05-21 10:09 UTC|newest] Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-05-20 15:46 [PATCH v4 0/5] virtiofs: propagate sync() to file server Greg Kurz 2021-05-20 15:46 ` [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path Greg Kurz 2021-05-20 19:45 ` Al Viro 2021-05-21 7:54 ` Miklos Szeredi 2021-05-21 8:15 ` Greg Kurz 2021-05-21 8:23 ` Miklos Szeredi 2021-05-21 8:08 ` Greg Kurz 2021-05-20 15:46 ` [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts Greg Kurz 2021-05-21 8:19 ` Miklos Szeredi 2021-05-21 8:28 ` Greg Kurz 2021-05-22 17:50 ` kernel test robot 2021-05-22 20:12 ` kernel test robot 2021-05-20 15:46 ` [PATCH v4 3/5] fuse: Make fuse_fill_super_submount() static Greg Kurz 2021-05-20 15:46 ` [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc() Greg Kurz 2021-05-21 8:26 ` Miklos Szeredi 2021-05-21 8:39 ` Greg Kurz 2021-05-21 8:50 ` Miklos Szeredi 2021-05-21 10:06 ` Greg Kurz 2021-05-21 12:37 ` Miklos Szeredi 2021-05-21 13:36 ` Greg Kurz 2021-05-20 15:46 ` [PATCH v4 5/5] virtiofs: propagate sync() to file server Greg Kurz 2021-05-21 10:08 ` Greg Kurz [this message] 2021-05-21 12:51 ` Miklos Szeredi 2021-08-15 14:14 ` Amir Goldstein 2021-08-16 15:29 ` Vivek Goyal 2021-08-16 18:57 ` Amir Goldstein 2021-08-16 19:11 ` Vivek Goyal 2021-08-16 19:46 ` Amir Goldstein 2021-08-28 15:21 ` Miklos Szeredi 2021-08-30 17:01 ` Vivek Goyal 2021-08-30 17:36 ` Miklos Szeredi
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20210521120840.4658d42c@bahia.lan \ --to=groug@kaod.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=miklos@szeredi.hu \ --cc=mreitz@redhat.com \ --cc=rlk@redhat.com \ --cc=stefanha@redhat.com \ --cc=vgoyal@redhat.com \ --cc=virtio-fs@redhat.com \ --cc=virtualization@lists.linux-foundation.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).