LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() [not found] <20080225191043.GA32342@lst.de> @ 2008-02-27 19:16 ` Roland Dreier 2008-02-27 19:18 ` [PATCH 2/2] IB/uverbs: Use anonymous inodes instead of private filesystem Roland Dreier ` (3 more replies) 0 siblings, 4 replies; 19+ messages in thread From: Roland Dreier @ 2008-02-27 19:16 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-fsdevel, linux-kernel, Davide Libenzi, Avi Kivity, kvm-devel, akpm The anonymous inodes interface anon_inode_getfd() calls fd_install() for the newly created fd, which does not work for some use cases where the caller must do futher initialization before exposing the file to userspace. This is also probably not the safest interface, since the caller must be sure that it is OK if userspace closes the fd before anon_inode_getfd() even returns. Therefore, change the anonymous inodes interface so that the caller is responsible for calling fd_install(), and change the name of the function from anon_inode_getfd() to anon_inode_allocfd() so that any code using the old interface breaks at compilation rather than failing in a strange way. Fix up all the in-kernel users to use the new interface. This change will allow drivers/infiniband/uverbs_main.c to use anonymous inodes and drop a chunk of virtual filesystem code (including the last modular caller of get_empty_filp()). This also allows us to fix a (probably never triggered) race in the kvm code. In create_vcpu_fd(), if userspace guessed the the fd that was about to be returned and closed it immediately then kvm_vcpu_release() could run and do: fput(vcpu->kvm->filp); before the increment of the refcount in create_vcpu_fd(): atomic_inc(&vcpu->kvm->filp->f_count); which seems to leave some room for problems. (This race at least shows how the existing anonymous inode interface is tricky to use) Tested with kvm and with an app that uses the epoll interface. The signalfd and timerfd changes are untested but are trivial enough (just adding an fd_install() call) that they are almost certainly OK. Signed-off-by: Roland Dreier <rolandd@cisco.com> --- fs/anon_inodes.c | 20 +++++++++++--------- fs/eventfd.c | 12 +++++------- fs/eventpoll.c | 6 ++++-- fs/signalfd.c | 10 ++++------ fs/timerfd.c | 5 +++-- include/linux/anon_inodes.h | 6 +++--- virt/kvm/kvm_main.c | 9 ++++++--- 7 files changed, 36 insertions(+), 32 deletions(-) diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c index 2332188..733054d 100644 --- a/fs/anon_inodes.c +++ b/fs/anon_inodes.c @@ -53,9 +53,9 @@ static struct dentry_operations anon_inodefs_dentry_operations = { }; /** - * anon_inode_getfd - creates a new file instance by hooking it up to an - * anonymous inode, and a dentry that describe the "class" - * of the file + * anon_inode_allocfd - creates a new file instance by hooking it up to an + * anonymous inode, and a dentry that describe the "class" + * of the file * * @pfd: [out] pointer to the file descriptor * @dpinode: [out] pointer to the inode @@ -69,10 +69,14 @@ static struct dentry_operations anon_inodefs_dentry_operations = { * All the files created with anon_inode_getfd() will share a single inode, * hence saving memory and avoiding code duplication for the file/inode/dentry * setup. + * + * Unlike the earlier anon_inode_allocfd() interface, this function does not + * call fd_install(), since the caller may have further initialization to do + * before it is safe to expose the file to userspace. */ -int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile, - const char *name, const struct file_operations *fops, - void *priv) +int anon_inode_allocfd(int *pfd, struct inode **pinode, struct file **pfile, + const char *name, const struct file_operations *fops, + void *priv) { struct qstr this; struct dentry *dentry; @@ -125,8 +129,6 @@ int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile, file->f_version = 0; file->private_data = priv; - fd_install(fd, file); - *pfd = fd; *pinode = anon_inode_inode; *pfile = file; @@ -138,7 +140,7 @@ err_put_filp: put_filp(file); return error; } -EXPORT_SYMBOL_GPL(anon_inode_getfd); +EXPORT_SYMBOL_GPL(anon_inode_allocfd); /* * A single inode exists for all anon_inode files. Contrary to pipes, diff --git a/fs/eventfd.c b/fs/eventfd.c index a9f130c..76e9cd2 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -212,14 +212,12 @@ asmlinkage long sys_eventfd(unsigned int count) init_waitqueue_head(&ctx->wqh); ctx->count = count; - /* - * When we call this, the initialization must be complete, since - * anon_inode_getfd() will install the fd. - */ - error = anon_inode_getfd(&fd, &inode, &file, "[eventfd]", - &eventfd_fops, ctx); - if (!error) + error = anon_inode_allocfd(&fd, &inode, &file, "[eventfd]", + &eventfd_fops, ctx); + if (!error) { + fd_install(fd, file); return fd; + } kfree(ctx); return error; diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a415f42..9edf018 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1089,11 +1089,13 @@ asmlinkage long sys_epoll_create(int size) * Creates all the items needed to setup an eventpoll file. That is, * a file structure, and inode and a free file descriptor. */ - error = anon_inode_getfd(&fd, &inode, &file, "[eventpoll]", - &eventpoll_fops, ep); + error = anon_inode_allocfd(&fd, &inode, &file, "[eventpoll]", + &eventpoll_fops, ep); if (error) goto error_free; + fd_install(fd, file); + DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n", current, size, fd)); diff --git a/fs/signalfd.c b/fs/signalfd.c index cb2b63a..16f71da 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -221,14 +221,12 @@ asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemas ctx->sigmask = sigmask; - /* - * When we call this, the initialization must be complete, since - * anon_inode_getfd() will install the fd. - */ - error = anon_inode_getfd(&ufd, &inode, &file, "[signalfd]", - &signalfd_fops, ctx); + error = anon_inode_allocfd(&ufd, &inode, &file, "[signalfd]", + &signalfd_fops, ctx); if (error) goto err_fdalloc; + + fd_install(ufd, file); } else { file = fget(ufd); if (!file) diff --git a/fs/timerfd.c b/fs/timerfd.c index 10c80b5..6ce63e1 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -199,13 +199,14 @@ asmlinkage long sys_timerfd_create(int clockid, int flags) ctx->clockid = clockid; hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS); - error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]", - &timerfd_fops, ctx); + error = anon_inode_allocfd(&ufd, &inode, &file, "[timerfd]", + &timerfd_fops, ctx); if (error) { kfree(ctx); return error; } + fd_install(ufd, file); return ufd; } diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h index b2e1ba3..35f7ecb 100644 --- a/include/linux/anon_inodes.h +++ b/include/linux/anon_inodes.h @@ -8,9 +8,9 @@ #ifndef _LINUX_ANON_INODES_H #define _LINUX_ANON_INODES_H -int anon_inode_getfd(int *pfd, struct inode **pinode, struct file **pfile, - const char *name, const struct file_operations *fops, - void *priv); +int anon_inode_allocfd(int *pfd, struct inode **pinode, struct file **pfile, + const char *name, const struct file_operations *fops, + void *priv); #endif /* _LINUX_ANON_INODES_H */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 32fbf80..a0bf208 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -720,11 +720,12 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu) struct inode *inode; struct file *file; - r = anon_inode_getfd(&fd, &inode, &file, - "kvm-vcpu", &kvm_vcpu_fops, vcpu); + r = anon_inode_allocfd(&fd, &inode, &file, + "kvm-vcpu", &kvm_vcpu_fops, vcpu); if (r) return r; atomic_inc(&vcpu->kvm->filp->f_count); + fd_install(fd, file); return fd; } @@ -1021,7 +1022,7 @@ static int kvm_dev_ioctl_create_vm(void) kvm = kvm_create_vm(); if (IS_ERR(kvm)) return PTR_ERR(kvm); - r = anon_inode_getfd(&fd, &inode, &file, "kvm-vm", &kvm_vm_fops, kvm); + r = anon_inode_allocfd(&fd, &inode, &file, "kvm-vm", &kvm_vm_fops, kvm); if (r) { kvm_destroy_vm(kvm); return r; @@ -1029,6 +1030,8 @@ static int kvm_dev_ioctl_create_vm(void) kvm->filp = file; + fd_install(fd, file); + return fd; } -- 1.5.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/2] IB/uverbs: Use anonymous inodes instead of private filesystem 2008-02-27 19:16 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Roland Dreier @ 2008-02-27 19:18 ` Roland Dreier 2008-02-27 19:20 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Avi Kivity ` (2 subsequent siblings) 3 siblings, 0 replies; 19+ messages in thread From: Roland Dreier @ 2008-02-27 19:18 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel, Davide Libenzi, akpm Now that anonymous inodes allow us to control when fd_install() is called, we can convert ib_uverbs to use them instead of its own private infinibandeventfs virtual filesystem. This removes a healthy chunk of code and also deletes the last in-tree user of get_empty_filp(), which makes Christoph Hellwig happy. Tested with an app that uses the ib_uverbs event file interfaces. Signed-off-by: Roland Dreier <rolandd@cisco.com> --- drivers/infiniband/core/uverbs_main.c | 76 ++++----------------------------- 1 files changed, 9 insertions(+), 67 deletions(-) diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 7c2ac39..7629566 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -40,11 +40,10 @@ #include <linux/init.h> #include <linux/device.h> #include <linux/err.h> -#include <linux/fs.h> #include <linux/poll.h> #include <linux/file.h> -#include <linux/mount.h> #include <linux/cdev.h> +#include <linux/anon_inodes.h> #include <asm/uaccess.h> @@ -54,8 +53,6 @@ MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("InfiniBand userspace verbs access"); MODULE_LICENSE("Dual BSD/GPL"); -#define INFINIBANDEVENTFS_MAGIC 0x49426576 /* "IBev" */ - enum { IB_UVERBS_MAJOR = 231, IB_UVERBS_BASE_MINOR = 192, @@ -112,8 +109,6 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file, [IB_USER_VERBS_CMD_DESTROY_SRQ] = ib_uverbs_destroy_srq, }; -static struct vfsmount *uverbs_event_mnt; - static void ib_uverbs_add_one(struct ib_device *device); static void ib_uverbs_remove_one(struct ib_device *device); @@ -495,6 +490,7 @@ struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file, int is_async, int *fd) { struct ib_uverbs_event_file *ev_file; + struct inode *inode; struct file *filp; int ret; @@ -510,37 +506,19 @@ struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file, ev_file->async_queue = NULL; ev_file->is_async = is_async; - *fd = get_unused_fd(); - if (*fd < 0) { - ret = *fd; - goto err; - } - - filp = get_empty_filp(); - if (!filp) { - ret = -ENFILE; - goto err_fd; - } - - ev_file->file = filp; - /* * fops_get() can't fail here, because we're coming from a * system call on a uverbs file, which will already have a * module reference. */ - filp->f_op = fops_get(&uverbs_event_fops); - filp->f_path.mnt = mntget(uverbs_event_mnt); - filp->f_path.dentry = dget(uverbs_event_mnt->mnt_root); - filp->f_mapping = filp->f_path.dentry->d_inode->i_mapping; - filp->f_flags = O_RDONLY; - filp->f_mode = FMODE_READ; - filp->private_data = ev_file; + ret = anon_inode_allocfd(fd, &inode, &filp, "[uverbs-event]", + fops_get(&uverbs_event_fops), ev_file); + if (ret) + goto err; - return filp; + ev_file->file = filp; -err_fd: - put_unused_fd(*fd); + return filp; err: kfree(ev_file); @@ -817,21 +795,6 @@ static void ib_uverbs_remove_one(struct ib_device *device) kfree(uverbs_dev); } -static int uverbs_event_get_sb(struct file_system_type *fs_type, int flags, - const char *dev_name, void *data, - struct vfsmount *mnt) -{ - return get_sb_pseudo(fs_type, "infinibandevent:", NULL, - INFINIBANDEVENTFS_MAGIC, mnt); -} - -static struct file_system_type uverbs_event_fs = { - /* No owner field so module can be unloaded */ - .name = "infinibandeventfs", - .get_sb = uverbs_event_get_sb, - .kill_sb = kill_litter_super -}; - static int __init ib_uverbs_init(void) { int ret; @@ -858,33 +821,14 @@ static int __init ib_uverbs_init(void) goto out_class; } - ret = register_filesystem(&uverbs_event_fs); - if (ret) { - printk(KERN_ERR "user_verbs: couldn't register infinibandeventfs\n"); - goto out_class; - } - - uverbs_event_mnt = kern_mount(&uverbs_event_fs); - if (IS_ERR(uverbs_event_mnt)) { - ret = PTR_ERR(uverbs_event_mnt); - printk(KERN_ERR "user_verbs: couldn't mount infinibandeventfs\n"); - goto out_fs; - } - ret = ib_register_client(&uverbs_client); if (ret) { printk(KERN_ERR "user_verbs: couldn't register client\n"); - goto out_mnt; + goto out_class; } return 0; -out_mnt: - mntput(uverbs_event_mnt); - -out_fs: - unregister_filesystem(&uverbs_event_fs); - out_class: class_destroy(uverbs_class); @@ -898,8 +842,6 @@ out: static void __exit ib_uverbs_cleanup(void) { ib_unregister_client(&uverbs_client); - mntput(uverbs_event_mnt); - unregister_filesystem(&uverbs_event_fs); class_destroy(uverbs_class); unregister_chrdev_region(IB_UVERBS_BASE_DEV, IB_UVERBS_MAX_DEVICES); idr_destroy(&ib_uverbs_pd_idr); -- 1.5.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 19:16 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Roland Dreier 2008-02-27 19:18 ` [PATCH 2/2] IB/uverbs: Use anonymous inodes instead of private filesystem Roland Dreier @ 2008-02-27 19:20 ` Avi Kivity 2008-02-27 19:41 ` Davide Libenzi 2008-03-06 15:14 ` Al Viro 3 siblings, 0 replies; 19+ messages in thread From: Avi Kivity @ 2008-02-27 19:20 UTC (permalink / raw) To: Roland Dreier Cc: Christoph Hellwig, linux-fsdevel, linux-kernel, Davide Libenzi, kvm-devel, akpm Roland Dreier wrote: > The anonymous inodes interface anon_inode_getfd() calls fd_install() > for the newly created fd, which does not work for some use cases where > the caller must do futher initialization before exposing the file to > userspace. This is also probably not the safest interface, since the > caller must be sure that it is OK if userspace closes the fd before > anon_inode_getfd() even returns. > > Therefore, change the anonymous inodes interface so that the caller is > responsible for calling fd_install(), and change the name of the > function from anon_inode_getfd() to anon_inode_allocfd() so that any > code using the old interface breaks at compilation rather than failing > in a strange way. Fix up all the in-kernel users to use the new > interface. > > The kvm changes are Acked-by: Avi Kivity <avi@qumranet.com> -- Any sufficiently difficult bug is indistinguishable from a feature. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 19:16 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Roland Dreier 2008-02-27 19:18 ` [PATCH 2/2] IB/uverbs: Use anonymous inodes instead of private filesystem Roland Dreier 2008-02-27 19:20 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Avi Kivity @ 2008-02-27 19:41 ` Davide Libenzi 2008-02-27 20:14 ` Roland Dreier 2008-03-06 15:14 ` Al Viro 3 siblings, 1 reply; 19+ messages in thread From: Davide Libenzi @ 2008-02-27 19:41 UTC (permalink / raw) To: Roland Dreier Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Davide Libenzi, Avi Kivity, kvm-devel, Andrew Morton, Al Viro [CC-ing Al too] On Wed, 27 Feb 2008, Roland Dreier wrote: > The anonymous inodes interface anon_inode_getfd() calls fd_install() > for the newly created fd, which does not work for some use cases where > the caller must do futher initialization before exposing the file to > userspace. This is also probably not the safest interface, since the > caller must be sure that it is OK if userspace closes the fd before > anon_inode_getfd() even returns. I believe Al changed the interface to not give out inode* and file*, *and* call fd_install() inside it. I'd slightly prefer Al version, although I don't see any major problems in this one too. - Davide ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 19:41 ` Davide Libenzi @ 2008-02-27 20:14 ` Roland Dreier 2008-02-27 20:30 ` Davide Libenzi 0 siblings, 1 reply; 19+ messages in thread From: Roland Dreier @ 2008-02-27 20:14 UTC (permalink / raw) To: Davide Libenzi Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro > > The anonymous inodes interface anon_inode_getfd() calls fd_install() > > for the newly created fd, which does not work for some use cases where > > the caller must do futher initialization before exposing the file to > > userspace. This is also probably not the safest interface, since the > > caller must be sure that it is OK if userspace closes the fd before > > anon_inode_getfd() even returns. > > I believe Al changed the interface to not give out inode* and file*, *and* > call fd_install() inside it. I'd slightly prefer Al version, although I > don't see any major problems in this one too. Any pointer to that patch? A web search for "viro" and "anon_inode_getfd" doesn't turn up anything likely looking. - R. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 20:14 ` Roland Dreier @ 2008-02-27 20:30 ` Davide Libenzi 2008-02-27 21:05 ` Roland Dreier 0 siblings, 1 reply; 19+ messages in thread From: Davide Libenzi @ 2008-02-27 20:30 UTC (permalink / raw) To: Roland Dreier Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro On Wed, 27 Feb 2008, Roland Dreier wrote: > > > The anonymous inodes interface anon_inode_getfd() calls fd_install() > > > for the newly created fd, which does not work for some use cases where > > > the caller must do futher initialization before exposing the file to > > > userspace. This is also probably not the safest interface, since the > > > caller must be sure that it is OK if userspace closes the fd before > > > anon_inode_getfd() even returns. > > > > I believe Al changed the interface to not give out inode* and file*, *and* > > call fd_install() inside it. I'd slightly prefer Al version, although I > > don't see any major problems in this one too. > > Any pointer to that patch? A web search for "viro" and > "anon_inode_getfd" doesn't turn up anything likely looking. It's inside his vfs git tree: http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commit;h=49be4f8114e6ff0efdab10ebba2493fb67bc3034 I'm fine with both approaches. - Davide ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 20:30 ` Davide Libenzi @ 2008-02-27 21:05 ` Roland Dreier 2008-02-27 23:42 ` Roland Dreier 0 siblings, 1 reply; 19+ messages in thread From: Roland Dreier @ 2008-02-27 21:05 UTC (permalink / raw) To: Davide Libenzi Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro > http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commit;h=49be4f8114e6ff0efdab10ebba2493fb67bc3034 > > I'm fine with both approaches. Both ways are OK with me too, although Al's change leaves the trap in the anon_inode_getfd() in that all users have to keep in mind the race against close() from another thread. Also Al's change moves all documentation to __anon_inode_getfd() and leaves anon_inode_getfd() undocumented, which is a little suboptimal. With Al's change the 2/2 patch would have to change uverbs to use the __anon_inode_getfd() variant and change the fd_install() in uverbs to use fput(). If there is consensus that Al's patch will be merged for 2.6.26 I will write that patch and send it to Al to merge via his tree, so that get_empty_filp() can be unexported. - R. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 21:05 ` Roland Dreier @ 2008-02-27 23:42 ` Roland Dreier 2008-02-28 7:52 ` Avi Kivity ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Roland Dreier @ 2008-02-27 23:42 UTC (permalink / raw) To: Davide Libenzi Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro > http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commit;h=49be4f8114e6ff0efdab10ebba2493fb67bc3034 Actually, looking closer at the kvm changes here, I think that create_vcpu_fd() needs the same treatment as kvm_dev_ioctl_create_vm() gets in the patch because of the race I mentioned in the changelog for my patch: otherwise kvm_vcpu_release() could drop the last reference to vcpu->kvm->filp before the get_file() gets an extra reference. I'm beginning to think that moving the fd_install() out of anon_inode_getfd() really is worth it to make a safer interface. - R. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 23:42 ` Roland Dreier @ 2008-02-28 7:52 ` Avi Kivity 2008-02-28 20:04 ` Davide Libenzi 2008-03-06 15:14 ` Christoph Hellwig 2 siblings, 0 replies; 19+ messages in thread From: Avi Kivity @ 2008-02-28 7:52 UTC (permalink / raw) To: Roland Dreier Cc: Davide Libenzi, Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, kvm-devel, Andrew Morton, Al Viro Roland Dreier wrote: > > http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commit;h=49be4f8114e6ff0efdab10ebba2493fb67bc3034 > > Actually, looking closer at the kvm changes here, I think that > create_vcpu_fd() needs the same treatment as kvm_dev_ioctl_create_vm() > gets in the patch because of the race I mentioned in the changelog > for my patch: otherwise kvm_vcpu_release() could drop the last > reference to vcpu->kvm->filp before the get_file() gets an extra > reference. > > I'm beginning to think that moving the fd_install() out of > anon_inode_getfd() really is worth it to make a safer interface. > It makes is less usable, though (since the last step needs to be taken by the caller. We might add a int (*prepare_file)(...) parameter which anon_inode_getfd() uses to munge the file before installing it. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 23:42 ` Roland Dreier 2008-02-28 7:52 ` Avi Kivity @ 2008-02-28 20:04 ` Davide Libenzi 2008-02-28 20:24 ` Roland Dreier 2008-03-06 15:14 ` Christoph Hellwig 2 siblings, 1 reply; 19+ messages in thread From: Davide Libenzi @ 2008-02-28 20:04 UTC (permalink / raw) To: Roland Dreier Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro On Wed, 27 Feb 2008, Roland Dreier wrote: > > http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commit;h=49be4f8114e6ff0efdab10ebba2493fb67bc3034 > > Actually, looking closer at the kvm changes here, I think that > create_vcpu_fd() needs the same treatment as kvm_dev_ioctl_create_vm() > gets in the patch because of the race I mentioned in the changelog > for my patch: otherwise kvm_vcpu_release() could drop the last > reference to vcpu->kvm->filp before the get_file() gets an extra > reference. > > I'm beginning to think that moving the fd_install() out of > anon_inode_getfd() really is worth it to make a safer interface. If we let the caller call fd_install(), then it may be messed up WRT cleanup (fd, file, inode). How about removing the inode pointer handout altogether, and *doing* fd_install() inside anon_inode_getfd() like: if (pfile != NULL) { get_file(file); *pfile = file; } fd_install(fd, file); In this way, if the caller want the file* back, he gets the reference bumped before fd_install(). - Davide ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-28 20:04 ` Davide Libenzi @ 2008-02-28 20:24 ` Roland Dreier 2008-02-28 20:52 ` Davide Libenzi 0 siblings, 1 reply; 19+ messages in thread From: Roland Dreier @ 2008-02-28 20:24 UTC (permalink / raw) To: Davide Libenzi Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro > If we let the caller call fd_install(), then it may be messed up WRT > cleanup (fd, file, inode). Yes, that is a tiny bit tricky (need to call put_unused_fd() if you don't install the fd). > How about removing the inode pointer handout altogether, and *doing* > fd_install() inside anon_inode_getfd() like: > > if (pfile != NULL) { > get_file(file); > *pfile = file; > } > fd_install(fd, file); > > In this way, if the caller want the file* back, he gets the reference > bumped before fd_install(). I think that may be a bit cleaner than Al's approach, but it still leaves the same trap that create_vcpu_fd() falls into. The current code is: static int create_vcpu_fd(struct kvm_vcpu *vcpu) { int fd, r; struct inode *inode; struct file *file; r = anon_inode_getfd(&fd, &inode, &file, "kvm-vcpu", &kvm_vcpu_fops, vcpu); if (r) return r; atomic_inc(&vcpu->kvm->filp->f_count); return fd; } and with your proposal, the natural way to write that becomes: static int create_vcpu_fd(struct kvm_vcpu *vcpu) { int fd, r; r = anon_inode_getfd(&fd, NULL, "kvm-vcpu", &kvm_vcpu_fops, vcpu); if (r) return r; atomic_inc(&vcpu->kvm->filp->f_count); return fd; } which still has the same bug. Maybe a good way to handle this is just to make the get_file() not optional. I dunno... I feel like we've spent more discussion on this point than it deserves, so someone should just make a decision and I'll adapt the ib_uverbs code to work with whatever it is. - R. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-28 20:24 ` Roland Dreier @ 2008-02-28 20:52 ` Davide Libenzi 2008-03-05 9:32 ` [kvm-devel] " Avi Kivity 0 siblings, 1 reply; 19+ messages in thread From: Davide Libenzi @ 2008-02-28 20:52 UTC (permalink / raw) To: Roland Dreier Cc: Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro On Thu, 28 Feb 2008, Roland Dreier wrote: > > If we let the caller call fd_install(), then it may be messed up WRT > > cleanup (fd, file, inode). > > Yes, that is a tiny bit tricky (need to call put_unused_fd() if you > don't install the fd). > > > How about removing the inode pointer handout altogether, and *doing* > > fd_install() inside anon_inode_getfd() like: > > > > if (pfile != NULL) { > > get_file(file); > > *pfile = file; > > } > > fd_install(fd, file); > > > > In this way, if the caller want the file* back, he gets the reference > > bumped before fd_install(). > > I think that may be a bit cleaner than Al's approach, but it still > leaves the same trap that create_vcpu_fd() falls into. The current > code is: > > static int create_vcpu_fd(struct kvm_vcpu *vcpu) > { > int fd, r; > struct inode *inode; > struct file *file; > > r = anon_inode_getfd(&fd, &inode, &file, > "kvm-vcpu", &kvm_vcpu_fops, vcpu); > if (r) > return r; > atomic_inc(&vcpu->kvm->filp->f_count); > return fd; > } > > and with your proposal, the natural way to write that becomes: > > static int create_vcpu_fd(struct kvm_vcpu *vcpu) > { > int fd, r; > > r = anon_inode_getfd(&fd, NULL, > "kvm-vcpu", &kvm_vcpu_fops, vcpu); > if (r) > return r; > atomic_inc(&vcpu->kvm->filp->f_count); > return fd; > } I don't know KVM code, but can't the "private_data" setup be completed before calling anon_inode_getfd()? Or ... static int create_vcpu_fd(struct kvm_vcpu *vcpu) { int fd, r; get_file(vcpu->kvm->filp); r = anon_inode_getfd(&fd, NULL, "kvm-vcpu", &kvm_vcpu_fops, vcpu); if (r) { fput(vcpu->kvm->filp); return r; } return fd; } Hmm...? - Davide ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-28 20:52 ` Davide Libenzi @ 2008-03-05 9:32 ` Avi Kivity 0 siblings, 0 replies; 19+ messages in thread From: Avi Kivity @ 2008-03-05 9:32 UTC (permalink / raw) To: Davide Libenzi Cc: Roland Dreier, kvm-devel, Linux Kernel Mailing List, Al Viro, linux-fsdevel, Andrew Morton, Christoph Hellwig Davide Libenzi wrote: >> I think that may be a bit cleaner than Al's approach, but it still >> leaves the same trap that create_vcpu_fd() falls into. The current >> code is: >> >> static int create_vcpu_fd(struct kvm_vcpu *vcpu) >> { >> int fd, r; >> struct inode *inode; >> struct file *file; >> >> r = anon_inode_getfd(&fd, &inode, &file, >> "kvm-vcpu", &kvm_vcpu_fops, vcpu); >> if (r) >> return r; >> atomic_inc(&vcpu->kvm->filp->f_count); >> return fd; >> } >> >> and with your proposal, the natural way to write that becomes: >> >> static int create_vcpu_fd(struct kvm_vcpu *vcpu) >> { >> int fd, r; >> >> r = anon_inode_getfd(&fd, NULL, >> "kvm-vcpu", &kvm_vcpu_fops, vcpu); >> if (r) >> return r; >> atomic_inc(&vcpu->kvm->filp->f_count); >> return fd; >> } >> > > I don't know KVM code, but can't the "private_data" setup be completed > before calling anon_inode_getfd()? > Creating the fd is the last thing done when creating a vcpu. > Or ... > > static int create_vcpu_fd(struct kvm_vcpu *vcpu) > { > int fd, r; > > get_file(vcpu->kvm->filp); > r = anon_inode_getfd(&fd, NULL, > "kvm-vcpu", &kvm_vcpu_fops, vcpu); > if (r) { > fput(vcpu->kvm->filp); > return r; > } > return fd; > } > This seems reasonable. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 23:42 ` Roland Dreier 2008-02-28 7:52 ` Avi Kivity 2008-02-28 20:04 ` Davide Libenzi @ 2008-03-06 15:14 ` Christoph Hellwig 2008-03-09 2:45 ` Roland Dreier 2008-03-09 2:46 ` Roland Dreier 2 siblings, 2 replies; 19+ messages in thread From: Christoph Hellwig @ 2008-03-06 15:14 UTC (permalink / raw) To: Roland Dreier Cc: Davide Libenzi, Christoph Hellwig, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro On Wed, Feb 27, 2008 at 03:42:42PM -0800, Roland Dreier wrote: > > http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commit;h=49be4f8114e6ff0efdab10ebba2493fb67bc3034 > > Actually, looking closer at the kvm changes here, I think that > create_vcpu_fd() needs the same treatment as kvm_dev_ioctl_create_vm() > gets in the patch because of the race I mentioned in the changelog > for my patch: otherwise kvm_vcpu_release() could drop the last > reference to vcpu->kvm->filp before the get_file() gets an extra > reference. Actually using the file pointer for reference counting in kvm is quite stupid and risky, it should use a normal reference count instead. See untested patch below. > I'm beginning to think that moving the fd_install() out of > anon_inode_getfd() really is worth it to make a safer interface. No, the best interface is one where the driver doesn't even see inode or file. Of course that's not actually possible in a few nasty cases like the infiniband code, and for those it might be better to do the fd_isntall themselves. Index: linux-2.6/include/linux/kvm_host.h =================================================================== --- linux-2.6.orig/include/linux/kvm_host.h 2008-02-23 20:28:14.000000000 +0100 +++ linux-2.6/include/linux/kvm_host.h 2008-02-23 20:36:20.000000000 +0100 @@ -113,7 +113,7 @@ struct kvm { KVM_PRIVATE_MEM_SLOTS]; struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; struct list_head vm_list; - struct file *filp; + int refcount; struct kvm_io_bus mmio_bus; struct kvm_io_bus pio_bus; struct kvm_vm_stat stat; Index: linux-2.6/virt/kvm/kvm_main.c =================================================================== --- linux-2.6.orig/virt/kvm/kvm_main.c 2008-02-23 20:29:12.000000000 +0100 +++ linux-2.6/virt/kvm/kvm_main.c 2008-02-24 02:34:44.000000000 +0100 @@ -169,6 +169,8 @@ static struct kvm *kvm_create_vm(void) kvm_io_bus_init(&kvm->pio_bus); mutex_init(&kvm->lock); kvm_io_bus_init(&kvm->mmio_bus); + kvm->refcount = 1; + spin_lock(&kvm_lock); list_add(&kvm->vm_list, &vm_list); spin_unlock(&kvm_lock); @@ -201,11 +203,16 @@ void kvm_free_physmem(struct kvm *kvm) kvm_free_physmem_slot(&kvm->memslots[i], NULL); } -static void kvm_destroy_vm(struct kvm *kvm) +static void kvm_put_vm(struct kvm *kvm) { struct mm_struct *mm = kvm->mm; spin_lock(&kvm_lock); + if (--kvm->refcount) { + spin_lock(&kvm_lock); + return; + } + list_del(&kvm->vm_list); spin_unlock(&kvm_lock); kvm_io_bus_destroy(&kvm->pio_bus); @@ -216,9 +223,7 @@ static void kvm_destroy_vm(struct kvm *k static int kvm_vm_release(struct inode *inode, struct file *filp) { - struct kvm *kvm = filp->private_data; - - kvm_destroy_vm(kvm); + kvm_put_vm(filp->private_data); return 0; } @@ -700,7 +705,7 @@ static int kvm_vcpu_release(struct inode { struct kvm_vcpu *vcpu = filp->private_data; - fput(vcpu->kvm->filp); + kvm_put_vm(vcpu->kvm); return 0; } @@ -724,7 +729,16 @@ static int create_vcpu_fd(struct kvm_vcp "kvm-vcpu", &kvm_vcpu_fops, vcpu); if (r) return r; - atomic_inc(&vcpu->kvm->filp->f_count); + + /* + * It is guaranteed that the refcount is non-zero here because + * this function is only called as ioctl method on an open + * file that has a reference to it. + */ + spin_lock(&kvm_lock); + vcpu->kvm->refcount++; + spin_unlock(&kvm_lock); + return fd; } @@ -1023,12 +1037,10 @@ static int kvm_dev_ioctl_create_vm(void) return PTR_ERR(kvm); r = anon_inode_getfd(&fd, &inode, &file, "kvm-vm", &kvm_vm_fops, kvm); if (r) { - kvm_destroy_vm(kvm); + kvm_put_vm(kvm); return r; } - kvm->filp = file; - return fd; } ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-03-06 15:14 ` Christoph Hellwig @ 2008-03-09 2:45 ` Roland Dreier 2008-03-17 10:40 ` Christoph Hellwig 2008-03-09 2:46 ` Roland Dreier 1 sibling, 1 reply; 19+ messages in thread From: Roland Dreier @ 2008-03-09 2:45 UTC (permalink / raw) To: Christoph Hellwig Cc: Davide Libenzi, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro > spin_lock(&kvm_lock); > + if (--kvm->refcount) { > + spin_lock(&kvm_lock); obvious typo here... ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-03-09 2:45 ` Roland Dreier @ 2008-03-17 10:40 ` Christoph Hellwig 2008-03-17 10:48 ` Avi Kivity 0 siblings, 1 reply; 19+ messages in thread From: Christoph Hellwig @ 2008-03-17 10:40 UTC (permalink / raw) To: Roland Dreier Cc: Christoph Hellwig, Davide Libenzi, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro On Sat, Mar 08, 2008 at 06:45:34PM -0800, Roland Dreier wrote: > > spin_lock(&kvm_lock); > > + if (--kvm->refcount) { > > + spin_lock(&kvm_lock); > > obvious typo here... Indeed. Any comments from the kvm developers in this approach? The current multi-level file refcounting seems rather bogus so I'd like to make some progress on this. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-03-17 10:40 ` Christoph Hellwig @ 2008-03-17 10:48 ` Avi Kivity 0 siblings, 0 replies; 19+ messages in thread From: Avi Kivity @ 2008-03-17 10:48 UTC (permalink / raw) To: Christoph Hellwig Cc: Roland Dreier, Davide Libenzi, linux-fsdevel, Linux Kernel Mailing List, kvm-devel, Andrew Morton, Al Viro Christoph Hellwig wrote: > On Sat, Mar 08, 2008 at 06:45:34PM -0800, Roland Dreier wrote: > >> > spin_lock(&kvm_lock); >> > + if (--kvm->refcount) { >> > + spin_lock(&kvm_lock); >> >> obvious typo here... >> > > > Indeed. Any comments from the kvm developers in this approach? The > current multi-level file refcounting seems rather bogus so I'd like > to make some progress on this. > > I'm okay with switching away from the file-based refcounts to a refcount embedded in the kvm structures, though I'm not particularly thrilled by it. Any reason not to use krefs for this? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-03-06 15:14 ` Christoph Hellwig 2008-03-09 2:45 ` Roland Dreier @ 2008-03-09 2:46 ` Roland Dreier 1 sibling, 0 replies; 19+ messages in thread From: Roland Dreier @ 2008-03-09 2:46 UTC (permalink / raw) To: Christoph Hellwig Cc: Davide Libenzi, linux-fsdevel, Linux Kernel Mailing List, Avi Kivity, kvm-devel, Andrew Morton, Al Viro > No, the best interface is one where the driver doesn't even see > inode or file. Of course that's not actually possible in a few > nasty cases like the infiniband code, and for those it might be > better to do the fd_isntall themselves. Yeah, I've been thinking about this some more, and I think the IB case can be cleaned up to use the current anonfd interface (and in fact ignore the file pointer and inode it gets back). I'll try to write a patch this week. - R. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() 2008-02-27 19:16 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Roland Dreier ` (2 preceding siblings ...) 2008-02-27 19:41 ` Davide Libenzi @ 2008-03-06 15:14 ` Al Viro 3 siblings, 0 replies; 19+ messages in thread From: Al Viro @ 2008-03-06 15:14 UTC (permalink / raw) To: Roland Dreier Cc: Christoph Hellwig, linux-fsdevel, linux-kernel, Davide Libenzi, Avi Kivity, kvm-devel, akpm On Wed, Feb 27, 2008 at 11:16:02AM -0800, Roland Dreier wrote: > The anonymous inodes interface anon_inode_getfd() calls fd_install() > for the newly created fd, which does not work for some use cases where > the caller must do futher initialization before exposing the file to > userspace. This is also probably not the safest interface, since the > caller must be sure that it is OK if userspace closes the fd before > anon_inode_getfd() even returns. IMO that's a bad idea - majority of callers only care about fd and burdening them with fd_install() is simply wrong. Separate helper function... ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-03-17 10:48 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20080225191043.GA32342@lst.de> 2008-02-27 19:16 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Roland Dreier 2008-02-27 19:18 ` [PATCH 2/2] IB/uverbs: Use anonymous inodes instead of private filesystem Roland Dreier 2008-02-27 19:20 ` [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd() Avi Kivity 2008-02-27 19:41 ` Davide Libenzi 2008-02-27 20:14 ` Roland Dreier 2008-02-27 20:30 ` Davide Libenzi 2008-02-27 21:05 ` Roland Dreier 2008-02-27 23:42 ` Roland Dreier 2008-02-28 7:52 ` Avi Kivity 2008-02-28 20:04 ` Davide Libenzi 2008-02-28 20:24 ` Roland Dreier 2008-02-28 20:52 ` Davide Libenzi 2008-03-05 9:32 ` [kvm-devel] " Avi Kivity 2008-03-06 15:14 ` Christoph Hellwig 2008-03-09 2:45 ` Roland Dreier 2008-03-17 10:40 ` Christoph Hellwig 2008-03-17 10:48 ` Avi Kivity 2008-03-09 2:46 ` Roland Dreier 2008-03-06 15:14 ` Al Viro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).