LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/3] vfio/pci: ioeventfd support
@ 2018-02-28 20:14 Alex Williamson
  2018-02-28 20:14 ` [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path Alex Williamson
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Alex Williamson @ 2018-02-28 20:14 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, qemu-devel, alex.williamson, peterx, eric.auger, aik

A vfio ioeventfd will perform the pre-specified device write on
triggering of an eventfd.  When coupled with KVM ioeventfds, this
feature allows a VM to trap a device page for virtualization, while
also registering targeted ioeventfds to maintain performance of high
frequency register writes within the trapped range.  Much like the
existing interrupt eventfd/irqfd coupling, such writes can be handled
entirely in the host kernel.

The new VFIO device ioctl may be supported by any vfio bus driver,
including mdev drivers, but the implementation here only enables
vfio-pci.  This is intended as an acceleration path, bus drivers
may choose which regions to support and userspace should always
intend to fall back to non-accelerated handling when unavailable.

RFC->v1:
 * An arbitrary limit is added for the number of ioeventfds supported
   per device.  The intention is to set this high enough to allow any
   reasonable use case, but limit malicious user behavior.
 * Split patches, including adding a patch for endian neutral io reads
   and writes.  This should be a nop for little-endian and avoid
   redundant swap on big-endian, and hopefully resolves Alexey's
   comments regarding the endian nature of this interface.
 * Rebase to v4.16-rc3

Thanks,
Alex

---

Alex Williamson (3):
      vfio/pci: Pull BAR mapping setup from read-write path
      vfio/pci: Use endian neutral helpers
      vfio/pci: Add ioeventfd support


 drivers/vfio/pci/vfio_pci.c         |   34 ++++++
 drivers/vfio/pci/vfio_pci_private.h |   18 +++
 drivers/vfio/pci/vfio_pci_rdwr.c    |  188 +++++++++++++++++++++++++++++++----
 include/uapi/linux/vfio.h           |   27 +++++
 4 files changed, 247 insertions(+), 20 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path
  2018-02-28 20:14 [PATCH 0/3] vfio/pci: ioeventfd support Alex Williamson
@ 2018-02-28 20:14 ` Alex Williamson
  2018-03-07  7:11   ` Peter Xu
  2018-03-13 12:21   ` Auger Eric
  2018-02-28 20:14 ` [PATCH 2/3] vfio/pci: Use endian neutral helpers Alex Williamson
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Alex Williamson @ 2018-02-28 20:14 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, qemu-devel, alex.williamson, peterx, eric.auger, aik

This creates a common helper that we'll use for ioeventfd setup.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c |   39 ++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..5f2b376dcebd 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -113,6 +113,30 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 	return done;
 }
 
+static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int ret;
+	void __iomem *io;
+
+	if (vdev->barmap[bar])
+		return 0;
+
+	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+	if (ret)
+		return ret;
+
+	io = pci_iomap(pdev, bar, 0);
+	if (!io) {
+		pci_release_selected_regions(pdev, 1 << bar);
+		return -ENOMEM;
+	}
+
+	vdev->barmap[bar] = io;
+
+	return 0;
+}
+
 ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
 {
@@ -147,22 +171,13 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 		if (!io)
 			return -ENOMEM;
 		x_end = end;
-	} else if (!vdev->barmap[bar]) {
-		int ret;
-
-		ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+	} else {
+		int ret = vfio_pci_setup_barmap(vdev, bar);
 		if (ret)
 			return ret;
 
-		io = pci_iomap(pdev, bar, 0);
-		if (!io) {
-			pci_release_selected_regions(pdev, 1 << bar);
-			return -ENOMEM;
-		}
-
-		vdev->barmap[bar] = io;
-	} else
 		io = vdev->barmap[bar];
+	}
 
 	if (bar == vdev->msix_bar) {
 		x_start = vdev->msix_offset;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/3] vfio/pci: Use endian neutral helpers
  2018-02-28 20:14 [PATCH 0/3] vfio/pci: ioeventfd support Alex Williamson
  2018-02-28 20:14 ` [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path Alex Williamson
@ 2018-02-28 20:14 ` Alex Williamson
  2018-02-28 20:15 ` [PATCH 3/3] vfio/pci: Add ioeventfd support Alex Williamson
  2018-03-02  7:08 ` [PATCH 0/3] vfio/pci: " Tian, Kevin
  3 siblings, 0 replies; 14+ messages in thread
From: Alex Williamson @ 2018-02-28 20:14 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, qemu-devel, alex.williamson, peterx, eric.auger, aik

The iowriteXX/ioreadXX functions assume little endian hardware and
convert to little endian on a write and from little endian on a read.
We currently do our own explicit conversion to negate this.  Instead,
add some endian dependent defines to avoid all byte swaps.  There
should be no functional change other than big endian systems aren't
penalized with wasted swaps.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c |   34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 5f2b376dcebd..925419e0f459 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -21,6 +21,24 @@
 
 #include "vfio_pci_private.h"
 
+#ifdef __LITTLE_ENDIAN
+#define vfio_ioread64	ioread64
+#define vfio_iowrite64	iowrite64
+#define vfio_ioread32	ioread32
+#define vfio_iowrite32	iowrite32
+#define vfio_ioread16	ioread16
+#define vfio_iowrite16	iowrite16
+#else
+#define vfio_ioread64	ioread64be
+#define vfio_iowrite64	iowrite64be
+#define vfio_ioread32	ioread32be
+#define vfio_iowrite32	iowrite32be
+#define vfio_ioread16	ioread16be
+#define vfio_iowrite16	iowrite16be
+#endif
+#define vfio_ioread8	ioread8
+#define vfio_iowrite8	iowrite8
+
 /*
  * Read or write from an __iomem region (MMIO or I/O port) with an excluded
  * range which is inaccessible.  The excluded range drops writes and fills
@@ -44,15 +62,15 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 			fillable = 0;
 
 		if (fillable >= 4 && !(off % 4)) {
-			__le32 val;
+			u32 val;
 
 			if (iswrite) {
 				if (copy_from_user(&val, buf, 4))
 					return -EFAULT;
 
-				iowrite32(le32_to_cpu(val), io + off);
+				vfio_iowrite32(val, io + off);
 			} else {
-				val = cpu_to_le32(ioread32(io + off));
+				val = vfio_ioread32(io + off);
 
 				if (copy_to_user(buf, &val, 4))
 					return -EFAULT;
@@ -60,15 +78,15 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 
 			filled = 4;
 		} else if (fillable >= 2 && !(off % 2)) {
-			__le16 val;
+			u16 val;
 
 			if (iswrite) {
 				if (copy_from_user(&val, buf, 2))
 					return -EFAULT;
 
-				iowrite16(le16_to_cpu(val), io + off);
+				vfio_iowrite16(val, io + off);
 			} else {
-				val = cpu_to_le16(ioread16(io + off));
+				val = vfio_ioread16(io + off);
 
 				if (copy_to_user(buf, &val, 2))
 					return -EFAULT;
@@ -82,9 +100,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 				if (copy_from_user(&val, buf, 1))
 					return -EFAULT;
 
-				iowrite8(val, io + off);
+				vfio_iowrite8(val, io + off);
 			} else {
-				val = ioread8(io + off);
+				val = vfio_ioread8(io + off);
 
 				if (copy_to_user(buf, &val, 1))
 					return -EFAULT;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/3] vfio/pci: Add ioeventfd support
  2018-02-28 20:14 [PATCH 0/3] vfio/pci: ioeventfd support Alex Williamson
  2018-02-28 20:14 ` [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path Alex Williamson
  2018-02-28 20:14 ` [PATCH 2/3] vfio/pci: Use endian neutral helpers Alex Williamson
@ 2018-02-28 20:15 ` Alex Williamson
  2018-03-06  6:54   ` kbuild test robot
                     ` (2 more replies)
  2018-03-02  7:08 ` [PATCH 0/3] vfio/pci: " Tian, Kevin
  3 siblings, 3 replies; 14+ messages in thread
From: Alex Williamson @ 2018-02-28 20:15 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, qemu-devel, alex.williamson, peterx, eric.auger, aik

The ioeventfd here is actually irqfd handling of an ioeventfd such as
supported in KVM.  A user is able to pre-program a device write to
occur when the eventfd triggers.  This is yet another instance of
eventfd-irqfd triggering between KVM and vfio.  The impetus for this
is high frequency writes to pages which are virtualized in QEMU.
Enabling this near-direct write path for selected registers within
the virtualized page can improve performance and reduce overhead.
Specifically this is initially targeted at NVIDIA graphics cards where
the driver issues a write to an MMIO register within a virtualized
region in order to allow the MSI interrupt to re-trigger.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c         |   34 ++++++++++
 drivers/vfio/pci/vfio_pci_private.h |   18 +++++
 drivers/vfio/pci/vfio_pci_rdwr.c    |  115 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h           |   27 ++++++++
 4 files changed, 194 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b0f759476900..ad18ed266dc0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -305,6 +305,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_dummy_resource *dummy_res, *tmp;
+	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
 	int i, bar;
 
 	/* Stop the device from further DMA */
@@ -314,6 +315,15 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 				VFIO_IRQ_SET_ACTION_TRIGGER,
 				vdev->irq_type, 0, 0, NULL);
 
+	/* Device closed, don't need mutex here */
+	list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
+				 &vdev->ioeventfds_list, next) {
+		vfio_virqfd_disable(&ioeventfd->virqfd);
+		list_del(&ioeventfd->next);
+		kfree(ioeventfd);
+	}
+	vdev->ioeventfds_nr = 0;
+
 	vdev->virq_disabled = false;
 
 	for (i = 0; i < vdev->num_regions; i++)
@@ -1012,6 +1022,28 @@ static long vfio_pci_ioctl(void *device_data,
 
 		kfree(groups);
 		return ret;
+	} else if (cmd == VFIO_DEVICE_IOEVENTFD) {
+		struct vfio_device_ioeventfd ioeventfd;
+		int count;
+
+		minsz = offsetofend(struct vfio_device_ioeventfd, fd);
+
+		if (copy_from_user(&ioeventfd, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (ioeventfd.argsz < minsz)
+			return -EINVAL;
+
+		if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK)
+			return -EINVAL;
+
+		count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK;
+
+		if (hweight8(count) != 1 || ioeventfd.fd < -1)
+			return -EINVAL;
+
+		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
+					  ioeventfd.data, count, ioeventfd.fd);
 	}
 
 	return -ENOTTY;
@@ -1174,6 +1206,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	mutex_init(&vdev->igate);
 	spin_lock_init(&vdev->irqlock);
+	mutex_init(&vdev->ioeventfds_lock);
+	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret) {
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index f561ac1c78a0..33a48c3ba11c 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -29,6 +29,18 @@
 #define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
 #define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
 
+/* Cap maximum number of ioeventfds per device (arbitrary) */
+#define VFIO_PCI_IOEVENTFD_MAX		1000
+
+struct vfio_pci_ioeventfd {
+	struct list_head	next;
+	struct virqfd		*virqfd;
+	loff_t			pos;
+	uint64_t		data;
+	int			bar;
+	int			count;
+};
+
 struct vfio_pci_irq_ctx {
 	struct eventfd_ctx	*trigger;
 	struct virqfd		*unmask;
@@ -92,9 +104,12 @@ struct vfio_pci_device {
 	bool			nointx;
 	struct pci_saved_state	*pci_saved_state;
 	int			refcnt;
+	int			ioeventfds_nr;
 	struct eventfd_ctx	*err_trigger;
 	struct eventfd_ctx	*req_trigger;
 	struct list_head	dummy_resources_list;
+	struct mutex		ioeventfds_lock;
+	struct list_head	ioeventfds_list;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
@@ -120,6 +135,9 @@ extern ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
 			       size_t count, loff_t *ppos, bool iswrite);
 
+extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
+			       uint64_t data, int count, int fd);
+
 extern int vfio_pci_init_perm_bits(void);
 extern void vfio_pci_uninit_perm_bits(void);
 
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 925419e0f459..43e4b5112337 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -17,6 +17,7 @@
 #include <linux/pci.h>
 #include <linux/uaccess.h>
 #include <linux/io.h>
+#include <linux/vfio.h>
 #include <linux/vgaarb.h>
 
 #include "vfio_pci_private.h"
@@ -275,3 +276,117 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
 
 	return done;
 }
+
+#define VFIO_PCI_IOEVENTFD_HANDLER(size)				\
+static int vfio_pci_ioeventfd_handler##size(void *opaque, void *data)	\
+{									\
+	vfio_iowrite##size((unsigned long)data, opaque);		\
+	return 0;							\
+}
+
+#ifdef iowrite64
+VFIO_PCI_IOEVENTFD_HANDLER(64)
+#endif
+VFIO_PCI_IOEVENTFD_HANDLER(32)
+VFIO_PCI_IOEVENTFD_HANDLER(16)
+VFIO_PCI_IOEVENTFD_HANDLER(8)
+
+long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
+			uint64_t data, int count, int fd)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
+	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
+	struct vfio_pci_ioeventfd *ioeventfd;
+	int (*handler)(void *addr, void *value);
+
+	/* Only support ioeventfds into BARs */
+	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
+		return -EINVAL;
+
+	if (pos + count > pci_resource_len(pdev, bar))
+		return -EINVAL;
+
+	/* Disallow ioeventfds working around MSI-X table writes */
+	if (bar == vdev->msix_bar &&
+	    !(pos + count <= vdev->msix_offset ||
+	      pos >= vdev->msix_offset + vdev->msix_size))
+		return -EINVAL;
+
+	switch (count) {
+	case 1:
+		handler = &vfio_pci_ioeventfd_handler8;
+		break;
+	case 2:
+		handler = &vfio_pci_ioeventfd_handler16;
+		break;
+	case 4:
+		handler = &vfio_pci_ioeventfd_handler32;
+		break;
+#ifdef iowrite64
+	case 8:
+		handler = &vfio_pci_ioeventfd_handler64;
+		break;
+#endif
+	default:
+		return -EINVAL;
+	}
+
+	ret = vfio_pci_setup_barmap(vdev, bar);
+	if (ret)
+		return ret;
+
+	mutex_lock(&vdev->ioeventfds_lock);
+
+	list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
+		if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
+		    ioeventfd->data == data && ioeventfd->count == count) {
+			if (fd == -1) {
+				vfio_virqfd_disable(&ioeventfd->virqfd);
+				list_del(&ioeventfd->next);
+				vdev->ioeventfds_nr--;
+				kfree(ioeventfd);
+				ret = 0;
+			} else
+				ret = -EEXIST;
+
+			goto out_unlock;
+		}
+	}
+
+	if (fd < 0) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
+		ret = -ENOSPC;
+		goto out_unlock;
+	}
+
+	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
+	if (!ioeventfd) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	ioeventfd->pos = pos;
+	ioeventfd->bar = bar;
+	ioeventfd->data = data;
+	ioeventfd->count = count;
+
+	ret = vfio_virqfd_enable(vdev->barmap[bar] + pos, handler, NULL,
+				 (void *)data, &ioeventfd->virqfd, fd);
+	if (ret) {
+		kfree(ioeventfd);
+		goto out_unlock;
+	}
+
+	list_add(&ioeventfd->next, &vdev->ioeventfds_list);
+	vdev->ioeventfds_nr++;
+
+out_unlock:
+	mutex_unlock(&vdev->ioeventfds_lock);
+
+	return ret;
+}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index c74372163ed2..7e9d76203e86 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -575,6 +575,33 @@ struct vfio_device_gfx_plane_info {
 
 #define VFIO_DEVICE_GET_GFX_DMABUF _IO(VFIO_TYPE, VFIO_BASE + 15)
 
+/**
+ * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 16,
+ *                              struct vfio_device_ioeventfd)
+ *
+ * Perform a write to the device at the specified device fd offset, with
+ * the specified data and width when the provided eventfd is triggered.
+ * vfio bus drivers may not support this for all regions, or at all.
+ * vfio-pci currently only enables support for BAR regions and excludes
+ * the MSI-X vector table.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_ioeventfd {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_DEVICE_IOEVENTFD_8		(1 << 0) /* 1-byte write */
+#define VFIO_DEVICE_IOEVENTFD_16	(1 << 1) /* 2-byte write */
+#define VFIO_DEVICE_IOEVENTFD_32	(1 << 2) /* 4-byte write */
+#define VFIO_DEVICE_IOEVENTFD_64	(1 << 3) /* 8-byte write */
+#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK	(0xf)
+	__u64	offset;			/* device fd offset of write */
+	__u64	data;			/* data to be written */
+	__s32	fd;			/* -1 for de-assignment */
+};
+
+#define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 0/3] vfio/pci: ioeventfd support
  2018-02-28 20:14 [PATCH 0/3] vfio/pci: ioeventfd support Alex Williamson
                   ` (2 preceding siblings ...)
  2018-02-28 20:15 ` [PATCH 3/3] vfio/pci: Add ioeventfd support Alex Williamson
@ 2018-03-02  7:08 ` Tian, Kevin
  2018-03-02 18:02   ` Alex Williamson
  3 siblings, 1 reply; 14+ messages in thread
From: Tian, Kevin @ 2018-03-02  7:08 UTC (permalink / raw)
  To: Alex Williamson, kvm; +Cc: linux-kernel, qemu-devel, peterx, eric.auger, aik

> From: Alex Williamson
> Sent: Thursday, March 1, 2018 4:15 AM
> 
> A vfio ioeventfd will perform the pre-specified device write on
> triggering of an eventfd.  When coupled with KVM ioeventfds, this
> feature allows a VM to trap a device page for virtualization, while
> also registering targeted ioeventfds to maintain performance of high
> frequency register writes within the trapped range.  Much like the
> existing interrupt eventfd/irqfd coupling, such writes can be handled
> entirely in the host kernel.
> 
> The new VFIO device ioctl may be supported by any vfio bus driver,
> including mdev drivers, but the implementation here only enables
> vfio-pci.  This is intended as an acceleration path, bus drivers
> may choose which regions to support and userspace should always
> intend to fall back to non-accelerated handling when unavailable.
> 

it's a nice feature! A curious question. Is it possible for mdev driver
to directly create ioeventfd on specified offset? Currently ioeventfd
requires quirks in Qemu, which must know the device detail to
create ioeventfd and then connect vfio and kvm together. However
mdev instance is more software defined thus I'm not sure whether 
asking Qemu to catch up quirk with underlying software logic could
be overwhelmed. Also in case of vendor driver emulating mdev
with same DID/VID as a real device, it might be difficult for Qemu
to figure out whether a vfio device is a real one or mdev one to
apply a mdev specific quirk. On the other hand, since vendor
driver knows all the logic constructing mdev, it would be more
convenient allowing vendor driver to directly create/destroy
ioeventfd on its demand?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] vfio/pci: ioeventfd support
  2018-03-02  7:08 ` [PATCH 0/3] vfio/pci: " Tian, Kevin
@ 2018-03-02 18:02   ` Alex Williamson
  2018-03-03  0:43     ` Tian, Kevin
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2018-03-02 18:02 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: kvm, linux-kernel, qemu-devel, peterx, eric.auger, aik

On Fri, 2 Mar 2018 07:08:51 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson
> > Sent: Thursday, March 1, 2018 4:15 AM
> > 
> > A vfio ioeventfd will perform the pre-specified device write on
> > triggering of an eventfd.  When coupled with KVM ioeventfds, this
> > feature allows a VM to trap a device page for virtualization, while
> > also registering targeted ioeventfds to maintain performance of high
> > frequency register writes within the trapped range.  Much like the
> > existing interrupt eventfd/irqfd coupling, such writes can be handled
> > entirely in the host kernel.
> > 
> > The new VFIO device ioctl may be supported by any vfio bus driver,
> > including mdev drivers, but the implementation here only enables
> > vfio-pci.  This is intended as an acceleration path, bus drivers
> > may choose which regions to support and userspace should always
> > intend to fall back to non-accelerated handling when unavailable.
> >   
> 
> it's a nice feature! A curious question. Is it possible for mdev driver
> to directly create ioeventfd on specified offset? Currently ioeventfd
> requires quirks in Qemu, which must know the device detail to
> create ioeventfd and then connect vfio and kvm together. However
> mdev instance is more software defined thus I'm not sure whether 
> asking Qemu to catch up quirk with underlying software logic could
> be overwhelmed. Also in case of vendor driver emulating mdev
> with same DID/VID as a real device, it might be difficult for Qemu
> to figure out whether a vfio device is a real one or mdev one to
> apply a mdev specific quirk. On the other hand, since vendor
> driver knows all the logic constructing mdev, it would be more
> convenient allowing vendor driver to directly create/destroy
> ioeventfd on its demand?

Are file descriptors the right interface between kernel components if
userspace is not involved in connecting them?  Seems like not.  vfio is
designed to be independent of KVM with the necessary interactions
between them orchestrated by userspace.  KVMGT is about the only
exception to that and TBH I'm not thrilled about extending that
further.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 0/3] vfio/pci: ioeventfd support
  2018-03-02 18:02   ` Alex Williamson
@ 2018-03-03  0:43     ` Tian, Kevin
  0 siblings, 0 replies; 14+ messages in thread
From: Tian, Kevin @ 2018-03-03  0:43 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, qemu-devel, peterx, eric.auger, aik

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, March 3, 2018 2:03 AM
> 
> On Fri, 2 Mar 2018 07:08:51 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson
> > > Sent: Thursday, March 1, 2018 4:15 AM
> > >
> > > A vfio ioeventfd will perform the pre-specified device write on
> > > triggering of an eventfd.  When coupled with KVM ioeventfds, this
> > > feature allows a VM to trap a device page for virtualization, while
> > > also registering targeted ioeventfds to maintain performance of high
> > > frequency register writes within the trapped range.  Much like the
> > > existing interrupt eventfd/irqfd coupling, such writes can be handled
> > > entirely in the host kernel.
> > >
> > > The new VFIO device ioctl may be supported by any vfio bus driver,
> > > including mdev drivers, but the implementation here only enables
> > > vfio-pci.  This is intended as an acceleration path, bus drivers
> > > may choose which regions to support and userspace should always
> > > intend to fall back to non-accelerated handling when unavailable.
> > >
> >
> > it's a nice feature! A curious question. Is it possible for mdev driver
> > to directly create ioeventfd on specified offset? Currently ioeventfd
> > requires quirks in Qemu, which must know the device detail to
> > create ioeventfd and then connect vfio and kvm together. However
> > mdev instance is more software defined thus I'm not sure whether
> > asking Qemu to catch up quirk with underlying software logic could
> > be overwhelmed. Also in case of vendor driver emulating mdev
> > with same DID/VID as a real device, it might be difficult for Qemu
> > to figure out whether a vfio device is a real one or mdev one to
> > apply a mdev specific quirk. On the other hand, since vendor
> > driver knows all the logic constructing mdev, it would be more
> > convenient allowing vendor driver to directly create/destroy
> > ioeventfd on its demand?
> 
> Are file descriptors the right interface between kernel components if
> userspace is not involved in connecting them?  Seems like not.  vfio is
> designed to be independent of KVM with the necessary interactions
> between them orchestrated by userspace.  KVMGT is about the only
> exception to that and TBH I'm not thrilled about extending that
> further.  Thanks,
> 

why do you think KVMGT is the only exception? Isn't above scenario
general enough to any mdev implementation, just as what you're
trying to solve, that if a trapped operation happens too frequently on
mdev then we may want to optimize it? I understand fd is not a good 
way in-between kernel components, just asking here in case you
have some good idea in mind. If adding mdev specific quirk in 
Qemu is the only way for similar optimization, I'm fine with it. One
thing which I was not sure though is still about the scenario where 
mdev is constructed same as a real device then what would happen
if Qemu applies a mdev-specific quirk to both sides - of course 
just a mental thinking, not sure whether it happens in real...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] vfio/pci: Add ioeventfd support
  2018-02-28 20:15 ` [PATCH 3/3] vfio/pci: Add ioeventfd support Alex Williamson
@ 2018-03-06  6:54   ` kbuild test robot
  2018-03-07  5:56   ` Peter Xu
  2018-03-13 13:12   ` Auger Eric
  2 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2018-03-06  6:54 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kbuild-all, kvm, linux-kernel, qemu-devel, alex.williamson,
	peterx, eric.auger, aik

Hi Alex,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.16-rc4 next-20180306]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Alex-Williamson/vfio-pci-Pull-BAR-mapping-setup-from-read-write-path/20180303-015851
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/vfio/pci/vfio_pci_rdwr.c:290:1: sparse: incorrect type in argument 2 (different address spaces) @@    expected void [noderef] <asn:2>*<noident> @@    got sn:2>*<noident> @@
   drivers/vfio/pci/vfio_pci_rdwr.c:290:1:    expected void [noderef] <asn:2>*<noident>
   drivers/vfio/pci/vfio_pci_rdwr.c:290:1:    got void *opaque
   drivers/vfio/pci/vfio_pci_rdwr.c:291:1: sparse: incorrect type in argument 2 (different address spaces) @@    expected void [noderef] <asn:2>*<noident> @@    got sn:2>*<noident> @@
   drivers/vfio/pci/vfio_pci_rdwr.c:291:1:    expected void [noderef] <asn:2>*<noident>
   drivers/vfio/pci/vfio_pci_rdwr.c:291:1:    got void *opaque
   drivers/vfio/pci/vfio_pci_rdwr.c:292:1: sparse: incorrect type in argument 2 (different address spaces) @@    expected void [noderef] <asn:2>*<noident> @@    got sn:2>*<noident> @@
   drivers/vfio/pci/vfio_pci_rdwr.c:292:1:    expected void [noderef] <asn:2>*<noident>
   drivers/vfio/pci/vfio_pci_rdwr.c:292:1:    got void *opaque
>> drivers/vfio/pci/vfio_pci_rdwr.c:378:52: sparse: incorrect type in argument 1 (different address spaces) @@    expected void *opaque @@    got void [noderef] <avoid *opaque @@
   drivers/vfio/pci/vfio_pci_rdwr.c:378:52:    expected void *opaque
   drivers/vfio/pci/vfio_pci_rdwr.c:378:52:    got void [noderef] <asn:2>*

vim +290 drivers/vfio/pci/vfio_pci_rdwr.c

   286	
   287	#ifdef iowrite64
   288	VFIO_PCI_IOEVENTFD_HANDLER(64)
   289	#endif
 > 290	VFIO_PCI_IOEVENTFD_HANDLER(32)
   291	VFIO_PCI_IOEVENTFD_HANDLER(16)
   292	VFIO_PCI_IOEVENTFD_HANDLER(8)
   293	
   294	long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
   295				uint64_t data, int count, int fd)
   296	{
   297		struct pci_dev *pdev = vdev->pdev;
   298		loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
   299		int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
   300		struct vfio_pci_ioeventfd *ioeventfd;
   301		int (*handler)(void *addr, void *value);
   302	
   303		/* Only support ioeventfds into BARs */
   304		if (bar > VFIO_PCI_BAR5_REGION_INDEX)
   305			return -EINVAL;
   306	
   307		if (pos + count > pci_resource_len(pdev, bar))
   308			return -EINVAL;
   309	
   310		/* Disallow ioeventfds working around MSI-X table writes */
   311		if (bar == vdev->msix_bar &&
   312		    !(pos + count <= vdev->msix_offset ||
   313		      pos >= vdev->msix_offset + vdev->msix_size))
   314			return -EINVAL;
   315	
   316		switch (count) {
   317		case 1:
   318			handler = &vfio_pci_ioeventfd_handler8;
   319			break;
   320		case 2:
   321			handler = &vfio_pci_ioeventfd_handler16;
   322			break;
   323		case 4:
   324			handler = &vfio_pci_ioeventfd_handler32;
   325			break;
   326	#ifdef iowrite64
   327		case 8:
   328			handler = &vfio_pci_ioeventfd_handler64;
   329			break;
   330	#endif
   331		default:
   332			return -EINVAL;
   333		}
   334	
   335		ret = vfio_pci_setup_barmap(vdev, bar);
   336		if (ret)
   337			return ret;
   338	
   339		mutex_lock(&vdev->ioeventfds_lock);
   340	
   341		list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
   342			if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
   343			    ioeventfd->data == data && ioeventfd->count == count) {
   344				if (fd == -1) {
   345					vfio_virqfd_disable(&ioeventfd->virqfd);
   346					list_del(&ioeventfd->next);
   347					vdev->ioeventfds_nr--;
   348					kfree(ioeventfd);
   349					ret = 0;
   350				} else
   351					ret = -EEXIST;
   352	
   353				goto out_unlock;
   354			}
   355		}
   356	
   357		if (fd < 0) {
   358			ret = -ENODEV;
   359			goto out_unlock;
   360		}
   361	
   362		if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
   363			ret = -ENOSPC;
   364			goto out_unlock;
   365		}
   366	
   367		ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
   368		if (!ioeventfd) {
   369			ret = -ENOMEM;
   370			goto out_unlock;
   371		}
   372	
   373		ioeventfd->pos = pos;
   374		ioeventfd->bar = bar;
   375		ioeventfd->data = data;
   376		ioeventfd->count = count;
   377	
 > 378		ret = vfio_virqfd_enable(vdev->barmap[bar] + pos, handler, NULL,

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] vfio/pci: Add ioeventfd support
  2018-02-28 20:15 ` [PATCH 3/3] vfio/pci: Add ioeventfd support Alex Williamson
  2018-03-06  6:54   ` kbuild test robot
@ 2018-03-07  5:56   ` Peter Xu
  2018-03-15 21:12     ` Alex Williamson
  2018-03-13 13:12   ` Auger Eric
  2 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2018-03-07  5:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, qemu-devel, eric.auger, aik

On Wed, Feb 28, 2018 at 01:15:20PM -0700, Alex Williamson wrote:

[...]

> @@ -1174,6 +1206,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	vdev->irq_type = VFIO_PCI_NUM_IRQS;
>  	mutex_init(&vdev->igate);
>  	spin_lock_init(&vdev->irqlock);
> +	mutex_init(&vdev->ioeventfds_lock);

Do we better need to destroy the mutex in vfio_pci_remove?

I see that vfio_pci_device.igate is also without a destructor.  I'm
not sure on both.

Thanks,

> +	INIT_LIST_HEAD(&vdev->ioeventfds_list);
>  
>  	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
>  	if (ret) {

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path
  2018-02-28 20:14 ` [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path Alex Williamson
@ 2018-03-07  7:11   ` Peter Xu
  2018-03-13 12:21   ` Auger Eric
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Xu @ 2018-03-07  7:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, qemu-devel, eric.auger, aik

On Wed, Feb 28, 2018 at 01:14:46PM -0700, Alex Williamson wrote:
> This creates a common helper that we'll use for ioeventfd setup.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path
  2018-02-28 20:14 ` [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path Alex Williamson
  2018-03-07  7:11   ` Peter Xu
@ 2018-03-13 12:21   ` Auger Eric
  1 sibling, 0 replies; 14+ messages in thread
From: Auger Eric @ 2018-03-13 12:21 UTC (permalink / raw)
  To: Alex Williamson, kvm; +Cc: linux-kernel, qemu-devel, peterx, aik

Hi Alex,
On 28/02/18 21:14, Alex Williamson wrote:
> This creates a common helper that we'll use for ioeventfd setup.

> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> ---
>  drivers/vfio/pci/vfio_pci_rdwr.c |   39 ++++++++++++++++++++++++++------------
>  1 file changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 357243d76f10..5f2b376dcebd 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -113,6 +113,30 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>  	return done;
>  }
>  
> +static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	int ret;
> +	void __iomem *io;
> +
> +	if (vdev->barmap[bar])
> +		return 0;
> +
> +	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> +	if (ret)
> +		return ret;
> +
> +	io = pci_iomap(pdev, bar, 0);
> +	if (!io) {
> +		pci_release_selected_regions(pdev, 1 << bar);
> +		return -ENOMEM;
> +	}
> +
> +	vdev->barmap[bar] = io;
> +
> +	return 0;
> +}
> +
>  ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
>  			size_t count, loff_t *ppos, bool iswrite)
>  {
> @@ -147,22 +171,13 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
>  		if (!io)
>  			return -ENOMEM;
>  		x_end = end;
> -	} else if (!vdev->barmap[bar]) {
> -		int ret;
> -
> -		ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> +	} else {
> +		int ret = vfio_pci_setup_barmap(vdev, bar);
>  		if (ret)
>  			return ret;
>  
> -		io = pci_iomap(pdev, bar, 0);
> -		if (!io) {
> -			pci_release_selected_regions(pdev, 1 << bar);
> -			return -ENOMEM;
> -		}
> -
> -		vdev->barmap[bar] = io;
> -	} else
>  		io = vdev->barmap[bar];
> +	}
>  
>  	if (bar == vdev->msix_bar) {
>  		x_start = vdev->msix_offset;
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] vfio/pci: Add ioeventfd support
  2018-02-28 20:15 ` [PATCH 3/3] vfio/pci: Add ioeventfd support Alex Williamson
  2018-03-06  6:54   ` kbuild test robot
  2018-03-07  5:56   ` Peter Xu
@ 2018-03-13 13:12   ` Auger Eric
  2018-03-15 21:07     ` Alex Williamson
  2 siblings, 1 reply; 14+ messages in thread
From: Auger Eric @ 2018-03-13 13:12 UTC (permalink / raw)
  To: Alex Williamson, kvm; +Cc: linux-kernel, qemu-devel, peterx, aik

Hi Alex,

On 28/02/18 21:15, Alex Williamson wrote:
> The ioeventfd here is actually irqfd handling of an ioeventfd such as
> supported in KVM.  A user is able to pre-program a device write to
> occur when the eventfd triggers.  This is yet another instance of
> eventfd-irqfd triggering between KVM and vfio.  The impetus for this
> is high frequency writes to pages which are virtualized in QEMU.
> Enabling this near-direct write path for selected registers within
> the virtualized page can improve performance and reduce overhead.
> Specifically this is initially targeted at NVIDIA graphics cards where
> the driver issues a write to an MMIO register within a virtualized
> region in order to allow the MSI interrupt to re-trigger.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/pci/vfio_pci.c         |   34 ++++++++++
>  drivers/vfio/pci/vfio_pci_private.h |   18 +++++
>  drivers/vfio/pci/vfio_pci_rdwr.c    |  115 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h           |   27 ++++++++
>  4 files changed, 194 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b0f759476900..ad18ed266dc0 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -305,6 +305,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
>  {
>  	struct pci_dev *pdev = vdev->pdev;
>  	struct vfio_pci_dummy_resource *dummy_res, *tmp;
> +	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
>  	int i, bar;
>  
>  	/* Stop the device from further DMA */
> @@ -314,6 +315,15 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
>  				VFIO_IRQ_SET_ACTION_TRIGGER,
>  				vdev->irq_type, 0, 0, NULL);
>  
> +	/* Device closed, don't need mutex here */
> +	list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
> +				 &vdev->ioeventfds_list, next) {
> +		vfio_virqfd_disable(&ioeventfd->virqfd);
> +		list_del(&ioeventfd->next);
> +		kfree(ioeventfd);
> +	}
> +	vdev->ioeventfds_nr = 0;
> +
>  	vdev->virq_disabled = false;
>  
>  	for (i = 0; i < vdev->num_regions; i++)
> @@ -1012,6 +1022,28 @@ static long vfio_pci_ioctl(void *device_data,
>  
>  		kfree(groups);
>  		return ret;
> +	} else if (cmd == VFIO_DEVICE_IOEVENTFD) {
> +		struct vfio_device_ioeventfd ioeventfd;
> +		int count;
> +
> +		minsz = offsetofend(struct vfio_device_ioeventfd, fd);
> +
> +		if (copy_from_user(&ioeventfd, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (ioeventfd.argsz < minsz)
> +			return -EINVAL;
> +
> +		if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK)
> +			return -EINVAL;
> +
> +		count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK;
> +
> +		if (hweight8(count) != 1 || ioeventfd.fd < -1)
> +			return -EINVAL;
> +
> +		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
> +					  ioeventfd.data, count, ioeventfd.fd);
>  	}
>  
>  	return -ENOTTY;
> @@ -1174,6 +1206,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	vdev->irq_type = VFIO_PCI_NUM_IRQS;
>  	mutex_init(&vdev->igate);
>  	spin_lock_init(&vdev->irqlock);
> +	mutex_init(&vdev->ioeventfds_lock);
> +	INIT_LIST_HEAD(&vdev->ioeventfds_list);
>  
>  	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
>  	if (ret) {
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index f561ac1c78a0..33a48c3ba11c 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -29,6 +29,18 @@
>  #define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
>  #define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
>  
> +/* Cap maximum number of ioeventfds per device (arbitrary) */
> +#define VFIO_PCI_IOEVENTFD_MAX		1000
> +
> +struct vfio_pci_ioeventfd {
> +	struct list_head	next;
> +	struct virqfd		*virqfd;
> +	loff_t			pos;
> +	uint64_t		data;
> +	int			bar;
> +	int			count;
> +};
> +
>  struct vfio_pci_irq_ctx {
>  	struct eventfd_ctx	*trigger;
>  	struct virqfd		*unmask;
> @@ -92,9 +104,12 @@ struct vfio_pci_device {
>  	bool			nointx;
>  	struct pci_saved_state	*pci_saved_state;
>  	int			refcnt;
> +	int			ioeventfds_nr;
>  	struct eventfd_ctx	*err_trigger;
>  	struct eventfd_ctx	*req_trigger;
>  	struct list_head	dummy_resources_list;
> +	struct mutex		ioeventfds_lock;
> +	struct list_head	ioeventfds_list;
>  };
>  
>  #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> @@ -120,6 +135,9 @@ extern ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
>  extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
>  			       size_t count, loff_t *ppos, bool iswrite);
>  
> +extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
> +			       uint64_t data, int count, int fd);
> +
>  extern int vfio_pci_init_perm_bits(void);
>  extern void vfio_pci_uninit_perm_bits(void);
>  
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 925419e0f459..43e4b5112337 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -17,6 +17,7 @@
>  #include <linux/pci.h>
>  #include <linux/uaccess.h>
>  #include <linux/io.h>
> +#include <linux/vfio.h>
>  #include <linux/vgaarb.h>
>  
>  #include "vfio_pci_private.h"
> @@ -275,3 +276,117 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
>  
>  	return done;
>  }
> +
> +#define VFIO_PCI_IOEVENTFD_HANDLER(size)				\
> +static int vfio_pci_ioeventfd_handler##size(void *opaque, void *data)	\
> +{									\
> +	vfio_iowrite##size((unsigned long)data, opaque);		\
> +	return 0;							\
> +}
> +
> +#ifdef iowrite64
> +VFIO_PCI_IOEVENTFD_HANDLER(64)
> +#endif
> +VFIO_PCI_IOEVENTFD_HANDLER(32)
> +VFIO_PCI_IOEVENTFD_HANDLER(16)
> +VFIO_PCI_IOEVENTFD_HANDLER(8)
> +
> +long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
> +			uint64_t data, int count, int fd)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
> +	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
> +	struct vfio_pci_ioeventfd *ioeventfd;
> +	int (*handler)(void *addr, void *value);
> +
> +	/* Only support ioeventfds into BARs */
> +	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
> +		return -EINVAL;
> +
> +	if (pos + count > pci_resource_len(pdev, bar))
> +		return -EINVAL;
> +
> +	/* Disallow ioeventfds working around MSI-X table writes */
> +	if (bar == vdev->msix_bar &&
> +	    !(pos + count <= vdev->msix_offset ||
> +	      pos >= vdev->msix_offset + vdev->msix_size))
> +		return -EINVAL;
> +
> +	switch (count) {
> +	case 1:
> +		handler = &vfio_pci_ioeventfd_handler8;
> +		break;
> +	case 2:
> +		handler = &vfio_pci_ioeventfd_handler16;
> +		break;
> +	case 4:
> +		handler = &vfio_pci_ioeventfd_handler32;
> +		break;
> +#ifdef iowrite64
> +	case 8:
> +		handler = &vfio_pci_ioeventfd_handler64;
> +		break;
from a user point of view, it is straightforward this setup will be
rejected? This is not documented in the uapi at the moment.

Thanks

Eric
> +#endif
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	ret = vfio_pci_setup_barmap(vdev, bar);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&vdev->ioeventfds_lock);
> +
> +	list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
> +		if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
> +		    ioeventfd->data == data && ioeventfd->count == count) {
> +			if (fd == -1) {
> +				vfio_virqfd_disable(&ioeventfd->virqfd);
> +				list_del(&ioeventfd->next);
> +				vdev->ioeventfds_nr--;
> +				kfree(ioeventfd);
> +				ret = 0;
> +			} else
> +				ret = -EEXIST;
> +
> +			goto out_unlock;
> +		}
> +	}
> +
> +	if (fd < 0) {
> +		ret = -ENODEV;
> +		goto out_unlock;
> +	}
> +
> +	if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
> +		ret = -ENOSPC;
> +		goto out_unlock;
> +	}
> +
> +	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
> +	if (!ioeventfd) {
> +		ret = -ENOMEM;
> +		goto out_unlock;
> +	}
> +
> +	ioeventfd->pos = pos;
> +	ioeventfd->bar = bar;
> +	ioeventfd->data = data;
> +	ioeventfd->count = count;
> +
> +	ret = vfio_virqfd_enable(vdev->barmap[bar] + pos, handler, NULL,
> +				 (void *)data, &ioeventfd->virqfd, fd);
> +	if (ret) {
> +		kfree(ioeventfd);
> +		goto out_unlock;
> +	}
> +
> +	list_add(&ioeventfd->next, &vdev->ioeventfds_list);
> +	vdev->ioeventfds_nr++;
> +
> +out_unlock:
> +	mutex_unlock(&vdev->ioeventfds_lock);
> +
> +	return ret;
> +}
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index c74372163ed2..7e9d76203e86 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -575,6 +575,33 @@ struct vfio_device_gfx_plane_info {
>  
>  #define VFIO_DEVICE_GET_GFX_DMABUF _IO(VFIO_TYPE, VFIO_BASE + 15)
>  
> +/**
> + * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 16,
> + *                              struct vfio_device_ioeventfd)
> + *
> + * Perform a write to the device at the specified device fd offset, with
> + * the specified data and width when the provided eventfd is triggered.
> + * vfio bus drivers may not support this for all regions, or at all.
> + * vfio-pci currently only enables support for BAR regions and excludes
> + * the MSI-X vector table.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_ioeventfd {
> +	__u32	argsz;
> +	__u32	flags;
> +#define VFIO_DEVICE_IOEVENTFD_8		(1 << 0) /* 1-byte write */
> +#define VFIO_DEVICE_IOEVENTFD_16	(1 << 1) /* 2-byte write */
> +#define VFIO_DEVICE_IOEVENTFD_32	(1 << 2) /* 4-byte write */
> +#define VFIO_DEVICE_IOEVENTFD_64	(1 << 3) /* 8-byte write */
> +#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK	(0xf)
> +	__u64	offset;			/* device fd offset of write */
> +	__u64	data;			/* data to be written */
> +	__s32	fd;			/* -1 for de-assignment */
> +};
> +
> +#define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] vfio/pci: Add ioeventfd support
  2018-03-13 13:12   ` Auger Eric
@ 2018-03-15 21:07     ` Alex Williamson
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Williamson @ 2018-03-15 21:07 UTC (permalink / raw)
  To: Auger Eric; +Cc: kvm, linux-kernel, qemu-devel, peterx, aik

On Tue, 13 Mar 2018 14:12:34 +0100
Auger Eric <eric.auger@redhat.com> wrote:
> On 28/02/18 21:15, Alex Williamson wrote:
> > +long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
> > +			uint64_t data, int count, int fd)
> > +{
> > +	struct pci_dev *pdev = vdev->pdev;
> > +	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
> > +	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
> > +	struct vfio_pci_ioeventfd *ioeventfd;
> > +	int (*handler)(void *addr, void *value);
> > +
> > +	/* Only support ioeventfds into BARs */
> > +	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
> > +		return -EINVAL;
> > +
> > +	if (pos + count > pci_resource_len(pdev, bar))
> > +		return -EINVAL;
> > +
> > +	/* Disallow ioeventfds working around MSI-X table writes */
> > +	if (bar == vdev->msix_bar &&
> > +	    !(pos + count <= vdev->msix_offset ||
> > +	      pos >= vdev->msix_offset + vdev->msix_size))
> > +		return -EINVAL;
> > +
> > +	switch (count) {
> > +	case 1:
> > +		handler = &vfio_pci_ioeventfd_handler8;
> > +		break;
> > +	case 2:
> > +		handler = &vfio_pci_ioeventfd_handler16;
> > +		break;
> > +	case 4:
> > +		handler = &vfio_pci_ioeventfd_handler32;
> > +		break;
> > +#ifdef iowrite64
> > +	case 8:
> > +		handler = &vfio_pci_ioeventfd_handler64;
> > +		break;  
> from a user point of view, it is straightforward this setup will be
> rejected? This is not documented in the uapi at the moment.

I added a mention in the uapi, do you see any need for more?
Essentially I consider this an entirely optional accelerator, bus
drivers are free to implement as much or little as they want.
Userspace can clearly make due without it, we've gone this long, and
it's easy to reject cases we don't want to support.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] vfio/pci: Add ioeventfd support
  2018-03-07  5:56   ` Peter Xu
@ 2018-03-15 21:12     ` Alex Williamson
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Williamson @ 2018-03-15 21:12 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, linux-kernel, qemu-devel, eric.auger, aik

On Wed, 7 Mar 2018 13:56:44 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Feb 28, 2018 at 01:15:20PM -0700, Alex Williamson wrote:
> 
> [...]
> 
> > @@ -1174,6 +1206,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  	vdev->irq_type = VFIO_PCI_NUM_IRQS;
> >  	mutex_init(&vdev->igate);
> >  	spin_lock_init(&vdev->irqlock);
> > +	mutex_init(&vdev->ioeventfds_lock);  
> 
> Do we better need to destroy the mutex in vfio_pci_remove?
> 
> I see that vfio_pci_device.igate is also without a destructor.  I'm
> not sure on both.

Yeah, mutex_destroy() is purely for debugging and I must have missed it
when implementing vfio.  I'll add it in the remove function and try to
cleanup the others in a separate patch, at some point.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-03-15 21:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-28 20:14 [PATCH 0/3] vfio/pci: ioeventfd support Alex Williamson
2018-02-28 20:14 ` [PATCH 1/3] vfio/pci: Pull BAR mapping setup from read-write path Alex Williamson
2018-03-07  7:11   ` Peter Xu
2018-03-13 12:21   ` Auger Eric
2018-02-28 20:14 ` [PATCH 2/3] vfio/pci: Use endian neutral helpers Alex Williamson
2018-02-28 20:15 ` [PATCH 3/3] vfio/pci: Add ioeventfd support Alex Williamson
2018-03-06  6:54   ` kbuild test robot
2018-03-07  5:56   ` Peter Xu
2018-03-15 21:12     ` Alex Williamson
2018-03-13 13:12   ` Auger Eric
2018-03-15 21:07     ` Alex Williamson
2018-03-02  7:08 ` [PATCH 0/3] vfio/pci: " Tian, Kevin
2018-03-02 18:02   ` Alex Williamson
2018-03-03  0:43     ` Tian, Kevin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).