LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/9] More virtio hardening
@ 2021-09-13  5:53 Jason Wang
  2021-09-13  5:53 ` [PATCH 1/9] virtio-blk: validate num_queues during probe Jason Wang
                   ` (9 more replies)
  0 siblings, 10 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

Hi All:

This series treis to do more hardening for virito.

patch 1 validates the num_queues for virio-blk device.
patch 2-4 validates max_nr_ports for virito-console device.
patch 5-7 harden virtio-pci interrupts to make sure no exepcted
interrupt handler is tiggered. If this makes sense we can do similar
things in other transport drivers.
patch 8-9 validate used ring length.

Smoking test on blk/net with packed=on/off and iommu_platform=on/off.

Please review.

Thanks

Jason Wang (9):
  virtio-blk: validate num_queues during probe
  virtio: add doc for validate() method
  virtio-console: switch to use .validate()
  virtio_console: validate max_nr_ports before trying to use it
  virtio_config: introduce a new ready method
  virtio_pci: harden MSI-X interrupts
  virtio-pci: harden INTX interrupts
  virtio_ring: fix typos in vring_desc_extra
  virtio_ring: validate used buffer length

 drivers/block/virtio_blk.c         |  3 +-
 drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
 drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
 drivers/virtio/virtio_pci_common.h |  7 ++--
 drivers/virtio/virtio_pci_legacy.c |  5 +--
 drivers/virtio/virtio_pci_modern.c |  6 ++--
 drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
 include/linux/virtio.h             |  1 +
 include/linux/virtio_config.h      |  6 ++++
 9 files changed, 118 insertions(+), 31 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/9] virtio-blk: validate num_queues during probe
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  7:48   ` Stefano Garzarella
  2021-09-13 12:05   ` Stefan Hajnoczi
  2021-09-13  5:53 ` [PATCH 2/9] virtio: add doc for validate() method Jason Wang
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Paolo Bonzini, Stefan Hajnoczi

If an untrusted device neogitates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocating zero size
buffers where ZERO_SIZE_PTR is returned which may pass the checking
against the NULL. This will lead unexpected results.

Fixing this by using single queue if num_queues is zero.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/block/virtio_blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index e574fbf5e6df..f130d12df4b9 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -498,7 +498,8 @@ static int init_vq(struct virtio_blk *vblk)
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
 				   struct virtio_blk_config, num_queues,
 				   &num_vqs);
-	if (err)
+	/* We need at least on virtqueue */
+	if (err || !num_vqs)
 		num_vqs = 1;
 
 	num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 2/9] virtio: add doc for validate() method
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
  2021-09-13  5:53 ` [PATCH 1/9] virtio-blk: validate num_queues during probe Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  5:53 ` [PATCH 3/9] virtio-console: switch to use .validate() Jason Wang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

This patch adds doc for validate() method.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/virtio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 41edbc01ffa4..0cd8685aeba4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -152,6 +152,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev);
  * @feature_table_size: number of entries in the feature table array.
  * @feature_table_legacy: same as feature_table but when working in legacy mode.
  * @feature_table_size_legacy: number of entries in feature table legacy array.
+ * @validate: optional function to do early checks
  * @probe: the function to call when a device is found.  Returns 0 or -errno.
  * @scan: optional function to call after successful probe; intended
  *    for virtio-scsi to invoke a scan.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 3/9] virtio-console: switch to use .validate()
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
  2021-09-13  5:53 ` [PATCH 1/9] virtio-blk: validate num_queues during probe Jason Wang
  2021-09-13  5:53 ` [PATCH 2/9] virtio: add doc for validate() method Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  5:53 ` [PATCH 4/9] virtio_console: validate max_nr_ports before trying to use it Jason Wang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Amit Shah

This patch switches to use validate() to filter out the features that
is not supported by the rproc.

Cc: Amit Shah <amit@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/char/virtio_console.c | 41 ++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7a86..daeed31df622 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1172,9 +1172,7 @@ static void resize_console(struct port *port)
 
 	vdev = port->portdev->vdev;
 
-	/* Don't test F_SIZE at all if we're rproc: not a valid feature! */
-	if (!is_rproc_serial(vdev) &&
-	    virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
+	if (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
 		hvc_resize(port->cons.hvc, port->cons.ws);
 }
 
@@ -1981,6 +1979,29 @@ static void virtcons_remove(struct virtio_device *vdev)
 	kfree(portdev);
 }
 
+static int virtcons_validate(struct virtio_device *vdev)
+{
+	if (is_rproc_serial(vdev)) {
+		/* Don't test F_SIZE at all if we're rproc: not a
+		 * valid feature! */
+		__virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_SIZE);
+		/* Don't test MULTIPORT at all if we're rproc: not a
+		 * valid feature! */
+		__virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_MULTIPORT);
+	}
+
+	/* We only need a config space if features are offered */
+	if (!vdev->config->get &&
+	    (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
+	     || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  * Once we're further in boot, we get probed like any other virtio
  * device.
@@ -1996,15 +2017,6 @@ static int virtcons_probe(struct virtio_device *vdev)
 	bool multiport;
 	bool early = early_put_chars != NULL;
 
-	/* We only need a config space if features are offered */
-	if (!vdev->config->get &&
-	    (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
-	     || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
-		dev_err(&vdev->dev, "%s failure: config access disabled\n",
-			__func__);
-		return -EINVAL;
-	}
-
 	/* Ensure to read early_put_chars now */
 	barrier();
 
@@ -2031,9 +2043,7 @@ static int virtcons_probe(struct virtio_device *vdev)
 	multiport = false;
 	portdev->max_nr_ports = 1;
 
-	/* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
-	if (!is_rproc_serial(vdev) &&
-	    virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
+	if (virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
 				 struct virtio_console_config, max_nr_ports,
 				 &portdev->max_nr_ports) == 0) {
 		multiport = true;
@@ -2210,6 +2220,7 @@ static struct virtio_driver virtio_console = {
 	.driver.name =	KBUILD_MODNAME,
 	.driver.owner =	THIS_MODULE,
 	.id_table =	id_table,
+	.validate = 	virtcons_validate,
 	.probe =	virtcons_probe,
 	.remove =	virtcons_remove,
 	.config_changed = config_intr,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 4/9] virtio_console: validate max_nr_ports before trying to use it
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
                   ` (2 preceding siblings ...)
  2021-09-13  5:53 ` [PATCH 3/9] virtio-console: switch to use .validate() Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  5:53 ` [PATCH 5/9] virtio_config: introduce a new ready method Jason Wang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Amit Shah

We calculate nr_ports based on the max_nr_ports:

nr_queues = use_multiport(portdev) ? (nr_ports + 1) * 2 : 2;

If the device advertises a large max_nr_ports, we will end up with a
integer overflow. Fixing this by validating the max_nr_ports
advertised by the device in .validate() and clear the MULTIPORT is
it's greater than 0x8000 (which is guaranteed be safe).

Cc: Amit Shah <amit@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/char/virtio_console.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index daeed31df622..ef13763699c0 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -28,6 +28,7 @@
 #include "../tty/hvc/hvc_console.h"
 
 #define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
+#define VIRTCONS_MAX_PORTS 0x8000
 
 /*
  * This is a global struct for storing common data for all the devices
@@ -1981,6 +1982,8 @@ static void virtcons_remove(struct virtio_device *vdev)
 
 static int virtcons_validate(struct virtio_device *vdev)
 {
+	u32 max_nr_ports;
+
 	if (is_rproc_serial(vdev)) {
 		/* Don't test F_SIZE at all if we're rproc: not a
 		 * valid feature! */
@@ -1999,6 +2002,13 @@ static int virtcons_validate(struct virtio_device *vdev)
 		return -EINVAL;
 	}
 
+	if (virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
+				 struct virtio_console_config, max_nr_ports,
+				 &max_nr_ports) == 0) {
+		if (max_nr_ports == 0 || max_nr_ports > VIRTCONS_MAX_PORTS)
+			__virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_MULTIPORT);
+	}
+
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 5/9] virtio_config: introduce a new ready method
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
                   ` (3 preceding siblings ...)
  2021-09-13  5:53 ` [PATCH 4/9] virtio_console: validate max_nr_ports before trying to use it Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  5:53 ` [PATCH 6/9] virtio_pci: harden MSI-X interrupts Jason Wang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/virtio_config.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 8519b3ae5d52..f2891c6221a1 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -23,6 +23,8 @@ struct virtio_shm_region {
  *       any of @get/@set, @get_status/@set_status, or @get_features/
  *       @finalize_features are NOT safe to be called from an atomic
  *       context.
+ * @ready: make the device ready
+ *      vdev: the virtio_device
  * @get: read the value of a configuration field
  *	vdev: the virtio_device
  *	offset: the offset of the configuration field
@@ -75,6 +77,7 @@ struct virtio_shm_region {
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
+	void (*ready)(struct virtio_device *vdev);
 	void (*get)(struct virtio_device *vdev, unsigned offset,
 		    void *buf, unsigned len);
 	void (*set)(struct virtio_device *vdev, unsigned offset,
@@ -229,6 +232,9 @@ void virtio_device_ready(struct virtio_device *dev)
 {
 	unsigned status = dev->config->get_status(dev);
 
+	if (dev->config->ready)
+                  dev->config->ready(dev);
+
 	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
 	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
                   ` (4 preceding siblings ...)
  2021-09-13  5:53 ` [PATCH 5/9] virtio_config: introduce a new ready method Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  6:03   ` Michael S. Tsirkin
  2021-09-13  5:53 ` [PATCH 7/9] virtio-pci: harden INTX interrupts Jason Wang
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

We used to synchronize pending MSI-X irq handlers via
synchronize_irq(), this may not work for the untrusted device which
may keep sending interrupts after reset which may lead unexpected
results. Similarly, we should not enable MSI-X interrupt until the
device is ready. So this patch fixes those two issues by:

1) switching to use disable_irq() to prevent the virtio interrupt
   handlers to be called after the device is reset.
2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()

This can make sure the virtio interrupt handler won't be called before
virtio_device_ready() and after reset.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
 drivers/virtio/virtio_pci_common.h |  6 ++++--
 drivers/virtio/virtio_pci_legacy.c |  5 +++--
 drivers/virtio/virtio_pci_modern.c |  6 ++++--
 4 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index b35bb2d57f62..0b9523e6dd39 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
 		 "Force legacy mode for transitional virtio 1 devices");
 #endif
 
-/* wait for pending irq handlers */
-void vp_synchronize_vectors(struct virtio_device *vdev)
+/* disable irq handlers */
+void vp_disable_vectors(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
@@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
 		synchronize_irq(vp_dev->pci_dev->irq);
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
-		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
+		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
+}
+
+/* enable irq handlers */
+void vp_enable_vectors(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled)
+		return;
+
+	for (i = 0; i < vp_dev->msix_vectors; ++i)
+		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
 }
 
 /* the notify function used when creating a virt queue */
@@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 		 "%s-config", name);
 	err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
-			  vp_config_changed, 0, vp_dev->msix_names[v],
+			  vp_config_changed, IRQF_NO_AUTOEN,
+			  vp_dev->msix_names[v],
 			  vp_dev);
 	if (err)
 		goto error;
@@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 			 "%s-virtqueues", name);
 		err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
-				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
+				  vp_vring_interrupt, IRQF_NO_AUTOEN,
+				  vp_dev->msix_names[v],
 				  vp_dev);
 		if (err)
 			goto error;
@@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 			 "%s-%s",
 			 dev_name(&vp_dev->vdev.dev), names[i]);
 		err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
-				  vring_interrupt, 0,
+				  vring_interrupt, IRQF_NO_AUTOEN,
 				  vp_dev->msix_names[msix_vec],
 				  vqs[i]);
 		if (err)
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index beec047a8f8d..a235ce9ff6a5 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
 	return container_of(vdev, struct virtio_pci_device, vdev);
 }
 
-/* wait for pending irq handlers */
-void vp_synchronize_vectors(struct virtio_device *vdev);
+/* disable irq handlers */
+void vp_disable_vectors(struct virtio_device *vdev);
+/* enable irq handlers */
+void vp_enable_vectors(struct virtio_device *vdev);
 /* the notify function used when creating a virt queue */
 bool vp_notify(struct virtqueue *vq);
 /* the config->del_vqs() implementation */
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index d62e9835aeec..bdf6bc667ab5 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
 	/* Flush out the status write, and flush in device writes,
 	 * including MSi-X interrupts, if any. */
 	ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
-	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
+	/* Disable VQ/configuration callbacks. */
+	vp_disable_vectors(vdev);
 }
 
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
@@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
 }
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
+	.ready		= vp_enable_vectors,
 	.get		= vp_get,
 	.set		= vp_set,
 	.get_status	= vp_get_status,
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 30654d3a0b41..acf0f6b6381d 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
 	 */
 	while (vp_modern_get_status(mdev))
 		msleep(1);
-	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
+	/* Disable VQ/configuration callbacks. */
+	vp_disable_vectors(vdev);
 }
 
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
@@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
 }
 
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
+	.ready		= vp_enable_vectors,
 	.get		= NULL,
 	.set		= NULL,
 	.generation	= vp_generation,
@@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
+	.ready		= vp_enable_vectors,
 	.get		= vp_get,
 	.set		= vp_set,
 	.generation	= vp_generation,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
                   ` (5 preceding siblings ...)
  2021-09-13  5:53 ` [PATCH 6/9] virtio_pci: harden MSI-X interrupts Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  6:33   ` Michael S. Tsirkin
  2021-09-13 21:36   ` Thomas Gleixner
  2021-09-13  5:53 ` [PATCH 8/9] virtio_ring: fix typos in vring_desc_extra Jason Wang
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

This patch tries to make sure the virtio interrupt handler for INTX
won't be called after a reset and before virtio_device_ready(). We
can't use IRQF_NO_AUTOEN since we're using shared interrupt
(IRQF_SHARED). So this patch tracks the INTX enabling status in a new
intx_soft_enabled variable and toggle it during in
vp_disable/enable_vectors(). The INTX interrupt handler will check
intx_soft_enabled before processing the actual interrupt.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_common.c | 18 ++++++++++++++++--
 drivers/virtio/virtio_pci_common.h |  1 +
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 0b9523e6dd39..835197151dc1 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -30,8 +30,12 @@ void vp_disable_vectors(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
 
-	if (vp_dev->intx_enabled)
+	if (vp_dev->intx_enabled) {
+		vp_dev->intx_soft_enabled = false;
+		/* ensure the vp_interrupt see this intx_soft_enabled value */
+		smp_wmb();
 		synchronize_irq(vp_dev->pci_dev->irq);
+	}
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
 		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
@@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
 
-	if (vp_dev->intx_enabled)
+	if (vp_dev->intx_enabled) {
+		vp_dev->intx_soft_enabled = true;
+		/* ensure the vp_interrupt see this intx_soft_enabled value */
+		smp_wmb();
 		return;
+	}
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
 		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
@@ -97,6 +105,12 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 	struct virtio_pci_device *vp_dev = opaque;
 	u8 isr;
 
+	if (!vp_dev->intx_soft_enabled)
+		return IRQ_NONE;
+
+	/* read intx_soft_enabled before read others */
+	smp_rmb();
+
 	/* reading the ISR has the effect of also clearing it so it's very
 	 * important to save off the value. */
 	isr = ioread8(vp_dev->isr);
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index a235ce9ff6a5..3c06e0f92ee4 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -64,6 +64,7 @@ struct virtio_pci_device {
 	/* MSI-X support */
 	int msix_enabled;
 	int intx_enabled;
+	bool intx_soft_enabled;
 	cpumask_var_t *msix_affinity_masks;
 	/* Name strings for interrupts. This size should be enough,
 	 * and I'm too lazy to allocate each name separately. */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 8/9] virtio_ring: fix typos in vring_desc_extra
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
                   ` (6 preceding siblings ...)
  2021-09-13  5:53 ` [PATCH 7/9] virtio-pci: harden INTX interrupts Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  5:53 ` [PATCH 9/9] virtio_ring: validate used buffer length Jason Wang
  2021-10-05  7:42 ` [PATCH 0/9] More virtio hardening Michael S. Tsirkin
  9 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

We're actually tracking descriptor address and length instead of the
buffer.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..d2ca0a7365f8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -79,8 +79,8 @@ struct vring_desc_state_packed {
 };
 
 struct vring_desc_extra {
-	dma_addr_t addr;		/* Buffer DMA addr. */
-	u32 len;			/* Buffer length. */
+	dma_addr_t addr;		/* Descriptor DMA addr. */
+	u32 len;			/* Descriptor length. */
 	u16 flags;			/* Descriptor flags. */
 	u16 next;			/* The next desc state in a list. */
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 9/9] virtio_ring: validate used buffer length
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
                   ` (7 preceding siblings ...)
  2021-09-13  5:53 ` [PATCH 8/9] virtio_ring: fix typos in vring_desc_extra Jason Wang
@ 2021-09-13  5:53 ` Jason Wang
  2021-09-13  6:36   ` Michael S. Tsirkin
  2021-10-05  7:42 ` [PATCH 0/9] More virtio hardening Michael S. Tsirkin
  9 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  5:53 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

This patch validate the used buffer length provided by the device
before trying to use it. This is done by record the in buffer length
in a new field in desc_state structure during virtqueue_add(), then we
can fail the virtqueue_get_buf() when we find the device is trying to
give us a used buffer length which is greater than the in buffer
length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d2ca0a7365f8..b8374a6144f3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -69,6 +69,7 @@
 struct vring_desc_state_split {
 	void *data;			/* Data for callback. */
 	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+	u64 buflen;			/* In buffer length */
 };
 
 struct vring_desc_state_packed {
@@ -76,6 +77,7 @@ struct vring_desc_state_packed {
 	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
 	u16 num;			/* Descriptor list length. */
 	u16 last;			/* The last desc state in a list. */
+	u64 buflen;			/* In buffer length */
 };
 
 struct vring_desc_extra {
@@ -490,6 +492,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	unsigned int i, n, avail, descs_used, prev, err_idx;
 	int head;
 	bool indirect;
+	u64 buflen = 0;
 
 	START_USE(vq);
 
@@ -571,6 +574,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 						     VRING_DESC_F_NEXT |
 						     VRING_DESC_F_WRITE,
 						     indirect);
+			buflen += sg->length;
 		}
 	}
 	/* Last one doesn't continue. */
@@ -605,6 +609,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
 	/* Store token and indirect buffer state. */
 	vq->split.desc_state[head].data = data;
+	vq->split.desc_state[head].buflen = buflen;
 	if (indirect)
 		vq->split.desc_state[head].indir_desc = desc;
 	else
@@ -784,6 +789,11 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 		BAD_RING(vq, "id %u is not a head!\n", i);
 		return NULL;
 	}
+	if (unlikely(*len > vq->split.desc_state[i].buflen)) {
+		BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
+			*len, vq->split.desc_state[i].buflen);
+		return NULL;
+	}
 
 	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->split.desc_state[i].data;
@@ -1062,6 +1072,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	unsigned int i, n, err_idx;
 	u16 head, id;
 	dma_addr_t addr;
+	u64 buflen = 0;
 
 	head = vq->packed.next_avail_idx;
 	desc = alloc_indirect_packed(total_sg, gfp);
@@ -1089,6 +1100,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 			desc[i].addr = cpu_to_le64(addr);
 			desc[i].len = cpu_to_le32(sg->length);
 			i++;
+			if (n >= out_sgs)
+				buflen += sg->length;
 		}
 	}
 
@@ -1141,6 +1154,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	vq->packed.desc_state[id].data = data;
 	vq->packed.desc_state[id].indir_desc = desc;
 	vq->packed.desc_state[id].last = id;
+	vq->packed.desc_state[id].buflen = buflen;
 
 	vq->num_added += 1;
 
@@ -1176,6 +1190,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	unsigned int i, n, c, descs_used, err_idx;
 	__le16 head_flags, flags;
 	u16 head, id, prev, curr, avail_used_flags;
+	u64 buflen = 0;
 
 	START_USE(vq);
 
@@ -1250,6 +1265,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 					1 << VRING_PACKED_DESC_F_AVAIL |
 					1 << VRING_PACKED_DESC_F_USED;
 			}
+			if (n >= out_sgs)
+				buflen += sg->length;
 		}
 	}
 
@@ -1268,6 +1285,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	vq->packed.desc_state[id].data = data;
 	vq->packed.desc_state[id].indir_desc = ctx;
 	vq->packed.desc_state[id].last = prev;
+	vq->packed.desc_state[id].buflen = buflen;
 
 	/*
 	 * A driver MUST NOT make the first descriptor in the list
@@ -1455,6 +1473,11 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 		BAD_RING(vq, "id %u is not a head!\n", id);
 		return NULL;
 	}
+	if (unlikely(*len > vq->packed.desc_state[id].buflen)) {
+		BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
+			*len, vq->packed.desc_state[id].buflen);
+		return NULL;
+	}
 
 	/* detach_buf_packed clears data, so grab it now. */
 	ret = vq->packed.desc_state[id].data;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  5:53 ` [PATCH 6/9] virtio_pci: harden MSI-X interrupts Jason Wang
@ 2021-09-13  6:03   ` Michael S. Tsirkin
  2021-09-13  6:08     ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, pbonzini, Andi Kleen, Dan Williams, Kuppuswamy,
	Sathyanarayanan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> We used to synchronize pending MSI-X irq handlers via
> synchronize_irq(), this may not work for the untrusted device which
> may keep sending interrupts after reset which may lead unexpected
> results. Similarly, we should not enable MSI-X interrupt until the
> device is ready. So this patch fixes those two issues by:
> 
> 1) switching to use disable_irq() to prevent the virtio interrupt
>    handlers to be called after the device is reset.
> 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> 
> This can make sure the virtio interrupt handler won't be called before
> virtio_device_ready() and after reset.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I don't get the threat model here. Isn't disabling irqs done by the 
hypervisor anyway? Is there a reason to trust disable_irq but not
device reset?

Cc a bunch more people ...


> ---
>  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
>  drivers/virtio/virtio_pci_common.h |  6 ++++--
>  drivers/virtio/virtio_pci_legacy.c |  5 +++--
>  drivers/virtio/virtio_pci_modern.c |  6 ++++--
>  4 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index b35bb2d57f62..0b9523e6dd39 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
>  		 "Force legacy mode for transitional virtio 1 devices");
>  #endif
>  
> -/* wait for pending irq handlers */
> -void vp_synchronize_vectors(struct virtio_device *vdev)
> +/* disable irq handlers */
> +void vp_disable_vectors(struct virtio_device *vdev)
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	int i;
> @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>  		synchronize_irq(vp_dev->pci_dev->irq);
>  
>  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> -		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +}
> +
> +/* enable irq handlers */
> +void vp_enable_vectors(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	int i;
> +
> +	if (vp_dev->intx_enabled)
> +		return;
> +
> +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> +		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>  }
>  
>  /* the notify function used when creating a virt queue */
> @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
>  	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
>  		 "%s-config", name);
>  	err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> -			  vp_config_changed, 0, vp_dev->msix_names[v],
> +			  vp_config_changed, IRQF_NO_AUTOEN,
> +			  vp_dev->msix_names[v],
>  			  vp_dev);
>  	if (err)
>  		goto error;
> @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
>  		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
>  			 "%s-virtqueues", name);
>  		err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> -				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
> +				  vp_vring_interrupt, IRQF_NO_AUTOEN,
> +				  vp_dev->msix_names[v],
>  				  vp_dev);
>  		if (err)
>  			goto error;
> @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
>  			 "%s-%s",
>  			 dev_name(&vp_dev->vdev.dev), names[i]);
>  		err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> -				  vring_interrupt, 0,
> +				  vring_interrupt, IRQF_NO_AUTOEN,
>  				  vp_dev->msix_names[msix_vec],
>  				  vqs[i]);
>  		if (err)
> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> index beec047a8f8d..a235ce9ff6a5 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
>  	return container_of(vdev, struct virtio_pci_device, vdev);
>  }
>  
> -/* wait for pending irq handlers */
> -void vp_synchronize_vectors(struct virtio_device *vdev);
> +/* disable irq handlers */
> +void vp_disable_vectors(struct virtio_device *vdev);
> +/* enable irq handlers */
> +void vp_enable_vectors(struct virtio_device *vdev);
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct virtqueue *vq);
>  /* the config->del_vqs() implementation */
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index d62e9835aeec..bdf6bc667ab5 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
>  	/* Flush out the status write, and flush in device writes,
>  	 * including MSi-X interrupts, if any. */
>  	ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> -	/* Flush pending VQ/configuration callbacks. */
> -	vp_synchronize_vectors(vdev);
> +	/* Disable VQ/configuration callbacks. */
> +	vp_disable_vectors(vdev);
>  }
>  
>  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
>  }
>  
>  static const struct virtio_config_ops virtio_pci_config_ops = {
> +	.ready		= vp_enable_vectors,
>  	.get		= vp_get,
>  	.set		= vp_set,
>  	.get_status	= vp_get_status,
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index 30654d3a0b41..acf0f6b6381d 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
>  	 */
>  	while (vp_modern_get_status(mdev))
>  		msleep(1);
> -	/* Flush pending VQ/configuration callbacks. */
> -	vp_synchronize_vectors(vdev);
> +	/* Disable VQ/configuration callbacks. */
> +	vp_disable_vectors(vdev);
>  }
>  
>  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
>  }
>  
>  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> +	.ready		= vp_enable_vectors,
>  	.get		= NULL,
>  	.set		= NULL,
>  	.generation	= vp_generation,
> @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>  };
>  
>  static const struct virtio_config_ops virtio_pci_config_ops = {
> +	.ready		= vp_enable_vectors,
>  	.get		= vp_get,
>  	.set		= vp_set,
>  	.generation	= vp_generation,
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:03   ` Michael S. Tsirkin
@ 2021-09-13  6:08     ` Jason Wang
  2021-09-13  6:28       ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  6:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > We used to synchronize pending MSI-X irq handlers via
> > synchronize_irq(), this may not work for the untrusted device which
> > may keep sending interrupts after reset which may lead unexpected
> > results. Similarly, we should not enable MSI-X interrupt until the
> > device is ready. So this patch fixes those two issues by:
> >
> > 1) switching to use disable_irq() to prevent the virtio interrupt
> >    handlers to be called after the device is reset.
> > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> >
> > This can make sure the virtio interrupt handler won't be called before
> > virtio_device_ready() and after reset.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
> I don't get the threat model here. Isn't disabling irqs done by the
> hypervisor anyway? Is there a reason to trust disable_irq but not
> device reset?

My understanding is that e.g in the case of SEV/TDX we don't trust the
hypervisor. So the hypervisor can keep sending interrupts even if the
device is reset. The guest can only trust its own software interrupt
management logic to avoid call virtio callback in this case.

Thanks

>
> Cc a bunch more people ...
>
>
> > ---
> >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> >  4 files changed, 32 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index b35bb2d57f62..0b9523e6dd39 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> >                "Force legacy mode for transitional virtio 1 devices");
> >  #endif
> >
> > -/* wait for pending irq handlers */
> > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > +/* disable irq handlers */
> > +void vp_disable_vectors(struct virtio_device *vdev)
> >  {
> >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >       int i;
> > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> >               synchronize_irq(vp_dev->pci_dev->irq);
> >
> >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > +}
> > +
> > +/* enable irq handlers */
> > +void vp_enable_vectors(struct virtio_device *vdev)
> > +{
> > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > +     int i;
> > +
> > +     if (vp_dev->intx_enabled)
> > +             return;
> > +
> > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >  }
> >
> >  /* the notify function used when creating a virt queue */
> > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> >                "%s-config", name);
> >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > +                       vp_dev->msix_names[v],
> >                         vp_dev);
> >       if (err)
> >               goto error;
> > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> >                        "%s-virtqueues", name);
> >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > +                               vp_dev->msix_names[v],
> >                                 vp_dev);
> >               if (err)
> >                       goto error;
> > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> >                        "%s-%s",
> >                        dev_name(&vp_dev->vdev.dev), names[i]);
> >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > -                               vring_interrupt, 0,
> > +                               vring_interrupt, IRQF_NO_AUTOEN,
> >                                 vp_dev->msix_names[msix_vec],
> >                                 vqs[i]);
> >               if (err)
> > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > index beec047a8f8d..a235ce9ff6a5 100644
> > --- a/drivers/virtio/virtio_pci_common.h
> > +++ b/drivers/virtio/virtio_pci_common.h
> > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> >       return container_of(vdev, struct virtio_pci_device, vdev);
> >  }
> >
> > -/* wait for pending irq handlers */
> > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > +/* disable irq handlers */
> > +void vp_disable_vectors(struct virtio_device *vdev);
> > +/* enable irq handlers */
> > +void vp_enable_vectors(struct virtio_device *vdev);
> >  /* the notify function used when creating a virt queue */
> >  bool vp_notify(struct virtqueue *vq);
> >  /* the config->del_vqs() implementation */
> > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > index d62e9835aeec..bdf6bc667ab5 100644
> > --- a/drivers/virtio/virtio_pci_legacy.c
> > +++ b/drivers/virtio/virtio_pci_legacy.c
> > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> >       /* Flush out the status write, and flush in device writes,
> >        * including MSi-X interrupts, if any. */
> >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > -     /* Flush pending VQ/configuration callbacks. */
> > -     vp_synchronize_vectors(vdev);
> > +     /* Disable VQ/configuration callbacks. */
> > +     vp_disable_vectors(vdev);
> >  }
> >
> >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> >  }
> >
> >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > +     .ready          = vp_enable_vectors,
> >       .get            = vp_get,
> >       .set            = vp_set,
> >       .get_status     = vp_get_status,
> > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > index 30654d3a0b41..acf0f6b6381d 100644
> > --- a/drivers/virtio/virtio_pci_modern.c
> > +++ b/drivers/virtio/virtio_pci_modern.c
> > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> >        */
> >       while (vp_modern_get_status(mdev))
> >               msleep(1);
> > -     /* Flush pending VQ/configuration callbacks. */
> > -     vp_synchronize_vectors(vdev);
> > +     /* Disable VQ/configuration callbacks. */
> > +     vp_disable_vectors(vdev);
> >  }
> >
> >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> >  }
> >
> >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > +     .ready          = vp_enable_vectors,
> >       .get            = NULL,
> >       .set            = NULL,
> >       .generation     = vp_generation,
> > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> >  };
> >
> >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > +     .ready          = vp_enable_vectors,
> >       .get            = vp_get,
> >       .set            = vp_set,
> >       .generation     = vp_generation,
> > --
> > 2.25.1
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:08     ` Jason Wang
@ 2021-09-13  6:28       ` Michael S. Tsirkin
  2021-09-13  6:34         ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > We used to synchronize pending MSI-X irq handlers via
> > > synchronize_irq(), this may not work for the untrusted device which
> > > may keep sending interrupts after reset which may lead unexpected
> > > results. Similarly, we should not enable MSI-X interrupt until the
> > > device is ready. So this patch fixes those two issues by:
> > >
> > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > >    handlers to be called after the device is reset.
> > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > >
> > > This can make sure the virtio interrupt handler won't be called before
> > > virtio_device_ready() and after reset.
> > >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >
> > I don't get the threat model here. Isn't disabling irqs done by the
> > hypervisor anyway? Is there a reason to trust disable_irq but not
> > device reset?
> 
> My understanding is that e.g in the case of SEV/TDX we don't trust the
> hypervisor. So the hypervisor can keep sending interrupts even if the
> device is reset. The guest can only trust its own software interrupt
> management logic to avoid call virtio callback in this case.
> 
> Thanks

Hmm but I don't see how do these patches do this.
They call disable_irq but can't the hypervisor keep
sending interrupts after disable_irq, too?



> >
> > Cc a bunch more people ...
> >
> >
> > > ---
> > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > index b35bb2d57f62..0b9523e6dd39 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > >                "Force legacy mode for transitional virtio 1 devices");
> > >  #endif
> > >
> > > -/* wait for pending irq handlers */
> > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > +/* disable irq handlers */
> > > +void vp_disable_vectors(struct virtio_device *vdev)
> > >  {
> > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >       int i;
> > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > >               synchronize_irq(vp_dev->pci_dev->irq);
> > >
> > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > +}
> > > +
> > > +/* enable irq handlers */
> > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > +{
> > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > +     int i;
> > > +
> > > +     if (vp_dev->intx_enabled)
> > > +             return;
> > > +
> > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > >  }
> > >
> > >  /* the notify function used when creating a virt queue */
> > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > >                "%s-config", name);
> > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > +                       vp_dev->msix_names[v],
> > >                         vp_dev);
> > >       if (err)
> > >               goto error;
> > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > >                        "%s-virtqueues", name);
> > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > +                               vp_dev->msix_names[v],
> > >                                 vp_dev);
> > >               if (err)
> > >                       goto error;
> > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > >                        "%s-%s",
> > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > -                               vring_interrupt, 0,
> > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > >                                 vp_dev->msix_names[msix_vec],
> > >                                 vqs[i]);
> > >               if (err)
> > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > index beec047a8f8d..a235ce9ff6a5 100644
> > > --- a/drivers/virtio/virtio_pci_common.h
> > > +++ b/drivers/virtio/virtio_pci_common.h
> > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > >  }
> > >
> > > -/* wait for pending irq handlers */
> > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > +/* disable irq handlers */
> > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > +/* enable irq handlers */
> > > +void vp_enable_vectors(struct virtio_device *vdev);
> > >  /* the notify function used when creating a virt queue */
> > >  bool vp_notify(struct virtqueue *vq);
> > >  /* the config->del_vqs() implementation */
> > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > index d62e9835aeec..bdf6bc667ab5 100644
> > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > >       /* Flush out the status write, and flush in device writes,
> > >        * including MSi-X interrupts, if any. */
> > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > -     /* Flush pending VQ/configuration callbacks. */
> > > -     vp_synchronize_vectors(vdev);
> > > +     /* Disable VQ/configuration callbacks. */
> > > +     vp_disable_vectors(vdev);
> > >  }
> > >
> > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > >  }
> > >
> > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > +     .ready          = vp_enable_vectors,
> > >       .get            = vp_get,
> > >       .set            = vp_set,
> > >       .get_status     = vp_get_status,
> > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > index 30654d3a0b41..acf0f6b6381d 100644
> > > --- a/drivers/virtio/virtio_pci_modern.c
> > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > >        */
> > >       while (vp_modern_get_status(mdev))
> > >               msleep(1);
> > > -     /* Flush pending VQ/configuration callbacks. */
> > > -     vp_synchronize_vectors(vdev);
> > > +     /* Disable VQ/configuration callbacks. */
> > > +     vp_disable_vectors(vdev);
> > >  }
> > >
> > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > >  }
> > >
> > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > +     .ready          = vp_enable_vectors,
> > >       .get            = NULL,
> > >       .set            = NULL,
> > >       .generation     = vp_generation,
> > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > >  };
> > >
> > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > +     .ready          = vp_enable_vectors,
> > >       .get            = vp_get,
> > >       .set            = vp_set,
> > >       .generation     = vp_generation,
> > > --
> > > 2.25.1
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  5:53 ` [PATCH 7/9] virtio-pci: harden INTX interrupts Jason Wang
@ 2021-09-13  6:33   ` Michael S. Tsirkin
  2021-09-13  6:36     ` Jason Wang
  2021-09-13 21:36   ` Thomas Gleixner
  1 sibling, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

On Mon, Sep 13, 2021 at 01:53:51PM +0800, Jason Wang wrote:
> This patch tries to make sure the virtio interrupt handler for INTX
> won't be called after a reset and before virtio_device_ready(). We
> can't use IRQF_NO_AUTOEN since we're using shared interrupt
> (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> intx_soft_enabled variable and toggle it during in
> vp_disable/enable_vectors(). The INTX interrupt handler will check
> intx_soft_enabled before processing the actual interrupt.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Not all that excited about all the memory barriers for something
that should be an extremely rare event (for most kernels -
literally once per boot). Can't we do better?

> ---
>  drivers/virtio/virtio_pci_common.c | 18 ++++++++++++++++--
>  drivers/virtio/virtio_pci_common.h |  1 +
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 0b9523e6dd39..835197151dc1 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -30,8 +30,12 @@ void vp_disable_vectors(struct virtio_device *vdev)
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	int i;
>  
> -	if (vp_dev->intx_enabled)
> +	if (vp_dev->intx_enabled) {
> +		vp_dev->intx_soft_enabled = false;
> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
> +		smp_wmb();
>  		synchronize_irq(vp_dev->pci_dev->irq);
> +	}
>  
>  	for (i = 0; i < vp_dev->msix_vectors; ++i)
>  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	int i;
>  
> -	if (vp_dev->intx_enabled)
> +	if (vp_dev->intx_enabled) {
> +		vp_dev->intx_soft_enabled = true;
> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
> +		smp_wmb();
>  		return;
> +	}
>  
>  	for (i = 0; i < vp_dev->msix_vectors; ++i)
>  		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> @@ -97,6 +105,12 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
>  	struct virtio_pci_device *vp_dev = opaque;
>  	u8 isr;
>  
> +	if (!vp_dev->intx_soft_enabled)
> +		return IRQ_NONE;
> +
> +	/* read intx_soft_enabled before read others */
> +	smp_rmb();
> +
>  	/* reading the ISR has the effect of also clearing it so it's very
>  	 * important to save off the value. */
>  	isr = ioread8(vp_dev->isr);
> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> index a235ce9ff6a5..3c06e0f92ee4 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -64,6 +64,7 @@ struct virtio_pci_device {
>  	/* MSI-X support */
>  	int msix_enabled;
>  	int intx_enabled;
> +	bool intx_soft_enabled;
>  	cpumask_var_t *msix_affinity_masks;
>  	/* Name strings for interrupts. This size should be enough,
>  	 * and I'm too lazy to allocate each name separately. */
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:28       ` Michael S. Tsirkin
@ 2021-09-13  6:34         ` Jason Wang
  2021-09-13  6:37           ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  6:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 2:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > > We used to synchronize pending MSI-X irq handlers via
> > > > synchronize_irq(), this may not work for the untrusted device which
> > > > may keep sending interrupts after reset which may lead unexpected
> > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > device is ready. So this patch fixes those two issues by:
> > > >
> > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > >    handlers to be called after the device is reset.
> > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > >
> > > > This can make sure the virtio interrupt handler won't be called before
> > > > virtio_device_ready() and after reset.
> > > >
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > > I don't get the threat model here. Isn't disabling irqs done by the
> > > hypervisor anyway? Is there a reason to trust disable_irq but not
> > > device reset?
> >
> > My understanding is that e.g in the case of SEV/TDX we don't trust the
> > hypervisor. So the hypervisor can keep sending interrupts even if the
> > device is reset. The guest can only trust its own software interrupt
> > management logic to avoid call virtio callback in this case.
> >
> > Thanks
>
> Hmm but I don't see how do these patches do this.
> They call disable_irq but can't the hypervisor keep
> sending interrupts after disable_irq, too?

Yes, but since the irq is disabled, the vring or config callback won't
be called in this case.

Thanks

>
>
>
> > >
> > > Cc a bunch more people ...
> > >
> > >
> > > > ---
> > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index b35bb2d57f62..0b9523e6dd39 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > >                "Force legacy mode for transitional virtio 1 devices");
> > > >  #endif
> > > >
> > > > -/* wait for pending irq handlers */
> > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > +/* disable irq handlers */
> > > > +void vp_disable_vectors(struct virtio_device *vdev)
> > > >  {
> > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >       int i;
> > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > >
> > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > +}
> > > > +
> > > > +/* enable irq handlers */
> > > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > > +{
> > > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > +     int i;
> > > > +
> > > > +     if (vp_dev->intx_enabled)
> > > > +             return;
> > > > +
> > > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > >  }
> > > >
> > > >  /* the notify function used when creating a virt queue */
> > > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > >                "%s-config", name);
> > > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > > +                       vp_dev->msix_names[v],
> > > >                         vp_dev);
> > > >       if (err)
> > > >               goto error;
> > > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > >                        "%s-virtqueues", name);
> > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > > +                               vp_dev->msix_names[v],
> > > >                                 vp_dev);
> > > >               if (err)
> > > >                       goto error;
> > > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > > >                        "%s-%s",
> > > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > > -                               vring_interrupt, 0,
> > > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > > >                                 vp_dev->msix_names[msix_vec],
> > > >                                 vqs[i]);
> > > >               if (err)
> > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > index beec047a8f8d..a235ce9ff6a5 100644
> > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > > >  }
> > > >
> > > > -/* wait for pending irq handlers */
> > > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > > +/* disable irq handlers */
> > > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > > +/* enable irq handlers */
> > > > +void vp_enable_vectors(struct virtio_device *vdev);
> > > >  /* the notify function used when creating a virt queue */
> > > >  bool vp_notify(struct virtqueue *vq);
> > > >  /* the config->del_vqs() implementation */
> > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > index d62e9835aeec..bdf6bc667ab5 100644
> > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > >       /* Flush out the status write, and flush in device writes,
> > > >        * including MSi-X interrupts, if any. */
> > > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > -     vp_synchronize_vectors(vdev);
> > > > +     /* Disable VQ/configuration callbacks. */
> > > > +     vp_disable_vectors(vdev);
> > > >  }
> > > >
> > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > > >  }
> > > >
> > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > +     .ready          = vp_enable_vectors,
> > > >       .get            = vp_get,
> > > >       .set            = vp_set,
> > > >       .get_status     = vp_get_status,
> > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > index 30654d3a0b41..acf0f6b6381d 100644
> > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > >        */
> > > >       while (vp_modern_get_status(mdev))
> > > >               msleep(1);
> > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > -     vp_synchronize_vectors(vdev);
> > > > +     /* Disable VQ/configuration callbacks. */
> > > > +     vp_disable_vectors(vdev);
> > > >  }
> > > >
> > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > > >  }
> > > >
> > > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > +     .ready          = vp_enable_vectors,
> > > >       .get            = NULL,
> > > >       .set            = NULL,
> > > >       .generation     = vp_generation,
> > > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > >  };
> > > >
> > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > +     .ready          = vp_enable_vectors,
> > > >       .get            = vp_get,
> > > >       .set            = vp_set,
> > > >       .generation     = vp_generation,
> > > > --
> > > > 2.25.1
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 9/9] virtio_ring: validate used buffer length
  2021-09-13  5:53 ` [PATCH 9/9] virtio_ring: validate used buffer length Jason Wang
@ 2021-09-13  6:36   ` Michael S. Tsirkin
  2021-09-13  6:40     ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

On Mon, Sep 13, 2021 at 01:53:53PM +0800, Jason Wang wrote:
> This patch validate the used buffer length provided by the device
> before trying to use it. This is done by record the in buffer length
> in a new field in desc_state structure during virtqueue_add(), then we
> can fail the virtqueue_get_buf() when we find the device is trying to
> give us a used buffer length which is greater than the in buffer
> length.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Hmm this was proposed in the past. The overhead here is
not negligeable, so I'd like to know more -
when is it a problem if the used len is too big?
Don't the affected drivers already track the length somewhere
and so can validated it without the extra cost in
virtio core?

> ---
>  drivers/virtio/virtio_ring.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d2ca0a7365f8..b8374a6144f3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -69,6 +69,7 @@
>  struct vring_desc_state_split {
>  	void *data;			/* Data for callback. */
>  	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> +	u64 buflen;			/* In buffer length */
>  };
>  
>  struct vring_desc_state_packed {
> @@ -76,6 +77,7 @@ struct vring_desc_state_packed {
>  	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
>  	u16 num;			/* Descriptor list length. */
>  	u16 last;			/* The last desc state in a list. */
> +	u64 buflen;			/* In buffer length */
>  };
>  
>  struct vring_desc_extra {
> @@ -490,6 +492,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	unsigned int i, n, avail, descs_used, prev, err_idx;
>  	int head;
>  	bool indirect;
> +	u64 buflen = 0;
>  
>  	START_USE(vq);
>  
> @@ -571,6 +574,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  						     VRING_DESC_F_NEXT |
>  						     VRING_DESC_F_WRITE,
>  						     indirect);
> +			buflen += sg->length;
>  		}
>  	}
>  	/* Last one doesn't continue. */
> @@ -605,6 +609,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  
>  	/* Store token and indirect buffer state. */
>  	vq->split.desc_state[head].data = data;
> +	vq->split.desc_state[head].buflen = buflen;
>  	if (indirect)
>  		vq->split.desc_state[head].indir_desc = desc;
>  	else
> @@ -784,6 +789,11 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>  		BAD_RING(vq, "id %u is not a head!\n", i);
>  		return NULL;
>  	}
> +	if (unlikely(*len > vq->split.desc_state[i].buflen)) {
> +		BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> +			*len, vq->split.desc_state[i].buflen);
> +		return NULL;
> +	}
>  
>  	/* detach_buf_split clears data, so grab it now. */
>  	ret = vq->split.desc_state[i].data;
> @@ -1062,6 +1072,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>  	unsigned int i, n, err_idx;
>  	u16 head, id;
>  	dma_addr_t addr;
> +	u64 buflen = 0;
>  
>  	head = vq->packed.next_avail_idx;
>  	desc = alloc_indirect_packed(total_sg, gfp);
> @@ -1089,6 +1100,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>  			desc[i].addr = cpu_to_le64(addr);
>  			desc[i].len = cpu_to_le32(sg->length);
>  			i++;
> +			if (n >= out_sgs)
> +				buflen += sg->length;
>  		}
>  	}
>  
> @@ -1141,6 +1154,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>  	vq->packed.desc_state[id].data = data;
>  	vq->packed.desc_state[id].indir_desc = desc;
>  	vq->packed.desc_state[id].last = id;
> +	vq->packed.desc_state[id].buflen = buflen;
>  
>  	vq->num_added += 1;
>  
> @@ -1176,6 +1190,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>  	unsigned int i, n, c, descs_used, err_idx;
>  	__le16 head_flags, flags;
>  	u16 head, id, prev, curr, avail_used_flags;
> +	u64 buflen = 0;
>  
>  	START_USE(vq);
>  
> @@ -1250,6 +1265,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>  					1 << VRING_PACKED_DESC_F_AVAIL |
>  					1 << VRING_PACKED_DESC_F_USED;
>  			}
> +			if (n >= out_sgs)
> +				buflen += sg->length;
>  		}
>  	}
>  
> @@ -1268,6 +1285,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>  	vq->packed.desc_state[id].data = data;
>  	vq->packed.desc_state[id].indir_desc = ctx;
>  	vq->packed.desc_state[id].last = prev;
> +	vq->packed.desc_state[id].buflen = buflen;
>  
>  	/*
>  	 * A driver MUST NOT make the first descriptor in the list
> @@ -1455,6 +1473,11 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>  		BAD_RING(vq, "id %u is not a head!\n", id);
>  		return NULL;
>  	}
> +	if (unlikely(*len > vq->packed.desc_state[id].buflen)) {
> +		BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> +			*len, vq->packed.desc_state[id].buflen);
> +		return NULL;
> +	}
>  
>  	/* detach_buf_packed clears data, so grab it now. */
>  	ret = vq->packed.desc_state[id].data;
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  6:33   ` Michael S. Tsirkin
@ 2021-09-13  6:36     ` Jason Wang
  2021-09-13  6:41       ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  6:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 2:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 01:53:51PM +0800, Jason Wang wrote:
> > This patch tries to make sure the virtio interrupt handler for INTX
> > won't be called after a reset and before virtio_device_ready(). We
> > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > intx_soft_enabled variable and toggle it during in
> > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > intx_soft_enabled before processing the actual interrupt.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
>
> Not all that excited about all the memory barriers for something
> that should be an extremely rare event (for most kernels -
> literally once per boot). Can't we do better?

I'm not sure, but do we need to care about the slow path (INTX)?

(Or do you have a better approach?)

Thanks

>
> > ---
> >  drivers/virtio/virtio_pci_common.c | 18 ++++++++++++++++--
> >  drivers/virtio/virtio_pci_common.h |  1 +
> >  2 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index 0b9523e6dd39..835197151dc1 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -30,8 +30,12 @@ void vp_disable_vectors(struct virtio_device *vdev)
> >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >       int i;
> >
> > -     if (vp_dev->intx_enabled)
> > +     if (vp_dev->intx_enabled) {
> > +             vp_dev->intx_soft_enabled = false;
> > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > +             smp_wmb();
> >               synchronize_irq(vp_dev->pci_dev->irq);
> > +     }
> >
> >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> >               disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
> >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >       int i;
> >
> > -     if (vp_dev->intx_enabled)
> > +     if (vp_dev->intx_enabled) {
> > +             vp_dev->intx_soft_enabled = true;
> > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > +             smp_wmb();
> >               return;
> > +     }
> >
> >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> >               enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > @@ -97,6 +105,12 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
> >       struct virtio_pci_device *vp_dev = opaque;
> >       u8 isr;
> >
> > +     if (!vp_dev->intx_soft_enabled)
> > +             return IRQ_NONE;
> > +
> > +     /* read intx_soft_enabled before read others */
> > +     smp_rmb();
> > +
> >       /* reading the ISR has the effect of also clearing it so it's very
> >        * important to save off the value. */
> >       isr = ioread8(vp_dev->isr);
> > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > index a235ce9ff6a5..3c06e0f92ee4 100644
> > --- a/drivers/virtio/virtio_pci_common.h
> > +++ b/drivers/virtio/virtio_pci_common.h
> > @@ -64,6 +64,7 @@ struct virtio_pci_device {
> >       /* MSI-X support */
> >       int msix_enabled;
> >       int intx_enabled;
> > +     bool intx_soft_enabled;
> >       cpumask_var_t *msix_affinity_masks;
> >       /* Name strings for interrupts. This size should be enough,
> >        * and I'm too lazy to allocate each name separately. */
> > --
> > 2.25.1
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:34         ` Jason Wang
@ 2021-09-13  6:37           ` Michael S. Tsirkin
  2021-09-13  6:43             ` Jason Wang
  2021-09-13  6:50             ` Michael S. Tsirkin
  0 siblings, 2 replies; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 02:34:01PM +0800, Jason Wang wrote:
> On Mon, Sep 13, 2021 at 2:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> > > On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > > > We used to synchronize pending MSI-X irq handlers via
> > > > > synchronize_irq(), this may not work for the untrusted device which
> > > > > may keep sending interrupts after reset which may lead unexpected
> > > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > > device is ready. So this patch fixes those two issues by:
> > > > >
> > > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > > >    handlers to be called after the device is reset.
> > > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > >
> > > > > This can make sure the virtio interrupt handler won't be called before
> > > > > virtio_device_ready() and after reset.
> > > > >
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > > I don't get the threat model here. Isn't disabling irqs done by the
> > > > hypervisor anyway? Is there a reason to trust disable_irq but not
> > > > device reset?
> > >
> > > My understanding is that e.g in the case of SEV/TDX we don't trust the
> > > hypervisor. So the hypervisor can keep sending interrupts even if the
> > > device is reset. The guest can only trust its own software interrupt
> > > management logic to avoid call virtio callback in this case.
> > >
> > > Thanks
> >
> > Hmm but I don't see how do these patches do this.
> > They call disable_irq but can't the hypervisor keep
> > sending interrupts after disable_irq, too?
> 
> Yes, but since the irq is disabled, the vring or config callback won't
> be called in this case.
> 
> Thanks

But doen't "irq is disabled" basically mean "we told the hypervisor
to disable the irq"?  What extractly prevents hypervisor from
sending the irq even if guest thinks it disabled it?

> >
> >
> >
> > > >
> > > > Cc a bunch more people ...
> > > >
> > > >
> > > > > ---
> > > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > index b35bb2d57f62..0b9523e6dd39 100644
> > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > > >                "Force legacy mode for transitional virtio 1 devices");
> > > > >  #endif
> > > > >
> > > > > -/* wait for pending irq handlers */
> > > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > +/* disable irq handlers */
> > > > > +void vp_disable_vectors(struct virtio_device *vdev)
> > > > >  {
> > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > >       int i;
> > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > >
> > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > +}
> > > > > +
> > > > > +/* enable irq handlers */
> > > > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > > > +{
> > > > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > +     int i;
> > > > > +
> > > > > +     if (vp_dev->intx_enabled)
> > > > > +             return;
> > > > > +
> > > > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > >  }
> > > > >
> > > > >  /* the notify function used when creating a virt queue */
> > > > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > >                "%s-config", name);
> > > > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > > > +                       vp_dev->msix_names[v],
> > > > >                         vp_dev);
> > > > >       if (err)
> > > > >               goto error;
> > > > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > >                        "%s-virtqueues", name);
> > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > > > +                               vp_dev->msix_names[v],
> > > > >                                 vp_dev);
> > > > >               if (err)
> > > > >                       goto error;
> > > > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > > > >                        "%s-%s",
> > > > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > > > -                               vring_interrupt, 0,
> > > > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > > > >                                 vp_dev->msix_names[msix_vec],
> > > > >                                 vqs[i]);
> > > > >               if (err)
> > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > index beec047a8f8d..a235ce9ff6a5 100644
> > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > > > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > > > >  }
> > > > >
> > > > > -/* wait for pending irq handlers */
> > > > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > > > +/* disable irq handlers */
> > > > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > > > +/* enable irq handlers */
> > > > > +void vp_enable_vectors(struct virtio_device *vdev);
> > > > >  /* the notify function used when creating a virt queue */
> > > > >  bool vp_notify(struct virtqueue *vq);
> > > > >  /* the config->del_vqs() implementation */
> > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > index d62e9835aeec..bdf6bc667ab5 100644
> > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > >       /* Flush out the status write, and flush in device writes,
> > > > >        * including MSi-X interrupts, if any. */
> > > > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > -     vp_synchronize_vectors(vdev);
> > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > +     vp_disable_vectors(vdev);
> > > > >  }
> > > > >
> > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > > > >  }
> > > > >
> > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > +     .ready          = vp_enable_vectors,
> > > > >       .get            = vp_get,
> > > > >       .set            = vp_set,
> > > > >       .get_status     = vp_get_status,
> > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > index 30654d3a0b41..acf0f6b6381d 100644
> > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > >        */
> > > > >       while (vp_modern_get_status(mdev))
> > > > >               msleep(1);
> > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > -     vp_synchronize_vectors(vdev);
> > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > +     vp_disable_vectors(vdev);
> > > > >  }
> > > > >
> > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > > > >  }
> > > > >
> > > > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > +     .ready          = vp_enable_vectors,
> > > > >       .get            = NULL,
> > > > >       .set            = NULL,
> > > > >       .generation     = vp_generation,
> > > > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > >  };
> > > > >
> > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > +     .ready          = vp_enable_vectors,
> > > > >       .get            = vp_get,
> > > > >       .set            = vp_set,
> > > > >       .generation     = vp_generation,
> > > > > --
> > > > > 2.25.1
> > > >
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 9/9] virtio_ring: validate used buffer length
  2021-09-13  6:36   ` Michael S. Tsirkin
@ 2021-09-13  6:40     ` Jason Wang
  2021-09-13  6:57       ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  6:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 2:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 01:53:53PM +0800, Jason Wang wrote:
> > This patch validate the used buffer length provided by the device
> > before trying to use it. This is done by record the in buffer length
> > in a new field in desc_state structure during virtqueue_add(), then we
> > can fail the virtqueue_get_buf() when we find the device is trying to
> > give us a used buffer length which is greater than the in buffer
> > length.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
> Hmm this was proposed in the past. The overhead here is
> not negligeable, so I'd like to know more -
> when is it a problem if the used len is too big?

One example is: https://github.com/fuzzsa/fuzzsa-bugs/blob/master/virtio_rng.md

And there would be more I guess.

> Don't the affected drivers already track the length somewhere
> and so can validated it without the extra cost in
> virtio core?

Probably, but this requires the changes in each device driver. And it
would be easily forgotten if new drivers are introduced?

Thanks

>
> > ---
> >  drivers/virtio/virtio_ring.c | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index d2ca0a7365f8..b8374a6144f3 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -69,6 +69,7 @@
> >  struct vring_desc_state_split {
> >       void *data;                     /* Data for callback. */
> >       struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > +     u64 buflen;                     /* In buffer length */
> >  };
> >
> >  struct vring_desc_state_packed {
> > @@ -76,6 +77,7 @@ struct vring_desc_state_packed {
> >       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> >       u16 num;                        /* Descriptor list length. */
> >       u16 last;                       /* The last desc state in a list. */
> > +     u64 buflen;                     /* In buffer length */
> >  };
> >
> >  struct vring_desc_extra {
> > @@ -490,6 +492,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >       unsigned int i, n, avail, descs_used, prev, err_idx;
> >       int head;
> >       bool indirect;
> > +     u64 buflen = 0;
> >
> >       START_USE(vq);
> >
> > @@ -571,6 +574,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                                                    VRING_DESC_F_NEXT |
> >                                                    VRING_DESC_F_WRITE,
> >                                                    indirect);
> > +                     buflen += sg->length;
> >               }
> >       }
> >       /* Last one doesn't continue. */
> > @@ -605,6 +609,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >
> >       /* Store token and indirect buffer state. */
> >       vq->split.desc_state[head].data = data;
> > +     vq->split.desc_state[head].buflen = buflen;
> >       if (indirect)
> >               vq->split.desc_state[head].indir_desc = desc;
> >       else
> > @@ -784,6 +789,11 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> >               BAD_RING(vq, "id %u is not a head!\n", i);
> >               return NULL;
> >       }
> > +     if (unlikely(*len > vq->split.desc_state[i].buflen)) {
> > +             BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> > +                     *len, vq->split.desc_state[i].buflen);
> > +             return NULL;
> > +     }
> >
> >       /* detach_buf_split clears data, so grab it now. */
> >       ret = vq->split.desc_state[i].data;
> > @@ -1062,6 +1072,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >       unsigned int i, n, err_idx;
> >       u16 head, id;
> >       dma_addr_t addr;
> > +     u64 buflen = 0;
> >
> >       head = vq->packed.next_avail_idx;
> >       desc = alloc_indirect_packed(total_sg, gfp);
> > @@ -1089,6 +1100,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >                       desc[i].addr = cpu_to_le64(addr);
> >                       desc[i].len = cpu_to_le32(sg->length);
> >                       i++;
> > +                     if (n >= out_sgs)
> > +                             buflen += sg->length;
> >               }
> >       }
> >
> > @@ -1141,6 +1154,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >       vq->packed.desc_state[id].data = data;
> >       vq->packed.desc_state[id].indir_desc = desc;
> >       vq->packed.desc_state[id].last = id;
> > +     vq->packed.desc_state[id].buflen = buflen;
> >
> >       vq->num_added += 1;
> >
> > @@ -1176,6 +1190,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >       unsigned int i, n, c, descs_used, err_idx;
> >       __le16 head_flags, flags;
> >       u16 head, id, prev, curr, avail_used_flags;
> > +     u64 buflen = 0;
> >
> >       START_USE(vq);
> >
> > @@ -1250,6 +1265,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >                                       1 << VRING_PACKED_DESC_F_AVAIL |
> >                                       1 << VRING_PACKED_DESC_F_USED;
> >                       }
> > +                     if (n >= out_sgs)
> > +                             buflen += sg->length;
> >               }
> >       }
> >
> > @@ -1268,6 +1285,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >       vq->packed.desc_state[id].data = data;
> >       vq->packed.desc_state[id].indir_desc = ctx;
> >       vq->packed.desc_state[id].last = prev;
> > +     vq->packed.desc_state[id].buflen = buflen;
> >
> >       /*
> >        * A driver MUST NOT make the first descriptor in the list
> > @@ -1455,6 +1473,11 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> >               BAD_RING(vq, "id %u is not a head!\n", id);
> >               return NULL;
> >       }
> > +     if (unlikely(*len > vq->packed.desc_state[id].buflen)) {
> > +             BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> > +                     *len, vq->packed.desc_state[id].buflen);
> > +             return NULL;
> > +     }
> >
> >       /* detach_buf_packed clears data, so grab it now. */
> >       ret = vq->packed.desc_state[id].data;
> > --
> > 2.25.1
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  6:36     ` Jason Wang
@ 2021-09-13  6:41       ` Michael S. Tsirkin
  2021-09-13  6:45         ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 02:36:54PM +0800, Jason Wang wrote:
> On Mon, Sep 13, 2021 at 2:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 01:53:51PM +0800, Jason Wang wrote:
> > > This patch tries to make sure the virtio interrupt handler for INTX
> > > won't be called after a reset and before virtio_device_ready(). We
> > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > intx_soft_enabled variable and toggle it during in
> > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > intx_soft_enabled before processing the actual interrupt.
> > >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >
> >
> > Not all that excited about all the memory barriers for something
> > that should be an extremely rare event (for most kernels -
> > literally once per boot). Can't we do better?
> 
> I'm not sure, but do we need to care about the slow path (INTX)?

Otherwise we won't try to support this, right?

> (Or do you have a better approach?)
> 
> Thanks

Don't know really, maybe rcu or whatever?
But let's try to be much more specific - is there anything
specific we are trying to protect against here?



> >
> > > ---
> > >  drivers/virtio/virtio_pci_common.c | 18 ++++++++++++++++--
> > >  drivers/virtio/virtio_pci_common.h |  1 +
> > >  2 files changed, 17 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > index 0b9523e6dd39..835197151dc1 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -30,8 +30,12 @@ void vp_disable_vectors(struct virtio_device *vdev)
> > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >       int i;
> > >
> > > -     if (vp_dev->intx_enabled)
> > > +     if (vp_dev->intx_enabled) {
> > > +             vp_dev->intx_soft_enabled = false;
> > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > +             smp_wmb();
> > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > +     }
> > >
> > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > >               disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
> > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >       int i;
> > >
> > > -     if (vp_dev->intx_enabled)
> > > +     if (vp_dev->intx_enabled) {
> > > +             vp_dev->intx_soft_enabled = true;
> > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > +             smp_wmb();
> > >               return;
> > > +     }
> > >
> > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > >               enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > @@ -97,6 +105,12 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
> > >       struct virtio_pci_device *vp_dev = opaque;
> > >       u8 isr;
> > >
> > > +     if (!vp_dev->intx_soft_enabled)
> > > +             return IRQ_NONE;
> > > +
> > > +     /* read intx_soft_enabled before read others */
> > > +     smp_rmb();
> > > +
> > >       /* reading the ISR has the effect of also clearing it so it's very
> > >        * important to save off the value. */
> > >       isr = ioread8(vp_dev->isr);
> > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > index a235ce9ff6a5..3c06e0f92ee4 100644
> > > --- a/drivers/virtio/virtio_pci_common.h
> > > +++ b/drivers/virtio/virtio_pci_common.h
> > > @@ -64,6 +64,7 @@ struct virtio_pci_device {
> > >       /* MSI-X support */
> > >       int msix_enabled;
> > >       int intx_enabled;
> > > +     bool intx_soft_enabled;
> > >       cpumask_var_t *msix_affinity_masks;
> > >       /* Name strings for interrupts. This size should be enough,
> > >        * and I'm too lazy to allocate each name separately. */
> > > --
> > > 2.25.1
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:37           ` Michael S. Tsirkin
@ 2021-09-13  6:43             ` Jason Wang
  2021-09-13  7:01               ` Michael S. Tsirkin
  2021-09-13  6:50             ` Michael S. Tsirkin
  1 sibling, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  6:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 2:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 02:34:01PM +0800, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> > > > On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > > > > We used to synchronize pending MSI-X irq handlers via
> > > > > > synchronize_irq(), this may not work for the untrusted device which
> > > > > > may keep sending interrupts after reset which may lead unexpected
> > > > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > > > device is ready. So this patch fixes those two issues by:
> > > > > >
> > > > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > > > >    handlers to be called after the device is reset.
> > > > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > > >
> > > > > > This can make sure the virtio interrupt handler won't be called before
> > > > > > virtio_device_ready() and after reset.
> > > > > >
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > >
> > > > > I don't get the threat model here. Isn't disabling irqs done by the
> > > > > hypervisor anyway? Is there a reason to trust disable_irq but not
> > > > > device reset?
> > > >
> > > > My understanding is that e.g in the case of SEV/TDX we don't trust the
> > > > hypervisor. So the hypervisor can keep sending interrupts even if the
> > > > device is reset. The guest can only trust its own software interrupt
> > > > management logic to avoid call virtio callback in this case.
> > > >
> > > > Thanks
> > >
> > > Hmm but I don't see how do these patches do this.
> > > They call disable_irq but can't the hypervisor keep
> > > sending interrupts after disable_irq, too?
> >
> > Yes, but since the irq is disabled, the vring or config callback won't
> > be called in this case.
> >
> > Thanks
>
> But doen't "irq is disabled" basically mean "we told the hypervisor
> to disable the irq"?  What extractly prevents hypervisor from
> sending the irq even if guest thinks it disabled it?

It can't prevent the hypersior from sending irq. But it can make sure
the irq descriptor is disabled (e.g IRQD_IRQ_DISABLED). Is this
sufficient?

Thanks

>
> > >
> > >
> > >
> > > > >
> > > > > Cc a bunch more people ...
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > index b35bb2d57f62..0b9523e6dd39 100644
> > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > > > >                "Force legacy mode for transitional virtio 1 devices");
> > > > > >  #endif
> > > > > >
> > > > > > -/* wait for pending irq handlers */
> > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > +/* disable irq handlers */
> > > > > > +void vp_disable_vectors(struct virtio_device *vdev)
> > > > > >  {
> > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > >       int i;
> > > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > > >
> > > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > +}
> > > > > > +
> > > > > > +/* enable irq handlers */
> > > > > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > > > > +{
> > > > > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > +     int i;
> > > > > > +
> > > > > > +     if (vp_dev->intx_enabled)
> > > > > > +             return;
> > > > > > +
> > > > > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > >  }
> > > > > >
> > > > > >  /* the notify function used when creating a virt queue */
> > > > > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > >                "%s-config", name);
> > > > > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > > > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > > > > +                       vp_dev->msix_names[v],
> > > > > >                         vp_dev);
> > > > > >       if (err)
> > > > > >               goto error;
> > > > > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > >                        "%s-virtqueues", name);
> > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > > > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > +                               vp_dev->msix_names[v],
> > > > > >                                 vp_dev);
> > > > > >               if (err)
> > > > > >                       goto error;
> > > > > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > > > > >                        "%s-%s",
> > > > > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > > > > -                               vring_interrupt, 0,
> > > > > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > > > > >                                 vp_dev->msix_names[msix_vec],
> > > > > >                                 vqs[i]);
> > > > > >               if (err)
> > > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > > index beec047a8f8d..a235ce9ff6a5 100644
> > > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > > > > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > > > > >  }
> > > > > >
> > > > > > -/* wait for pending irq handlers */
> > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > > > > +/* disable irq handlers */
> > > > > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > > > > +/* enable irq handlers */
> > > > > > +void vp_enable_vectors(struct virtio_device *vdev);
> > > > > >  /* the notify function used when creating a virt queue */
> > > > > >  bool vp_notify(struct virtqueue *vq);
> > > > > >  /* the config->del_vqs() implementation */
> > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > > index d62e9835aeec..bdf6bc667ab5 100644
> > > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > >       /* Flush out the status write, and flush in device writes,
> > > > > >        * including MSi-X interrupts, if any. */
> > > > > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > +     vp_disable_vectors(vdev);
> > > > > >  }
> > > > > >
> > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > > > > >  }
> > > > > >
> > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > +     .ready          = vp_enable_vectors,
> > > > > >       .get            = vp_get,
> > > > > >       .set            = vp_set,
> > > > > >       .get_status     = vp_get_status,
> > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > index 30654d3a0b41..acf0f6b6381d 100644
> > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > >        */
> > > > > >       while (vp_modern_get_status(mdev))
> > > > > >               msleep(1);
> > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > +     vp_disable_vectors(vdev);
> > > > > >  }
> > > > > >
> > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > > > > >  }
> > > > > >
> > > > > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > +     .ready          = vp_enable_vectors,
> > > > > >       .get            = NULL,
> > > > > >       .set            = NULL,
> > > > > >       .generation     = vp_generation,
> > > > > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > >  };
> > > > > >
> > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > +     .ready          = vp_enable_vectors,
> > > > > >       .get            = vp_get,
> > > > > >       .set            = vp_set,
> > > > > >       .generation     = vp_generation,
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  6:41       ` Michael S. Tsirkin
@ 2021-09-13  6:45         ` Jason Wang
  2021-09-13  7:02           ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  6:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 2:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 02:36:54PM +0800, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 01:53:51PM +0800, Jason Wang wrote:
> > > > This patch tries to make sure the virtio interrupt handler for INTX
> > > > won't be called after a reset and before virtio_device_ready(). We
> > > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > > intx_soft_enabled variable and toggle it during in
> > > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > > intx_soft_enabled before processing the actual interrupt.
> > > >
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > >
> > > Not all that excited about all the memory barriers for something
> > > that should be an extremely rare event (for most kernels -
> > > literally once per boot). Can't we do better?
> >
> > I'm not sure, but do we need to care about the slow path (INTX)?
>
> Otherwise we won't try to support this, right?

Sorry, what I meant is "do we need to care about the performance of
the slow path".

>
> > (Or do you have a better approach?)
> >
> > Thanks
>
> Don't know really, maybe rcu or whatever?

I am sure it's worth it to bother since it's the slow path.

> But let's try to be much more specific - is there anything
> specific we are trying to protect against here?

The unexpected calling of the vring or config interrupt handler. (The
same as MSI-X, e.g the untrusted device can send irq at any time).

Thanks

>
>
>
> > >
> > > > ---
> > > >  drivers/virtio/virtio_pci_common.c | 18 ++++++++++++++++--
> > > >  drivers/virtio/virtio_pci_common.h |  1 +
> > > >  2 files changed, 17 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index 0b9523e6dd39..835197151dc1 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -30,8 +30,12 @@ void vp_disable_vectors(struct virtio_device *vdev)
> > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >       int i;
> > > >
> > > > -     if (vp_dev->intx_enabled)
> > > > +     if (vp_dev->intx_enabled) {
> > > > +             vp_dev->intx_soft_enabled = false;
> > > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > > +             smp_wmb();
> > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > +     }
> > > >
> > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > >               disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
> > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >       int i;
> > > >
> > > > -     if (vp_dev->intx_enabled)
> > > > +     if (vp_dev->intx_enabled) {
> > > > +             vp_dev->intx_soft_enabled = true;
> > > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > > +             smp_wmb();
> > > >               return;
> > > > +     }
> > > >
> > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > >               enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > @@ -97,6 +105,12 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
> > > >       struct virtio_pci_device *vp_dev = opaque;
> > > >       u8 isr;
> > > >
> > > > +     if (!vp_dev->intx_soft_enabled)
> > > > +             return IRQ_NONE;
> > > > +
> > > > +     /* read intx_soft_enabled before read others */
> > > > +     smp_rmb();
> > > > +
> > > >       /* reading the ISR has the effect of also clearing it so it's very
> > > >        * important to save off the value. */
> > > >       isr = ioread8(vp_dev->isr);
> > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > index a235ce9ff6a5..3c06e0f92ee4 100644
> > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > @@ -64,6 +64,7 @@ struct virtio_pci_device {
> > > >       /* MSI-X support */
> > > >       int msix_enabled;
> > > >       int intx_enabled;
> > > > +     bool intx_soft_enabled;
> > > >       cpumask_var_t *msix_affinity_masks;
> > > >       /* Name strings for interrupts. This size should be enough,
> > > >        * and I'm too lazy to allocate each name separately. */
> > > > --
> > > > 2.25.1
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:37           ` Michael S. Tsirkin
  2021-09-13  6:43             ` Jason Wang
@ 2021-09-13  6:50             ` Michael S. Tsirkin
  2021-09-13  7:07               ` Jason Wang
  1 sibling, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 02:37:42AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 13, 2021 at 02:34:01PM +0800, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> > > > On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > > > > We used to synchronize pending MSI-X irq handlers via
> > > > > > synchronize_irq(), this may not work for the untrusted device which
> > > > > > may keep sending interrupts after reset which may lead unexpected
> > > > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > > > device is ready. So this patch fixes those two issues by:
> > > > > >
> > > > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > > > >    handlers to be called after the device is reset.
> > > > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > > >
> > > > > > This can make sure the virtio interrupt handler won't be called before
> > > > > > virtio_device_ready() and after reset.
> > > > > >
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > >
> > > > > I don't get the threat model here. Isn't disabling irqs done by the
> > > > > hypervisor anyway? Is there a reason to trust disable_irq but not
> > > > > device reset?
> > > >
> > > > My understanding is that e.g in the case of SEV/TDX we don't trust the
> > > > hypervisor. So the hypervisor can keep sending interrupts even if the
> > > > device is reset. The guest can only trust its own software interrupt
> > > > management logic to avoid call virtio callback in this case.
> > > >
> > > > Thanks
> > >
> > > Hmm but I don't see how do these patches do this.
> > > They call disable_irq but can't the hypervisor keep
> > > sending interrupts after disable_irq, too?
> > 
> > Yes, but since the irq is disabled, the vring or config callback won't
> > be called in this case.
> > 
> > Thanks
> 
> But doen't "irq is disabled" basically mean "we told the hypervisor
> to disable the irq"?  What extractly prevents hypervisor from
> sending the irq even if guest thinks it disabled it?

More generally, can't we for example blow away the
indir_desc array that we use to keep the ctx pointers?
Won't that be enough?


And looking at all this closely, I suspect it is wise to just completely
disable hotplug/unplug for encrypted guests. Or maybe it's already
the case?


> > >
> > >
> > >
> > > > >
> > > > > Cc a bunch more people ...
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > index b35bb2d57f62..0b9523e6dd39 100644
> > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > > > >                "Force legacy mode for transitional virtio 1 devices");
> > > > > >  #endif
> > > > > >
> > > > > > -/* wait for pending irq handlers */
> > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > +/* disable irq handlers */
> > > > > > +void vp_disable_vectors(struct virtio_device *vdev)
> > > > > >  {
> > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > >       int i;
> > > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > > >
> > > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > +}
> > > > > > +
> > > > > > +/* enable irq handlers */
> > > > > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > > > > +{
> > > > > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > +     int i;
> > > > > > +
> > > > > > +     if (vp_dev->intx_enabled)
> > > > > > +             return;
> > > > > > +
> > > > > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > >  }
> > > > > >
> > > > > >  /* the notify function used when creating a virt queue */
> > > > > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > >                "%s-config", name);
> > > > > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > > > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > > > > +                       vp_dev->msix_names[v],
> > > > > >                         vp_dev);
> > > > > >       if (err)
> > > > > >               goto error;
> > > > > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > >                        "%s-virtqueues", name);
> > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > > > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > +                               vp_dev->msix_names[v],
> > > > > >                                 vp_dev);
> > > > > >               if (err)
> > > > > >                       goto error;
> > > > > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > > > > >                        "%s-%s",
> > > > > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > > > > -                               vring_interrupt, 0,
> > > > > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > > > > >                                 vp_dev->msix_names[msix_vec],
> > > > > >                                 vqs[i]);
> > > > > >               if (err)
> > > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > > index beec047a8f8d..a235ce9ff6a5 100644
> > > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > > > > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > > > > >  }
> > > > > >
> > > > > > -/* wait for pending irq handlers */
> > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > > > > +/* disable irq handlers */
> > > > > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > > > > +/* enable irq handlers */
> > > > > > +void vp_enable_vectors(struct virtio_device *vdev);
> > > > > >  /* the notify function used when creating a virt queue */
> > > > > >  bool vp_notify(struct virtqueue *vq);
> > > > > >  /* the config->del_vqs() implementation */
> > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > > index d62e9835aeec..bdf6bc667ab5 100644
> > > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > >       /* Flush out the status write, and flush in device writes,
> > > > > >        * including MSi-X interrupts, if any. */
> > > > > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > +     vp_disable_vectors(vdev);
> > > > > >  }
> > > > > >
> > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > > > > >  }
> > > > > >
> > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > +     .ready          = vp_enable_vectors,
> > > > > >       .get            = vp_get,
> > > > > >       .set            = vp_set,
> > > > > >       .get_status     = vp_get_status,
> > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > index 30654d3a0b41..acf0f6b6381d 100644
> > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > >        */
> > > > > >       while (vp_modern_get_status(mdev))
> > > > > >               msleep(1);
> > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > +     vp_disable_vectors(vdev);
> > > > > >  }
> > > > > >
> > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > > > > >  }
> > > > > >
> > > > > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > +     .ready          = vp_enable_vectors,
> > > > > >       .get            = NULL,
> > > > > >       .set            = NULL,
> > > > > >       .generation     = vp_generation,
> > > > > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > >  };
> > > > > >
> > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > +     .ready          = vp_enable_vectors,
> > > > > >       .get            = vp_get,
> > > > > >       .set            = vp_set,
> > > > > >       .generation     = vp_generation,
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 9/9] virtio_ring: validate used buffer length
  2021-09-13  6:40     ` Jason Wang
@ 2021-09-13  6:57       ` Michael S. Tsirkin
  2021-09-13  7:13         ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  6:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 02:40:09PM +0800, Jason Wang wrote:
> On Mon, Sep 13, 2021 at 2:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 01:53:53PM +0800, Jason Wang wrote:
> > > This patch validate the used buffer length provided by the device
> > > before trying to use it. This is done by record the in buffer length
> > > in a new field in desc_state structure during virtqueue_add(), then we
> > > can fail the virtqueue_get_buf() when we find the device is trying to
> > > give us a used buffer length which is greater than the in buffer
> > > length.
> > >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >
> > Hmm this was proposed in the past. The overhead here is
> > not negligeable, so I'd like to know more -
> > when is it a problem if the used len is too big?
> 
> One example is: https://github.com/fuzzsa/fuzzsa-bugs/blob/master/virtio_rng.md
> 
> And there would be more I guess.

That seems to suggest hwrng validation is better, and
I think it makes sense: will fix all rng drivers in one go.

> > Don't the affected drivers already track the length somewhere
> > and so can validated it without the extra cost in
> > virtio core?
> 
> Probably, but this requires the changes in each device driver. And it
> would be easily forgotten if new drivers are introduced?
> 
> Thanks

My thinking is one just has to be aware that before enabling
any drivers they have to be audited. We can validate
used length but e.g. for virtio net the length is inside
the buffer anyway.

If we really have to, maybe use extra->len?
And maybe have a mod param so the check can be turned off e.g.
for benchmarking purposes.



> >
> > > ---
> > >  drivers/virtio/virtio_ring.c | 23 +++++++++++++++++++++++
> > >  1 file changed, 23 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index d2ca0a7365f8..b8374a6144f3 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -69,6 +69,7 @@
> > >  struct vring_desc_state_split {
> > >       void *data;                     /* Data for callback. */
> > >       struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > +     u64 buflen;                     /* In buffer length */
> > >  };
> > >
> > >  struct vring_desc_state_packed {
> > > @@ -76,6 +77,7 @@ struct vring_desc_state_packed {
> > >       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> > >       u16 num;                        /* Descriptor list length. */
> > >       u16 last;                       /* The last desc state in a list. */
> > > +     u64 buflen;                     /* In buffer length */
> > >  };
> > >
> > >  struct vring_desc_extra {
> > > @@ -490,6 +492,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >       unsigned int i, n, avail, descs_used, prev, err_idx;
> > >       int head;
> > >       bool indirect;
> > > +     u64 buflen = 0;
> > >
> > >       START_USE(vq);
> > >
> > > @@ -571,6 +574,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >                                                    VRING_DESC_F_NEXT |
> > >                                                    VRING_DESC_F_WRITE,
> > >                                                    indirect);
> > > +                     buflen += sg->length;
> > >               }
> > >       }
> > >       /* Last one doesn't continue. */
> > > @@ -605,6 +609,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >
> > >       /* Store token and indirect buffer state. */
> > >       vq->split.desc_state[head].data = data;
> > > +     vq->split.desc_state[head].buflen = buflen;
> > >       if (indirect)
> > >               vq->split.desc_state[head].indir_desc = desc;
> > >       else
> > > @@ -784,6 +789,11 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > >               BAD_RING(vq, "id %u is not a head!\n", i);
> > >               return NULL;
> > >       }
> > > +     if (unlikely(*len > vq->split.desc_state[i].buflen)) {
> > > +             BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> > > +                     *len, vq->split.desc_state[i].buflen);
> > > +             return NULL;
> > > +     }
> > >
> > >       /* detach_buf_split clears data, so grab it now. */
> > >       ret = vq->split.desc_state[i].data;
> > > @@ -1062,6 +1072,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >       unsigned int i, n, err_idx;
> > >       u16 head, id;
> > >       dma_addr_t addr;
> > > +     u64 buflen = 0;
> > >
> > >       head = vq->packed.next_avail_idx;
> > >       desc = alloc_indirect_packed(total_sg, gfp);
> > > @@ -1089,6 +1100,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >                       desc[i].addr = cpu_to_le64(addr);
> > >                       desc[i].len = cpu_to_le32(sg->length);
> > >                       i++;
> > > +                     if (n >= out_sgs)
> > > +                             buflen += sg->length;
> > >               }
> > >       }
> > >
> > > @@ -1141,6 +1154,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >       vq->packed.desc_state[id].data = data;
> > >       vq->packed.desc_state[id].indir_desc = desc;
> > >       vq->packed.desc_state[id].last = id;
> > > +     vq->packed.desc_state[id].buflen = buflen;
> > >
> > >       vq->num_added += 1;
> > >
> > > @@ -1176,6 +1190,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >       unsigned int i, n, c, descs_used, err_idx;
> > >       __le16 head_flags, flags;
> > >       u16 head, id, prev, curr, avail_used_flags;
> > > +     u64 buflen = 0;
> > >
> > >       START_USE(vq);
> > >
> > > @@ -1250,6 +1265,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >                                       1 << VRING_PACKED_DESC_F_AVAIL |
> > >                                       1 << VRING_PACKED_DESC_F_USED;
> > >                       }
> > > +                     if (n >= out_sgs)
> > > +                             buflen += sg->length;
> > >               }
> > >       }
> > >
> > > @@ -1268,6 +1285,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >       vq->packed.desc_state[id].data = data;
> > >       vq->packed.desc_state[id].indir_desc = ctx;
> > >       vq->packed.desc_state[id].last = prev;
> > > +     vq->packed.desc_state[id].buflen = buflen;
> > >
> > >       /*
> > >        * A driver MUST NOT make the first descriptor in the list
> > > @@ -1455,6 +1473,11 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > >               BAD_RING(vq, "id %u is not a head!\n", id);
> > >               return NULL;
> > >       }
> > > +     if (unlikely(*len > vq->packed.desc_state[id].buflen)) {
> > > +             BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> > > +                     *len, vq->packed.desc_state[id].buflen);
> > > +             return NULL;
> > > +     }
> > >
> > >       /* detach_buf_packed clears data, so grab it now. */
> > >       ret = vq->packed.desc_state[id].data;
> > > --
> > > 2.25.1
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:43             ` Jason Wang
@ 2021-09-13  7:01               ` Michael S. Tsirkin
  2021-09-13  7:15                 ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  7:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 02:43:08PM +0800, Jason Wang wrote:
> On Mon, Sep 13, 2021 at 2:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 02:34:01PM +0800, Jason Wang wrote:
> > > On Mon, Sep 13, 2021 at 2:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> > > > > On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > > > > > We used to synchronize pending MSI-X irq handlers via
> > > > > > > synchronize_irq(), this may not work for the untrusted device which
> > > > > > > may keep sending interrupts after reset which may lead unexpected
> > > > > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > > > > device is ready. So this patch fixes those two issues by:
> > > > > > >
> > > > > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > > > > >    handlers to be called after the device is reset.
> > > > > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > > > >
> > > > > > > This can make sure the virtio interrupt handler won't be called before
> > > > > > > virtio_device_ready() and after reset.
> > > > > > >
> > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > >
> > > > > > I don't get the threat model here. Isn't disabling irqs done by the
> > > > > > hypervisor anyway? Is there a reason to trust disable_irq but not
> > > > > > device reset?
> > > > >
> > > > > My understanding is that e.g in the case of SEV/TDX we don't trust the
> > > > > hypervisor. So the hypervisor can keep sending interrupts even if the
> > > > > device is reset. The guest can only trust its own software interrupt
> > > > > management logic to avoid call virtio callback in this case.
> > > > >
> > > > > Thanks
> > > >
> > > > Hmm but I don't see how do these patches do this.
> > > > They call disable_irq but can't the hypervisor keep
> > > > sending interrupts after disable_irq, too?
> > >
> > > Yes, but since the irq is disabled, the vring or config callback won't
> > > be called in this case.
> > >
> > > Thanks
> >
> > But doen't "irq is disabled" basically mean "we told the hypervisor
> > to disable the irq"?  What extractly prevents hypervisor from
> > sending the irq even if guest thinks it disabled it?
> 
> It can't prevent the hypersior from sending irq. But it can make sure
> the irq descriptor is disabled (e.g IRQD_IRQ_DISABLED). Is this
> sufficient?
> 
> Thanks

Maybe, maybe not ... there's not a lot in the way of
memory barriers around code using that bit, that's for sure.
Did anyone look at it from point of view of what
can a bad interrupt do?


> >
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Cc a bunch more people ...
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > > > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > > > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > > index b35bb2d57f62..0b9523e6dd39 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > > > > >                "Force legacy mode for transitional virtio 1 devices");
> > > > > > >  #endif
> > > > > > >
> > > > > > > -/* wait for pending irq handlers */
> > > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > > +/* disable irq handlers */
> > > > > > > +void vp_disable_vectors(struct virtio_device *vdev)
> > > > > > >  {
> > > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > >       int i;
> > > > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > > > >
> > > > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* enable irq handlers */
> > > > > > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > > > > > +{
> > > > > > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > +     int i;
> > > > > > > +
> > > > > > > +     if (vp_dev->intx_enabled)
> > > > > > > +             return;
> > > > > > > +
> > > > > > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > >  }
> > > > > > >
> > > > > > >  /* the notify function used when creating a virt queue */
> > > > > > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > > >                "%s-config", name);
> > > > > > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > > > > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > > > > > +                       vp_dev->msix_names[v],
> > > > > > >                         vp_dev);
> > > > > > >       if (err)
> > > > > > >               goto error;
> > > > > > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > > >                        "%s-virtqueues", name);
> > > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > > > > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > > +                               vp_dev->msix_names[v],
> > > > > > >                                 vp_dev);
> > > > > > >               if (err)
> > > > > > >                       goto error;
> > > > > > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > > > > > >                        "%s-%s",
> > > > > > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > > > > > -                               vring_interrupt, 0,
> > > > > > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > >                                 vp_dev->msix_names[msix_vec],
> > > > > > >                                 vqs[i]);
> > > > > > >               if (err)
> > > > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > > > index beec047a8f8d..a235ce9ff6a5 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > > > > > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > > > > > >  }
> > > > > > >
> > > > > > > -/* wait for pending irq handlers */
> > > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > > > > > +/* disable irq handlers */
> > > > > > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > > > > > +/* enable irq handlers */
> > > > > > > +void vp_enable_vectors(struct virtio_device *vdev);
> > > > > > >  /* the notify function used when creating a virt queue */
> > > > > > >  bool vp_notify(struct virtqueue *vq);
> > > > > > >  /* the config->del_vqs() implementation */
> > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > index d62e9835aeec..bdf6bc667ab5 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > > >       /* Flush out the status write, and flush in device writes,
> > > > > > >        * including MSi-X interrupts, if any. */
> > > > > > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > > +     vp_disable_vectors(vdev);
> > > > > > >  }
> > > > > > >
> > > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > > > > > >  }
> > > > > > >
> > > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > >       .get            = vp_get,
> > > > > > >       .set            = vp_set,
> > > > > > >       .get_status     = vp_get_status,
> > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > > index 30654d3a0b41..acf0f6b6381d 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > > >        */
> > > > > > >       while (vp_modern_get_status(mdev))
> > > > > > >               msleep(1);
> > > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > > +     vp_disable_vectors(vdev);
> > > > > > >  }
> > > > > > >
> > > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > > > > > >  }
> > > > > > >
> > > > > > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > >       .get            = NULL,
> > > > > > >       .set            = NULL,
> > > > > > >       .generation     = vp_generation,
> > > > > > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > >  };
> > > > > > >
> > > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > >       .get            = vp_get,
> > > > > > >       .set            = vp_set,
> > > > > > >       .generation     = vp_generation,
> > > > > > > --
> > > > > > > 2.25.1
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  6:45         ` Jason Wang
@ 2021-09-13  7:02           ` Michael S. Tsirkin
  2021-09-13  7:17             ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13  7:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 02:45:38PM +0800, Jason Wang wrote:
> On Mon, Sep 13, 2021 at 2:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 02:36:54PM +0800, Jason Wang wrote:
> > > On Mon, Sep 13, 2021 at 2:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 13, 2021 at 01:53:51PM +0800, Jason Wang wrote:
> > > > > This patch tries to make sure the virtio interrupt handler for INTX
> > > > > won't be called after a reset and before virtio_device_ready(). We
> > > > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > > > intx_soft_enabled variable and toggle it during in
> > > > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > > > intx_soft_enabled before processing the actual interrupt.
> > > > >
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > >
> > > > Not all that excited about all the memory barriers for something
> > > > that should be an extremely rare event (for most kernels -
> > > > literally once per boot). Can't we do better?
> > >
> > > I'm not sure, but do we need to care about the slow path (INTX)?
> >
> > Otherwise we won't try to support this, right?
> 
> Sorry, what I meant is "do we need to care about the performance of
> the slow path".
> 
> >
> > > (Or do you have a better approach?)
> > >
> > > Thanks
> >
> > Don't know really, maybe rcu or whatever?
> 
> I am sure it's worth it to bother since it's the slow path.
> 
> > But let's try to be much more specific - is there anything
> > specific we are trying to protect against here?
> 
> The unexpected calling of the vring or config interrupt handler. (The
> same as MSI-X, e.g the untrusted device can send irq at any time).
> 
> Thanks

And so, does this do more than crash the guest?  Hypervisors
already can do that ...


> >
> >
> >
> > > >
> > > > > ---
> > > > >  drivers/virtio/virtio_pci_common.c | 18 ++++++++++++++++--
> > > > >  drivers/virtio/virtio_pci_common.h |  1 +
> > > > >  2 files changed, 17 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > index 0b9523e6dd39..835197151dc1 100644
> > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > @@ -30,8 +30,12 @@ void vp_disable_vectors(struct virtio_device *vdev)
> > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > >       int i;
> > > > >
> > > > > -     if (vp_dev->intx_enabled)
> > > > > +     if (vp_dev->intx_enabled) {
> > > > > +             vp_dev->intx_soft_enabled = false;
> > > > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > > > +             smp_wmb();
> > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > > +     }
> > > > >
> > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > >               disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
> > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > >       int i;
> > > > >
> > > > > -     if (vp_dev->intx_enabled)
> > > > > +     if (vp_dev->intx_enabled) {
> > > > > +             vp_dev->intx_soft_enabled = true;
> > > > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > > > +             smp_wmb();
> > > > >               return;
> > > > > +     }
> > > > >
> > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > >               enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > @@ -97,6 +105,12 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
> > > > >       struct virtio_pci_device *vp_dev = opaque;
> > > > >       u8 isr;
> > > > >
> > > > > +     if (!vp_dev->intx_soft_enabled)
> > > > > +             return IRQ_NONE;
> > > > > +
> > > > > +     /* read intx_soft_enabled before read others */
> > > > > +     smp_rmb();
> > > > > +
> > > > >       /* reading the ISR has the effect of also clearing it so it's very
> > > > >        * important to save off the value. */
> > > > >       isr = ioread8(vp_dev->isr);
> > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > index a235ce9ff6a5..3c06e0f92ee4 100644
> > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > @@ -64,6 +64,7 @@ struct virtio_pci_device {
> > > > >       /* MSI-X support */
> > > > >       int msix_enabled;
> > > > >       int intx_enabled;
> > > > > +     bool intx_soft_enabled;
> > > > >       cpumask_var_t *msix_affinity_masks;
> > > > >       /* Name strings for interrupts. This size should be enough,
> > > > >        * and I'm too lazy to allocate each name separately. */
> > > > > --
> > > > > 2.25.1
> > > >
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  6:50             ` Michael S. Tsirkin
@ 2021-09-13  7:07               ` Jason Wang
  2021-09-13 19:38                 ` Thomas Gleixner
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-09-13  7:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 2:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 02:37:42AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 13, 2021 at 02:34:01PM +0800, Jason Wang wrote:
> > > On Mon, Sep 13, 2021 at 2:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> > > > > On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > > > > > We used to synchronize pending MSI-X irq handlers via
> > > > > > > synchronize_irq(), this may not work for the untrusted device which
> > > > > > > may keep sending interrupts after reset which may lead unexpected
> > > > > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > > > > device is ready. So this patch fixes those two issues by:
> > > > > > >
> > > > > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > > > > >    handlers to be called after the device is reset.
> > > > > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > > > >
> > > > > > > This can make sure the virtio interrupt handler won't be called before
> > > > > > > virtio_device_ready() and after reset.
> > > > > > >
> > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > >
> > > > > > I don't get the threat model here. Isn't disabling irqs done by the
> > > > > > hypervisor anyway? Is there a reason to trust disable_irq but not
> > > > > > device reset?
> > > > >
> > > > > My understanding is that e.g in the case of SEV/TDX we don't trust the
> > > > > hypervisor. So the hypervisor can keep sending interrupts even if the
> > > > > device is reset. The guest can only trust its own software interrupt
> > > > > management logic to avoid call virtio callback in this case.
> > > > >
> > > > > Thanks
> > > >
> > > > Hmm but I don't see how do these patches do this.
> > > > They call disable_irq but can't the hypervisor keep
> > > > sending interrupts after disable_irq, too?
> > >
> > > Yes, but since the irq is disabled, the vring or config callback won't
> > > be called in this case.
> > >
> > > Thanks
> >
> > But doen't "irq is disabled" basically mean "we told the hypervisor
> > to disable the irq"?  What extractly prevents hypervisor from
> > sending the irq even if guest thinks it disabled it?
>
> More generally, can't we for example blow away the
> indir_desc array that we use to keep the ctx pointers?
> Won't that be enough?

I'm not sure how it is related to the indirect descriptor but an
example is that all the current driver will assume:

1) the interrupt won't be raised before virtio_device_ready()
2) the interrupt won't be raised after reset()

All of these above two are not true if we don't trust the device. If
we want to audit all the drivers it would be not trivial and hard to
be correct. So it looks to me it's better to avoid those in the virtio
core.

>
>
> And looking at all this closely, I suspect it is wise to just completely
> disable hotplug/unplug for encrypted guests. Or maybe it's already
> the case?

I think not, it's more about:

1) what happens if the vq interrupt handler is called between
init_vqs() and virito_device_ready()
2) what happens if vq interrupt handler is called after reset

Instead of trying to answer those hard questions, it's better simply
not to have those conditions.

Thanks

>
>
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Cc a bunch more people ...
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > > > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > > > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > > index b35bb2d57f62..0b9523e6dd39 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > > > > >                "Force legacy mode for transitional virtio 1 devices");
> > > > > > >  #endif
> > > > > > >
> > > > > > > -/* wait for pending irq handlers */
> > > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > > +/* disable irq handlers */
> > > > > > > +void vp_disable_vectors(struct virtio_device *vdev)
> > > > > > >  {
> > > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > >       int i;
> > > > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > > > >
> > > > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* enable irq handlers */
> > > > > > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > > > > > +{
> > > > > > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > +     int i;
> > > > > > > +
> > > > > > > +     if (vp_dev->intx_enabled)
> > > > > > > +             return;
> > > > > > > +
> > > > > > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > >  }
> > > > > > >
> > > > > > >  /* the notify function used when creating a virt queue */
> > > > > > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > > >                "%s-config", name);
> > > > > > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > > > > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > > > > > +                       vp_dev->msix_names[v],
> > > > > > >                         vp_dev);
> > > > > > >       if (err)
> > > > > > >               goto error;
> > > > > > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > > >                        "%s-virtqueues", name);
> > > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > > > > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > > +                               vp_dev->msix_names[v],
> > > > > > >                                 vp_dev);
> > > > > > >               if (err)
> > > > > > >                       goto error;
> > > > > > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > > > > > >                        "%s-%s",
> > > > > > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > > > > > -                               vring_interrupt, 0,
> > > > > > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > >                                 vp_dev->msix_names[msix_vec],
> > > > > > >                                 vqs[i]);
> > > > > > >               if (err)
> > > > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > > > index beec047a8f8d..a235ce9ff6a5 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > > > > > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > > > > > >  }
> > > > > > >
> > > > > > > -/* wait for pending irq handlers */
> > > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > > > > > +/* disable irq handlers */
> > > > > > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > > > > > +/* enable irq handlers */
> > > > > > > +void vp_enable_vectors(struct virtio_device *vdev);
> > > > > > >  /* the notify function used when creating a virt queue */
> > > > > > >  bool vp_notify(struct virtqueue *vq);
> > > > > > >  /* the config->del_vqs() implementation */
> > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > index d62e9835aeec..bdf6bc667ab5 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > > >       /* Flush out the status write, and flush in device writes,
> > > > > > >        * including MSi-X interrupts, if any. */
> > > > > > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > > +     vp_disable_vectors(vdev);
> > > > > > >  }
> > > > > > >
> > > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > > > > > >  }
> > > > > > >
> > > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > >       .get            = vp_get,
> > > > > > >       .set            = vp_set,
> > > > > > >       .get_status     = vp_get_status,
> > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > > index 30654d3a0b41..acf0f6b6381d 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > > >        */
> > > > > > >       while (vp_modern_get_status(mdev))
> > > > > > >               msleep(1);
> > > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > > +     vp_disable_vectors(vdev);
> > > > > > >  }
> > > > > > >
> > > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > > > > > >  }
> > > > > > >
> > > > > > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > >       .get            = NULL,
> > > > > > >       .set            = NULL,
> > > > > > >       .generation     = vp_generation,
> > > > > > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > >  };
> > > > > > >
> > > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > >       .get            = vp_get,
> > > > > > >       .set            = vp_set,
> > > > > > >       .generation     = vp_generation,
> > > > > > > --
> > > > > > > 2.25.1
> > > > > >
> > > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 9/9] virtio_ring: validate used buffer length
  2021-09-13  6:57       ` Michael S. Tsirkin
@ 2021-09-13  7:13         ` Jason Wang
  0 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  7:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 2:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 02:40:09PM +0800, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 01:53:53PM +0800, Jason Wang wrote:
> > > > This patch validate the used buffer length provided by the device
> > > > before trying to use it. This is done by record the in buffer length
> > > > in a new field in desc_state structure during virtqueue_add(), then we
> > > > can fail the virtqueue_get_buf() when we find the device is trying to
> > > > give us a used buffer length which is greater than the in buffer
> > > > length.
> > > >
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > > Hmm this was proposed in the past. The overhead here is
> > > not negligeable, so I'd like to know more -
> > > when is it a problem if the used len is too big?
> >
> > One example is: https://github.com/fuzzsa/fuzzsa-bugs/blob/master/virtio_rng.md
> >
> > And there would be more I guess.
>
> That seems to suggest hwrng validation is better, and
> I think it makes sense: will fix all rng drivers in one go.
>
> > > Don't the affected drivers already track the length somewhere
> > > and so can validated it without the extra cost in
> > > virtio core?
> >
> > Probably, but this requires the changes in each device driver. And it
> > would be easily forgotten if new drivers are introduced?
> >
> > Thanks
>
> My thinking is one just has to be aware that before enabling
> any drivers they have to be audited. We can validate
> used length but e.g. for virtio net the length is inside
> the buffer anyway.

Yes, maybe we can introduce a boolean to tell the virtio core that the
driver can do the validation by itself.

>
> If we really have to, maybe use extra->len?

The extra->len has already been used for per descriptor length.

> And maybe have a mod param so the check can be turned off e.g.
> for benchmarking purposes.

Right. And I will benchmark the differences.

Thanks

>
>
>
> > >
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 23 +++++++++++++++++++++++
> > > >  1 file changed, 23 insertions(+)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index d2ca0a7365f8..b8374a6144f3 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -69,6 +69,7 @@
> > > >  struct vring_desc_state_split {
> > > >       void *data;                     /* Data for callback. */
> > > >       struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > > +     u64 buflen;                     /* In buffer length */
> > > >  };
> > > >
> > > >  struct vring_desc_state_packed {
> > > > @@ -76,6 +77,7 @@ struct vring_desc_state_packed {
> > > >       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> > > >       u16 num;                        /* Descriptor list length. */
> > > >       u16 last;                       /* The last desc state in a list. */
> > > > +     u64 buflen;                     /* In buffer length */
> > > >  };
> > > >
> > > >  struct vring_desc_extra {
> > > > @@ -490,6 +492,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > >       unsigned int i, n, avail, descs_used, prev, err_idx;
> > > >       int head;
> > > >       bool indirect;
> > > > +     u64 buflen = 0;
> > > >
> > > >       START_USE(vq);
> > > >
> > > > @@ -571,6 +574,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > >                                                    VRING_DESC_F_NEXT |
> > > >                                                    VRING_DESC_F_WRITE,
> > > >                                                    indirect);
> > > > +                     buflen += sg->length;
> > > >               }
> > > >       }
> > > >       /* Last one doesn't continue. */
> > > > @@ -605,6 +609,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > >
> > > >       /* Store token and indirect buffer state. */
> > > >       vq->split.desc_state[head].data = data;
> > > > +     vq->split.desc_state[head].buflen = buflen;
> > > >       if (indirect)
> > > >               vq->split.desc_state[head].indir_desc = desc;
> > > >       else
> > > > @@ -784,6 +789,11 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > > >               BAD_RING(vq, "id %u is not a head!\n", i);
> > > >               return NULL;
> > > >       }
> > > > +     if (unlikely(*len > vq->split.desc_state[i].buflen)) {
> > > > +             BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> > > > +                     *len, vq->split.desc_state[i].buflen);
> > > > +             return NULL;
> > > > +     }
> > > >
> > > >       /* detach_buf_split clears data, so grab it now. */
> > > >       ret = vq->split.desc_state[i].data;
> > > > @@ -1062,6 +1072,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > >       unsigned int i, n, err_idx;
> > > >       u16 head, id;
> > > >       dma_addr_t addr;
> > > > +     u64 buflen = 0;
> > > >
> > > >       head = vq->packed.next_avail_idx;
> > > >       desc = alloc_indirect_packed(total_sg, gfp);
> > > > @@ -1089,6 +1100,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > >                       desc[i].addr = cpu_to_le64(addr);
> > > >                       desc[i].len = cpu_to_le32(sg->length);
> > > >                       i++;
> > > > +                     if (n >= out_sgs)
> > > > +                             buflen += sg->length;
> > > >               }
> > > >       }
> > > >
> > > > @@ -1141,6 +1154,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > >       vq->packed.desc_state[id].data = data;
> > > >       vq->packed.desc_state[id].indir_desc = desc;
> > > >       vq->packed.desc_state[id].last = id;
> > > > +     vq->packed.desc_state[id].buflen = buflen;
> > > >
> > > >       vq->num_added += 1;
> > > >
> > > > @@ -1176,6 +1190,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > >       unsigned int i, n, c, descs_used, err_idx;
> > > >       __le16 head_flags, flags;
> > > >       u16 head, id, prev, curr, avail_used_flags;
> > > > +     u64 buflen = 0;
> > > >
> > > >       START_USE(vq);
> > > >
> > > > @@ -1250,6 +1265,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > >                                       1 << VRING_PACKED_DESC_F_AVAIL |
> > > >                                       1 << VRING_PACKED_DESC_F_USED;
> > > >                       }
> > > > +                     if (n >= out_sgs)
> > > > +                             buflen += sg->length;
> > > >               }
> > > >       }
> > > >
> > > > @@ -1268,6 +1285,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > >       vq->packed.desc_state[id].data = data;
> > > >       vq->packed.desc_state[id].indir_desc = ctx;
> > > >       vq->packed.desc_state[id].last = prev;
> > > > +     vq->packed.desc_state[id].buflen = buflen;
> > > >
> > > >       /*
> > > >        * A driver MUST NOT make the first descriptor in the list
> > > > @@ -1455,6 +1473,11 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > > >               BAD_RING(vq, "id %u is not a head!\n", id);
> > > >               return NULL;
> > > >       }
> > > > +     if (unlikely(*len > vq->packed.desc_state[id].buflen)) {
> > > > +             BAD_RING(vq, "used len %d is larger than in buflen %lld\n",
> > > > +                     *len, vq->packed.desc_state[id].buflen);
> > > > +             return NULL;
> > > > +     }
> > > >
> > > >       /* detach_buf_packed clears data, so grab it now. */
> > > >       ret = vq->packed.desc_state[id].data;
> > > > --
> > > > 2.25.1
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  7:01               ` Michael S. Tsirkin
@ 2021-09-13  7:15                 ` Jason Wang
  0 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  7:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 3:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 02:43:08PM +0800, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 02:34:01PM +0800, Jason Wang wrote:
> > > > On Mon, Sep 13, 2021 at 2:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 13, 2021 at 02:08:02PM +0800, Jason Wang wrote:
> > > > > > On Mon, Sep 13, 2021 at 2:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 13, 2021 at 01:53:50PM +0800, Jason Wang wrote:
> > > > > > > > We used to synchronize pending MSI-X irq handlers via
> > > > > > > > synchronize_irq(), this may not work for the untrusted device which
> > > > > > > > may keep sending interrupts after reset which may lead unexpected
> > > > > > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > > > > > device is ready. So this patch fixes those two issues by:
> > > > > > > >
> > > > > > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > > > > > >    handlers to be called after the device is reset.
> > > > > > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > > > > >
> > > > > > > > This can make sure the virtio interrupt handler won't be called before
> > > > > > > > virtio_device_ready() and after reset.
> > > > > > > >
> > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > >
> > > > > > > I don't get the threat model here. Isn't disabling irqs done by the
> > > > > > > hypervisor anyway? Is there a reason to trust disable_irq but not
> > > > > > > device reset?
> > > > > >
> > > > > > My understanding is that e.g in the case of SEV/TDX we don't trust the
> > > > > > hypervisor. So the hypervisor can keep sending interrupts even if the
> > > > > > device is reset. The guest can only trust its own software interrupt
> > > > > > management logic to avoid call virtio callback in this case.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Hmm but I don't see how do these patches do this.
> > > > > They call disable_irq but can't the hypervisor keep
> > > > > sending interrupts after disable_irq, too?
> > > >
> > > > Yes, but since the irq is disabled, the vring or config callback won't
> > > > be called in this case.
> > > >
> > > > Thanks
> > >
> > > But doen't "irq is disabled" basically mean "we told the hypervisor
> > > to disable the irq"?  What extractly prevents hypervisor from
> > > sending the irq even if guest thinks it disabled it?
> >
> > It can't prevent the hypersior from sending irq. But it can make sure
> > the irq descriptor is disabled (e.g IRQD_IRQ_DISABLED). Is this
> > sufficient?
> >
> > Thanks
>
> Maybe, maybe not ... there's not a lot in the way of
> memory barriers around code using that bit, that's for sure.

Ok, I think the irq core should be robust enough for such unexpected
irq for many years but maybe I was wrong.

But anyhow, the virtio core should be prepared for this, since the irq
core doesn't know when the irq should be raised.

Thanks

> Did anyone look at it from point of view of what
> can a bad interrupt do?
>
>
> > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > Cc a bunch more people ...
> > > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > > > > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > > > > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > > > index b35bb2d57f62..0b9523e6dd39 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > > > > > >                "Force legacy mode for transitional virtio 1 devices");
> > > > > > > >  #endif
> > > > > > > >
> > > > > > > > -/* wait for pending irq handlers */
> > > > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > > > +/* disable irq handlers */
> > > > > > > > +void vp_disable_vectors(struct virtio_device *vdev)
> > > > > > > >  {
> > > > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > >       int i;
> > > > > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > > > > >
> > > > > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > > > -             synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > > > +             disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* enable irq handlers */
> > > > > > > > +void vp_enable_vectors(struct virtio_device *vdev)
> > > > > > > > +{
> > > > > > > > +     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > > +     int i;
> > > > > > > > +
> > > > > > > > +     if (vp_dev->intx_enabled)
> > > > > > > > +             return;
> > > > > > > > +
> > > > > > > > +     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > > > > +             enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  /* the notify function used when creating a virt queue */
> > > > > > > > @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > > > >       snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > > > >                "%s-config", name);
> > > > > > > >       err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > > > -                       vp_config_changed, 0, vp_dev->msix_names[v],
> > > > > > > > +                       vp_config_changed, IRQF_NO_AUTOEN,
> > > > > > > > +                       vp_dev->msix_names[v],
> > > > > > > >                         vp_dev);
> > > > > > > >       if (err)
> > > > > > > >               goto error;
> > > > > > > > @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> > > > > > > >               snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> > > > > > > >                        "%s-virtqueues", name);
> > > > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> > > > > > > > -                               vp_vring_interrupt, 0, vp_dev->msix_names[v],
> > > > > > > > +                               vp_vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > > > +                               vp_dev->msix_names[v],
> > > > > > > >                                 vp_dev);
> > > > > > > >               if (err)
> > > > > > > >                       goto error;
> > > > > > > > @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
> > > > > > > >                        "%s-%s",
> > > > > > > >                        dev_name(&vp_dev->vdev.dev), names[i]);
> > > > > > > >               err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> > > > > > > > -                               vring_interrupt, 0,
> > > > > > > > +                               vring_interrupt, IRQF_NO_AUTOEN,
> > > > > > > >                                 vp_dev->msix_names[msix_vec],
> > > > > > > >                                 vqs[i]);
> > > > > > > >               if (err)
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > > > > index beec047a8f8d..a235ce9ff6a5 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > > > > @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> > > > > > > >       return container_of(vdev, struct virtio_pci_device, vdev);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -/* wait for pending irq handlers */
> > > > > > > > -void vp_synchronize_vectors(struct virtio_device *vdev);
> > > > > > > > +/* disable irq handlers */
> > > > > > > > +void vp_disable_vectors(struct virtio_device *vdev);
> > > > > > > > +/* enable irq handlers */
> > > > > > > > +void vp_enable_vectors(struct virtio_device *vdev);
> > > > > > > >  /* the notify function used when creating a virt queue */
> > > > > > > >  bool vp_notify(struct virtqueue *vq);
> > > > > > > >  /* the config->del_vqs() implementation */
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > > index d62e9835aeec..bdf6bc667ab5 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > > @@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > > > >       /* Flush out the status write, and flush in device writes,
> > > > > > > >        * including MSi-X interrupts, if any. */
> > > > > > > >       ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
> > > > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > > > +     vp_disable_vectors(vdev);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > > > @@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > > >       .get            = vp_get,
> > > > > > > >       .set            = vp_set,
> > > > > > > >       .get_status     = vp_get_status,
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > index 30654d3a0b41..acf0f6b6381d 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > @@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
> > > > > > > >        */
> > > > > > > >       while (vp_modern_get_status(mdev))
> > > > > > > >               msleep(1);
> > > > > > > > -     /* Flush pending VQ/configuration callbacks. */
> > > > > > > > -     vp_synchronize_vectors(vdev);
> > > > > > > > +     /* Disable VQ/configuration callbacks. */
> > > > > > > > +     vp_disable_vectors(vdev);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > > > > > > > @@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > > >       .get            = NULL,
> > > > > > > >       .set            = NULL,
> > > > > > > >       .generation     = vp_generation,
> > > > > > > > @@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > > > > > > >  };
> > > > > > > >
> > > > > > > >  static const struct virtio_config_ops virtio_pci_config_ops = {
> > > > > > > > +     .ready          = vp_enable_vectors,
> > > > > > > >       .get            = vp_get,
> > > > > > > >       .set            = vp_set,
> > > > > > > >       .generation     = vp_generation,
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  7:02           ` Michael S. Tsirkin
@ 2021-09-13  7:17             ` Jason Wang
  0 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-13  7:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Sep 13, 2021 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 02:45:38PM +0800, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 02:36:54PM +0800, Jason Wang wrote:
> > > > On Mon, Sep 13, 2021 at 2:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 13, 2021 at 01:53:51PM +0800, Jason Wang wrote:
> > > > > > This patch tries to make sure the virtio interrupt handler for INTX
> > > > > > won't be called after a reset and before virtio_device_ready(). We
> > > > > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > > > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > > > > intx_soft_enabled variable and toggle it during in
> > > > > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > > > > intx_soft_enabled before processing the actual interrupt.
> > > > > >
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > >
> > > > >
> > > > > Not all that excited about all the memory barriers for something
> > > > > that should be an extremely rare event (for most kernels -
> > > > > literally once per boot). Can't we do better?
> > > >
> > > > I'm not sure, but do we need to care about the slow path (INTX)?
> > >
> > > Otherwise we won't try to support this, right?
> >
> > Sorry, what I meant is "do we need to care about the performance of
> > the slow path".
> >
> > >
> > > > (Or do you have a better approach?)
> > > >
> > > > Thanks
> > >
> > > Don't know really, maybe rcu or whatever?
> >
> > I am sure it's worth it to bother since it's the slow path.
> >
> > > But let's try to be much more specific - is there anything
> > > specific we are trying to protect against here?
> >
> > The unexpected calling of the vring or config interrupt handler. (The
> > same as MSI-X, e.g the untrusted device can send irq at any time).
> >
> > Thanks
>
> And so, does this do more than crash the guest?  Hypervisors
> already can do that ...

Yes, so for DOS and shutdown it should be fine, but for other kinds of
crash, it would be very hard to say what can happen (e.g manipulating
SWIOTLB or page table etc). So it's better to avoid non DOS crashes.

Thanks

>
>
> > >
> > >
> > >
> > > > >
> > > > > > ---
> > > > > >  drivers/virtio/virtio_pci_common.c | 18 ++++++++++++++++--
> > > > > >  drivers/virtio/virtio_pci_common.h |  1 +
> > > > > >  2 files changed, 17 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > index 0b9523e6dd39..835197151dc1 100644
> > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > @@ -30,8 +30,12 @@ void vp_disable_vectors(struct virtio_device *vdev)
> > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > >       int i;
> > > > > >
> > > > > > -     if (vp_dev->intx_enabled)
> > > > > > +     if (vp_dev->intx_enabled) {
> > > > > > +             vp_dev->intx_soft_enabled = false;
> > > > > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > > > > +             smp_wmb();
> > > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> > > > > > +     }
> > > > > >
> > > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > >               disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
> > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > >       int i;
> > > > > >
> > > > > > -     if (vp_dev->intx_enabled)
> > > > > > +     if (vp_dev->intx_enabled) {
> > > > > > +             vp_dev->intx_soft_enabled = true;
> > > > > > +             /* ensure the vp_interrupt see this intx_soft_enabled value */
> > > > > > +             smp_wmb();
> > > > > >               return;
> > > > > > +     }
> > > > > >
> > > > > >       for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > > >               enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > > > @@ -97,6 +105,12 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
> > > > > >       struct virtio_pci_device *vp_dev = opaque;
> > > > > >       u8 isr;
> > > > > >
> > > > > > +     if (!vp_dev->intx_soft_enabled)
> > > > > > +             return IRQ_NONE;
> > > > > > +
> > > > > > +     /* read intx_soft_enabled before read others */
> > > > > > +     smp_rmb();
> > > > > > +
> > > > > >       /* reading the ISR has the effect of also clearing it so it's very
> > > > > >        * important to save off the value. */
> > > > > >       isr = ioread8(vp_dev->isr);
> > > > > > diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > > > > > index a235ce9ff6a5..3c06e0f92ee4 100644
> > > > > > --- a/drivers/virtio/virtio_pci_common.h
> > > > > > +++ b/drivers/virtio/virtio_pci_common.h
> > > > > > @@ -64,6 +64,7 @@ struct virtio_pci_device {
> > > > > >       /* MSI-X support */
> > > > > >       int msix_enabled;
> > > > > >       int intx_enabled;
> > > > > > +     bool intx_soft_enabled;
> > > > > >       cpumask_var_t *msix_affinity_masks;
> > > > > >       /* Name strings for interrupts. This size should be enough,
> > > > > >        * and I'm too lazy to allocate each name separately. */
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/9] virtio-blk: validate num_queues during probe
  2021-09-13  5:53 ` [PATCH 1/9] virtio-blk: validate num_queues during probe Jason Wang
@ 2021-09-13  7:48   ` Stefano Garzarella
  2021-09-14  2:29     ` Jason Wang
  2021-09-13 12:05   ` Stefan Hajnoczi
  1 sibling, 1 reply; 55+ messages in thread
From: Stefano Garzarella @ 2021-09-13  7:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, david.kaplan, konrad.wilk, f.hetzelt, linux-kernel,
	virtualization, Stefan Hajnoczi, Paolo Bonzini

On Mon, Sep 13, 2021 at 01:53:45PM +0800, Jason Wang wrote:
>If an untrusted device neogitates BLK_F_MQ but advertises a zero
>num_queues, the driver may end up trying to allocating zero size
>buffers where ZERO_SIZE_PTR is returned which may pass the checking
>against the NULL. This will lead unexpected results.
>
>Fixing this by using single queue if num_queues is zero.
>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: Stefan Hajnoczi <stefanha@redhat.com>
>Signed-off-by: Jason Wang <jasowang@redhat.com>
>---
> drivers/block/virtio_blk.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>index e574fbf5e6df..f130d12df4b9 100644
>--- a/drivers/block/virtio_blk.c
>+++ b/drivers/block/virtio_blk.c
>@@ -498,7 +498,8 @@ static int init_vq(struct virtio_blk *vblk)
> 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
> 				   struct virtio_blk_config, num_queues,
> 				   &num_vqs);
>-	if (err)
>+	/* We need at least on virtqueue */

s/on/one/

The rest LGTM.

Stefano

>+	if (err || !num_vqs)
> 		num_vqs = 1;
>
> 	num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
>-- 
>2.25.1
>
>_______________________________________________
>Virtualization mailing list
>Virtualization@lists.linux-foundation.org
>https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/9] virtio-blk: validate num_queues during probe
  2021-09-13  5:53 ` [PATCH 1/9] virtio-blk: validate num_queues during probe Jason Wang
  2021-09-13  7:48   ` Stefano Garzarella
@ 2021-09-13 12:05   ` Stefan Hajnoczi
  1 sibling, 0 replies; 55+ messages in thread
From: Stefan Hajnoczi @ 2021-09-13 12:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 682 bytes --]

On Mon, Sep 13, 2021 at 01:53:45PM +0800, Jason Wang wrote:
> If an untrusted device neogitates BLK_F_MQ but advertises a zero
> num_queues, the driver may end up trying to allocating zero size
> buffers where ZERO_SIZE_PTR is returned which may pass the checking
> against the NULL. This will lead unexpected results.
> 
> Fixing this by using single queue if num_queues is zero.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/block/virtio_blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13  7:07               ` Jason Wang
@ 2021-09-13 19:38                 ` Thomas Gleixner
  2021-09-13 20:54                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2021-09-13 19:38 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13 2021 at 15:07, Jason Wang wrote:
> On Mon, Sep 13, 2021 at 2:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> > But doen't "irq is disabled" basically mean "we told the hypervisor
>> > to disable the irq"?  What extractly prevents hypervisor from
>> > sending the irq even if guest thinks it disabled it?
>>
>> More generally, can't we for example blow away the
>> indir_desc array that we use to keep the ctx pointers?
>> Won't that be enough?
>
> I'm not sure how it is related to the indirect descriptor but an
> example is that all the current driver will assume:
>
> 1) the interrupt won't be raised before virtio_device_ready()
> 2) the interrupt won't be raised after reset()

If that assumption exists, then you better keep the interrupt line
disabled until virtio_device_ready() has completed and disable it again
before reset() is invoked. That's a question of general robustness and
not really a question of trusted hypervisors and encrypted guests.

>> > > > > > > +void vp_disable_vectors(struct virtio_device *vdev)
>> > > > > > >  {
>> > > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> > > > > > >       int i;
>> > > > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>> > > > > > >               synchronize_irq(vp_dev->pci_dev->irq);

Don't you want the same change for non-MSI interrupts?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13 19:38                 ` Thomas Gleixner
@ 2021-09-13 20:54                   ` Michael S. Tsirkin
  2021-09-13 22:31                     ` Thomas Gleixner
  2021-09-14  8:29                     ` Thomas Gleixner
  0 siblings, 2 replies; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13 20:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jason Wang, virtualization, linux-kernel, Hetzelt, Felicitas,
	kaplan, david, Konrad Rzeszutek Wilk, pbonzini, Andi Kleen,
	Dan Williams, Kuppuswamy, Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13, 2021 at 09:38:30PM +0200, Thomas Gleixner wrote:
> On Mon, Sep 13 2021 at 15:07, Jason Wang wrote:
> > On Mon, Sep 13, 2021 at 2:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > But doen't "irq is disabled" basically mean "we told the hypervisor
> >> > to disable the irq"?  What extractly prevents hypervisor from
> >> > sending the irq even if guest thinks it disabled it?
> >>
> >> More generally, can't we for example blow away the
> >> indir_desc array that we use to keep the ctx pointers?
> >> Won't that be enough?
> >
> > I'm not sure how it is related to the indirect descriptor but an
> > example is that all the current driver will assume:
> >
> > 1) the interrupt won't be raised before virtio_device_ready()
> > 2) the interrupt won't be raised after reset()
> 
> If that assumption exists, then you better keep the interrupt line
> disabled until virtio_device_ready() has completed

started not completed. device is allowed to send
config interrupts right after DRIVER_OK status is set by
virtio_device_ready.

> and disable it again
> before reset() is invoked. That's a question of general robustness and
> not really a question of trusted hypervisors and encrypted guests.

We can do this for some MSIX interrupts, sure. Not for shared interrupts though.

> >> > > > > > > +void vp_disable_vectors(struct virtio_device *vdev)
> >> > > > > > >  {
> >> > > > > > >       struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >> > > > > > >       int i;
> >> > > > > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> >> > > > > > >               synchronize_irq(vp_dev->pci_dev->irq);
> 
> Don't you want the same change for non-MSI interrupts?


We can't disable them - these are shared.

> Thanks,
> 
>         tglx


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13  5:53 ` [PATCH 7/9] virtio-pci: harden INTX interrupts Jason Wang
  2021-09-13  6:33   ` Michael S. Tsirkin
@ 2021-09-13 21:36   ` Thomas Gleixner
  2021-09-13 22:01     ` Michael S. Tsirkin
                       ` (3 more replies)
  1 sibling, 4 replies; 55+ messages in thread
From: Thomas Gleixner @ 2021-09-13 21:36 UTC (permalink / raw)
  To: Jason Wang, mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Peter Zijlstra, Will Deacon, Boqun Feng,
	Paul E. McKenney

Jason,

On Mon, Sep 13 2021 at 13:53, Jason Wang wrote:
> This patch tries to make sure the virtio interrupt handler for INTX
> won't be called after a reset and before virtio_device_ready(). We
> can't use IRQF_NO_AUTOEN since we're using shared interrupt
> (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> intx_soft_enabled variable and toggle it during in
> vp_disable/enable_vectors(). The INTX interrupt handler will check
> intx_soft_enabled before processing the actual interrupt.

Ah, there it is :)

Cc'ed our memory ordering wizards as I might be wrong as usual.

> -	if (vp_dev->intx_enabled)
> +	if (vp_dev->intx_enabled) {
> +		vp_dev->intx_soft_enabled = false;
> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
> +		smp_wmb();
>  		synchronize_irq(vp_dev->pci_dev->irq);

As you are synchronizing the interrupt here anyway, what is the value of
the barrier?

 		vp_dev->intx_soft_enabled = false;
  		synchronize_irq(vp_dev->pci_dev->irq);

is sufficient because of:

synchronize_irq()
   do {
   	raw_spin_lock(desc->lock);
        in_progress = check_inprogress(desc);
   	raw_spin_unlock(desc->lock);
   } while (in_progress);     

raw_spin_lock() has ACQUIRE semantics so the store to intx_soft_enabled
can complete after lock has been acquired which is uninteresting.

raw_spin_unlock() has RELEASE semantics so the store to intx_soft_enabled
has to be completed before the unlock completes.

So if the interrupt is on the flight then it might or might not see
intx_soft_enabled == false. But that's true for your barrier construct
as well.

The important part is that any interrupt for this line arriving after
synchronize_irq() has completed is guaranteed to see intx_soft_enabled
== false.

That is what you want to achieve, right?

>  	for (i = 0; i < vp_dev->msix_vectors; ++i)
>  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	int i;
>  
> -	if (vp_dev->intx_enabled)
> +	if (vp_dev->intx_enabled) {
> +		vp_dev->intx_soft_enabled = true;
> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
> +		smp_wmb();

For the enable case the barrier is pointless vs. intx_soft_enabled

CPU 0                                           CPU 1

interrupt                                       vp_enable_vectors()
  vp_interrupt()                                
    if (!vp_dev->intx_soft_enabled)
       return IRQ_NONE;
                                                  vp_dev->intx_soft_enabled = true;

IOW, the concurrent interrupt might or might not see the store. That's
not a problem for legacy PCI interrupts. If it did not see the store and
the interrupt originated from that device then it will account it as one
spurious interrupt which will get raised again because those interrupts
are level triggered and nothing acknowledged it at the device level.

Now, what's more interesting is that is has to be guaranteed that the
interrupt which observes

        vp_dev->intx_soft_enabled == true

also observes all preceeding stores, i.e. those which make the interrupt
handler capable of handling the interrupt.

That's the real problem and for that your barrier is at the wrong place
because you want to make sure that those stores are visible before the
store to intx_soft_enabled becomes visible, i.e. this should be:


        /* Ensure that all preceeding stores are visible before intx_soft_enabled */
	smp_wmb();
	vp_dev->intx_soft_enabled = true;

Now Micheal is not really enthusiatic about the barrier in the interrupt
handler hotpath, which is understandable.

As the device startup is not really happening often it's sensible to do
the following

        disable_irq();
        vp_dev->intx_soft_enabled = true;
        enable_irq();

because:

        disable_irq()
          synchronize_irq()

acts as a barrier for the preceeding stores:

        disable_irq()
   	  raw_spin_lock(desc->lock);
          __disable_irq(desc);
   	  raw_spin_unlock(desc->lock);

          synchronize_irq()
            do {
   	      raw_spin_lock(desc->lock);
              in_progress = check_inprogress(desc);
   	      raw_spin_unlock(desc->lock);
            } while (in_progress);     

        intx_soft_enabled = true;

        enable_irq();

In this case synchronize_irq() prevents the subsequent store to
intx_soft_enabled to leak into the __disable_irq(desc) section which in
turn makes it impossible for an interrupt handler to observe
intx_soft_enabled == true before the prerequisites which preceed the
call to disable_irq() are visible.

Of course the memory ordering wizards might disagree, but if they do,
then we have a massive chase of ordering problems vs. similar constructs
all over the tree ahead of us.

From the interrupt perspective the sequence:

        disable_irq();
        vp_dev->intx_soft_enabled = true;
        enable_irq();

is perfectly fine as well. Any interrupt arriving during the disabled
section will be reraised on enable_irq() in hardware because it's a
level interrupt. Any resulting failure is either a hardware or a
hypervisor bug.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13 21:36   ` Thomas Gleixner
@ 2021-09-13 22:01     ` Michael S. Tsirkin
  2021-09-13 22:20       ` Thomas Gleixner
  2021-09-14  2:50     ` Jason Wang
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-09-13 22:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jason Wang, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Peter Zijlstra, Will Deacon,
	Boqun Feng, Paul E. McKenney

On Mon, Sep 13, 2021 at 11:36:24PM +0200, Thomas Gleixner wrote:
> >From the interrupt perspective the sequence:
> 
>         disable_irq();
>         vp_dev->intx_soft_enabled = true;
>         enable_irq();
> 
> is perfectly fine as well. Any interrupt arriving during the disabled
> section will be reraised on enable_irq() in hardware because it's a
> level interrupt. Any resulting failure is either a hardware or a
> hypervisor bug.

yes but it's a shared interrupt. what happens if multiple callers do
this in parallel?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13 22:01     ` Michael S. Tsirkin
@ 2021-09-13 22:20       ` Thomas Gleixner
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2021-09-13 22:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Peter Zijlstra, Will Deacon,
	Boqun Feng, Paul E. McKenney

On Mon, Sep 13 2021 at 18:01, Michael S. Tsirkin wrote:
> On Mon, Sep 13, 2021 at 11:36:24PM +0200, Thomas Gleixner wrote:
>> >From the interrupt perspective the sequence:
>> 
>>         disable_irq();
>>         vp_dev->intx_soft_enabled = true;
>>         enable_irq();
>> 
>> is perfectly fine as well. Any interrupt arriving during the disabled
>> section will be reraised on enable_irq() in hardware because it's a
>> level interrupt. Any resulting failure is either a hardware or a
>> hypervisor bug.
>
> yes but it's a shared interrupt. what happens if multiple callers do
> this in parallel?

Nothing as each caller is serialized vs. itself and its own interrupt
handler it cares about.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13 20:54                   ` Michael S. Tsirkin
@ 2021-09-13 22:31                     ` Thomas Gleixner
  2021-09-14  2:20                       ` Jason Wang
  2021-09-14  8:29                     ` Thomas Gleixner
  1 sibling, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2021-09-13 22:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, linux-kernel, Hetzelt, Felicitas,
	kaplan, david, Konrad Rzeszutek Wilk, pbonzini, Andi Kleen,
	Dan Williams, Kuppuswamy, Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13 2021 at 16:54, Michael S. Tsirkin wrote:

> On Mon, Sep 13, 2021 at 09:38:30PM +0200, Thomas Gleixner wrote:
>> On Mon, Sep 13 2021 at 15:07, Jason Wang wrote:
>> > On Mon, Sep 13, 2021 at 2:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > But doen't "irq is disabled" basically mean "we told the hypervisor
>> >> > to disable the irq"?  What extractly prevents hypervisor from
>> >> > sending the irq even if guest thinks it disabled it?
>> >>
>> >> More generally, can't we for example blow away the
>> >> indir_desc array that we use to keep the ctx pointers?
>> >> Won't that be enough?
>> >
>> > I'm not sure how it is related to the indirect descriptor but an
>> > example is that all the current driver will assume:
>> >
>> > 1) the interrupt won't be raised before virtio_device_ready()
>> > 2) the interrupt won't be raised after reset()
>> 
>> If that assumption exists, then you better keep the interrupt line
>> disabled until virtio_device_ready() has completed
>
> started not completed. device is allowed to send
> config interrupts right after DRIVER_OK status is set by
> virtio_device_ready.

Whatever:

 * Define the exact point from which on the driver is able to handle the
   interrupt and put the enable after that point

 * Define the exact point from which on the driver is unable to handle
   the interrupt and put the disable before that point

The above is blury.

>> and disable it again
>> before reset() is invoked. That's a question of general robustness and
>> not really a question of trusted hypervisors and encrypted guests.
>
> We can do this for some MSIX interrupts, sure. Not for shared interrupts though.

See my reply to the next patch. The problem is the same:

 * Define the exact point from which on the driver is able to handle the
   interrupt and allow the handler to proceed after that point

 * Define the exact point from which on the driver is unable to handle
   the interrupt and ensure that the handler denies to proceed before
   that point

Same story just a different mechanism.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13 22:31                     ` Thomas Gleixner
@ 2021-09-14  2:20                       ` Jason Wang
  0 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-14  2:20 UTC (permalink / raw)
  To: Thomas Gleixner, Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk, pbonzini, Andi Kleen, Dan Williams,
	Kuppuswamy, Sathyanarayanan, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML


在 2021/9/14 上午6:31, Thomas Gleixner 写道:
> On Mon, Sep 13 2021 at 16:54, Michael S. Tsirkin wrote:
>
>> On Mon, Sep 13, 2021 at 09:38:30PM +0200, Thomas Gleixner wrote:
>>> On Mon, Sep 13 2021 at 15:07, Jason Wang wrote:
>>>> On Mon, Sep 13, 2021 at 2:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>> But doen't "irq is disabled" basically mean "we told the hypervisor
>>>>>> to disable the irq"?  What extractly prevents hypervisor from
>>>>>> sending the irq even if guest thinks it disabled it?
>>>>> More generally, can't we for example blow away the
>>>>> indir_desc array that we use to keep the ctx pointers?
>>>>> Won't that be enough?
>>>> I'm not sure how it is related to the indirect descriptor but an
>>>> example is that all the current driver will assume:
>>>>
>>>> 1) the interrupt won't be raised before virtio_device_ready()
>>>> 2) the interrupt won't be raised after reset()
>>> If that assumption exists, then you better keep the interrupt line
>>> disabled until virtio_device_ready() has completed
>> started not completed. device is allowed to send
>> config interrupts right after DRIVER_OK status is set by
>> virtio_device_ready.
> Whatever:
>
>   * Define the exact point from which on the driver is able to handle the
>     interrupt and put the enable after that point
>
>   * Define the exact point from which on the driver is unable to handle
>     the interrupt and put the disable before that point


Yes, this is exactly what this patch (and INTX patch) want to achieve. 
The driver should only able to handle the interrupt after 
virtio_device_ready() but before reset().

Thanks


>
> The above is blury.
>
>>> and disable it again
>>> before reset() is invoked. That's a question of general robustness and
>>> not really a question of trusted hypervisors and encrypted guests.
>> We can do this for some MSIX interrupts, sure. Not for shared interrupts though.
> See my reply to the next patch. The problem is the same:
>
>   * Define the exact point from which on the driver is able to handle the
>     interrupt and allow the handler to proceed after that point
>
>   * Define the exact point from which on the driver is unable to handle
>     the interrupt and ensure that the handler denies to proceed before
>     that point
>
> Same story just a different mechanism.
>
> Thanks,
>
>          tglx
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/9] virtio-blk: validate num_queues during probe
  2021-09-13  7:48   ` Stefano Garzarella
@ 2021-09-14  2:29     ` Jason Wang
  0 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-14  2:29 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: mst, david.kaplan, konrad.wilk, f.hetzelt, linux-kernel,
	virtualization, Stefan Hajnoczi, Paolo Bonzini


在 2021/9/13 下午3:48, Stefano Garzarella 写道:
> On Mon, Sep 13, 2021 at 01:53:45PM +0800, Jason Wang wrote:
>> If an untrusted device neogitates BLK_F_MQ but advertises a zero
>> num_queues, the driver may end up trying to allocating zero size
>> buffers where ZERO_SIZE_PTR is returned which may pass the checking
>> against the NULL. This will lead unexpected results.
>>
>> Fixing this by using single queue if num_queues is zero.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/block/virtio_blk.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index e574fbf5e6df..f130d12df4b9 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -498,7 +498,8 @@ static int init_vq(struct virtio_blk *vblk)
>>     err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
>>                    struct virtio_blk_config, num_queues,
>>                    &num_vqs);
>> -    if (err)
>> +    /* We need at least on virtqueue */
>
> s/on/one/
>
> The rest LGTM.
>
> Stefano


Will fix in next version.

Thanks


>
>> +    if (err || !num_vqs)
>>         num_vqs = 1;
>>
>>     num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
>> -- 
>> 2.25.1
>>
>> _______________________________________________
>> Virtualization mailing list
>> Virtualization@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13 21:36   ` Thomas Gleixner
  2021-09-13 22:01     ` Michael S. Tsirkin
@ 2021-09-14  2:50     ` Jason Wang
  2021-09-14  9:34     ` Boqun Feng
  2021-09-14 11:03     ` Peter Zijlstra
  3 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-09-14  2:50 UTC (permalink / raw)
  To: Thomas Gleixner, mst
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Peter Zijlstra, Will Deacon, Boqun Feng,
	Paul E. McKenney


在 2021/9/14 上午5:36, Thomas Gleixner 写道:
> Jason,
>
> On Mon, Sep 13 2021 at 13:53, Jason Wang wrote:
>> This patch tries to make sure the virtio interrupt handler for INTX
>> won't be called after a reset and before virtio_device_ready(). We
>> can't use IRQF_NO_AUTOEN since we're using shared interrupt
>> (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
>> intx_soft_enabled variable and toggle it during in
>> vp_disable/enable_vectors(). The INTX interrupt handler will check
>> intx_soft_enabled before processing the actual interrupt.
> Ah, there it is :)
>
> Cc'ed our memory ordering wizards as I might be wrong as usual.
>
>> -	if (vp_dev->intx_enabled)
>> +	if (vp_dev->intx_enabled) {
>> +		vp_dev->intx_soft_enabled = false;
>> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
>> +		smp_wmb();
>>   		synchronize_irq(vp_dev->pci_dev->irq);
> As you are synchronizing the interrupt here anyway, what is the value of
> the barrier?
>
>   		vp_dev->intx_soft_enabled = false;
>    		synchronize_irq(vp_dev->pci_dev->irq);
>
> is sufficient because of:
>
> synchronize_irq()
>     do {
>     	raw_spin_lock(desc->lock);
>          in_progress = check_inprogress(desc);
>     	raw_spin_unlock(desc->lock);
>     } while (in_progress);
>
> raw_spin_lock() has ACQUIRE semantics so the store to intx_soft_enabled
> can complete after lock has been acquired which is uninteresting.
>
> raw_spin_unlock() has RELEASE semantics so the store to intx_soft_enabled
> has to be completed before the unlock completes.
>
> So if the interrupt is on the flight then it might or might not see
> intx_soft_enabled == false. But that's true for your barrier construct
> as well.
>
> The important part is that any interrupt for this line arriving after
> synchronize_irq() has completed is guaranteed to see intx_soft_enabled
> == false.
>
> That is what you want to achieve, right?


Right.


>
>>   	for (i = 0; i < vp_dev->msix_vectors; ++i)
>>   		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>> @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
>>   	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>   	int i;
>>   
>> -	if (vp_dev->intx_enabled)
>> +	if (vp_dev->intx_enabled) {
>> +		vp_dev->intx_soft_enabled = true;
>> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
>> +		smp_wmb();
> For the enable case the barrier is pointless vs. intx_soft_enabled
>
> CPU 0                                           CPU 1
>
> interrupt                                       vp_enable_vectors()
>    vp_interrupt()
>      if (!vp_dev->intx_soft_enabled)
>         return IRQ_NONE;
>                                                    vp_dev->intx_soft_enabled = true;
>
> IOW, the concurrent interrupt might or might not see the store. That's
> not a problem for legacy PCI interrupts. If it did not see the store and
> the interrupt originated from that device then it will account it as one
> spurious interrupt which will get raised again because those interrupts
> are level triggered and nothing acknowledged it at the device level.


I see.


>
> Now, what's more interesting is that is has to be guaranteed that the
> interrupt which observes
>
>          vp_dev->intx_soft_enabled == true
>
> also observes all preceeding stores, i.e. those which make the interrupt
> handler capable of handling the interrupt.
>
> That's the real problem and for that your barrier is at the wrong place
> because you want to make sure that those stores are visible before the
> store to intx_soft_enabled becomes visible, i.e. this should be:
>
>
>          /* Ensure that all preceeding stores are visible before intx_soft_enabled */
> 	smp_wmb();
> 	vp_dev->intx_soft_enabled = true;


Yes, I see.


>
> Now Micheal is not really enthusiatic about the barrier in the interrupt
> handler hotpath, which is understandable.
>
> As the device startup is not really happening often it's sensible to do
> the following
>
>          disable_irq();
>          vp_dev->intx_soft_enabled = true;
>          enable_irq();
>
> because:
>
>          disable_irq()
>            synchronize_irq()
>
> acts as a barrier for the preceeding stores:
>
>          disable_irq()
>     	  raw_spin_lock(desc->lock);
>            __disable_irq(desc);
>     	  raw_spin_unlock(desc->lock);
>
>            synchronize_irq()
>              do {
>     	      raw_spin_lock(desc->lock);
>                in_progress = check_inprogress(desc);
>     	      raw_spin_unlock(desc->lock);
>              } while (in_progress);
>
>          intx_soft_enabled = true;
>
>          enable_irq();
>
> In this case synchronize_irq() prevents the subsequent store to
> intx_soft_enabled to leak into the __disable_irq(desc) section which in
> turn makes it impossible for an interrupt handler to observe
> intx_soft_enabled == true before the prerequisites which preceed the
> call to disable_irq() are visible.
>
> Of course the memory ordering wizards might disagree, but if they do,
> then we have a massive chase of ordering problems vs. similar constructs
> all over the tree ahead of us.
>
>  From the interrupt perspective the sequence:
>
>          disable_irq();
>          vp_dev->intx_soft_enabled = true;
>          enable_irq();
>
> is perfectly fine as well. Any interrupt arriving during the disabled
> section will be reraised on enable_irq() in hardware because it's a
> level interrupt. Any resulting failure is either a hardware or a
> hypervisor bug.


Thanks a lot for the detail clarifications. Will switch to use 
disable_irq()/enable_irq() if no objection from memory ordering wizards.


>
> Thanks,
>
>          tglx
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] virtio_pci: harden MSI-X interrupts
  2021-09-13 20:54                   ` Michael S. Tsirkin
  2021-09-13 22:31                     ` Thomas Gleixner
@ 2021-09-14  8:29                     ` Thomas Gleixner
  1 sibling, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2021-09-14  8:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, linux-kernel, Hetzelt, Felicitas,
	kaplan, david, Konrad Rzeszutek Wilk, pbonzini, Andi Kleen,
	Dan Williams, Kuppuswamy, Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andy Lutomirski, Bjorn Helgaas,
	Richard Henderson, Thomas Bogendoerfer, James E J Bottomley,
	Helge Deller, David S . Miller, Arnd Bergmann, Jonathan Corbet,
	Peter H Anvin, Dave Hansen, Tony Luck, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, X86 ML

On Mon, Sep 13 2021 at 16:54, Michael S. Tsirkin wrote:
> On Mon, Sep 13, 2021 at 09:38:30PM +0200, Thomas Gleixner wrote:
>> and disable it again
>> before reset() is invoked. That's a question of general robustness and
>> not really a question of trusted hypervisors and encrypted guests.
>
> We can do this for some MSIX interrupts, sure. Not for shared interrupts though.

But you have to make sure that the handler does not run before and after
the defined points. And that's even more important for shared because
with shared interrupts the interrupt can be raised at any point in time
via the other devices which share the line.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13 21:36   ` Thomas Gleixner
  2021-09-13 22:01     ` Michael S. Tsirkin
  2021-09-14  2:50     ` Jason Wang
@ 2021-09-14  9:34     ` Boqun Feng
  2021-09-14 11:03     ` Peter Zijlstra
  3 siblings, 0 replies; 55+ messages in thread
From: Boqun Feng @ 2021-09-14  9:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jason Wang, mst, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Peter Zijlstra, Will Deacon,
	Paul E. McKenney

On Mon, Sep 13, 2021 at 11:36:24PM +0200, Thomas Gleixner wrote:
[...]
> As the device startup is not really happening often it's sensible to do
> the following
> 
>         disable_irq();
>         vp_dev->intx_soft_enabled = true;
>         enable_irq();
> 
> because:
> 
>         disable_irq()
>           synchronize_irq()
> 
> acts as a barrier for the preceeding stores:
> 
>         disable_irq()
>    	  raw_spin_lock(desc->lock);
>           __disable_irq(desc);
>    	  raw_spin_unlock(desc->lock);
> 
>           synchronize_irq()
>             do {
>    	      raw_spin_lock(desc->lock);
>               in_progress = check_inprogress(desc);
>    	      raw_spin_unlock(desc->lock);
>             } while (in_progress);     
> 
>         intx_soft_enabled = true;
> 
>         enable_irq();
> 
> In this case synchronize_irq() prevents the subsequent store to
> intx_soft_enabled to leak into the __disable_irq(desc) section which in
> turn makes it impossible for an interrupt handler to observe
> intx_soft_enabled == true before the prerequisites which preceed the
> call to disable_irq() are visible.
> 

Right. In our memory model, raw_spin_unlock(desc->lock) +
raw_spin_lock(desc->lock) provides the so-call RCtso ordering, that is
for the following code:

	A
	...
	raw_spin_unlock(desc->lock);
	...
	raw_spin_lock(desc->lock);
	...
	B

Memory accesses A and B will not be reordered unless A is a store and B
is a load. Such an ordering guarantee fulfils the requirement here.

For more information, see the LOCKING section of
tools/memory-model/Documentation/explanation.txt

Regards,
Boqun

> Of course the memory ordering wizards might disagree, but if they do,
> then we have a massive chase of ordering problems vs. similar constructs
> all over the tree ahead of us.
> 
> From the interrupt perspective the sequence:
> 
>         disable_irq();
>         vp_dev->intx_soft_enabled = true;
>         enable_irq();
> 
> is perfectly fine as well. Any interrupt arriving during the disabled
> section will be reraised on enable_irq() in hardware because it's a
> level interrupt. Any resulting failure is either a hardware or a
> hypervisor bug.
> 
> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-13 21:36   ` Thomas Gleixner
                       ` (2 preceding siblings ...)
  2021-09-14  9:34     ` Boqun Feng
@ 2021-09-14 11:03     ` Peter Zijlstra
  2021-09-14 11:09       ` Thomas Gleixner
  3 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2021-09-14 11:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jason Wang, mst, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Will Deacon, Boqun Feng,
	Paul E. McKenney

On Mon, Sep 13, 2021 at 11:36:24PM +0200, Thomas Gleixner wrote:

> That's the real problem and for that your barrier is at the wrong place
> because you want to make sure that those stores are visible before the
> store to intx_soft_enabled becomes visible, i.e. this should be:
> 
> 
>         /* Ensure that all preceeding stores are visible before intx_soft_enabled */
> 	smp_wmb();
> 	vp_dev->intx_soft_enabled = true;

That arguably wants to be smp_store_release() instead of smp_wmb() :-)

> Now Micheal is not really enthusiatic about the barrier in the interrupt
> handler hotpath, which is understandable.
> 
> As the device startup is not really happening often it's sensible to do
> the following
> 
>         disable_irq();
>         vp_dev->intx_soft_enabled = true;
>         enable_irq();
> 
> because:
> 
>         disable_irq()
>           synchronize_irq()
> 
> acts as a barrier for the preceeding stores:
> 
>         disable_irq()
>    	  raw_spin_lock(desc->lock);
>           __disable_irq(desc);
>    	  raw_spin_unlock(desc->lock);
> 
>           synchronize_irq()
>             do {
>    	      raw_spin_lock(desc->lock);
>               in_progress = check_inprogress(desc);
>    	      raw_spin_unlock(desc->lock);
>             } while (in_progress);     

Here you rely on the UNLOCK+LOCK pattern because we have two adjacent
critical sections (or rather, the same twice), which provides RCtso
ordering, which is sufficient to make the below store:

> 
>         intx_soft_enabled = true;

a RELEASE. still, I would suggest writing it at least using
WRITE_ONCE() with a comment on.

	disable_irq();
	/*
	 * The above disable_irq() provides TSO ordering and as such
	 * promotes the below store to store-release.
	 */
	WRITE_ONCE(intx_soft_enabled, true);
	enable_irq();

> In this case synchronize_irq() prevents the subsequent store to
> intx_soft_enabled to leak into the __disable_irq(desc) section which in
> turn makes it impossible for an interrupt handler to observe
> intx_soft_enabled == true before the prerequisites which preceed the
> call to disable_irq() are visible.
> 
> Of course the memory ordering wizards might disagree, but if they do,
> then we have a massive chase of ordering problems vs. similar constructs
> all over the tree ahead of us.

Your case, UNLOCK s + LOCK s, is fully documented to provide RCtso
ordering. The more general case of: UNLOCK r + LOCK s, will shortly
appear in documentation near you. Meaning we can forget about the
details an blanket state that any UNLOCK followed by a LOCK (on the same
CPU) will provide TSO ordering.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
  2021-09-14 11:03     ` Peter Zijlstra
@ 2021-09-14 11:09       ` Thomas Gleixner
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2021-09-14 11:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Wang, mst, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Will Deacon, Boqun Feng,
	Paul E. McKenney

On Tue, Sep 14 2021 at 13:03, Peter Zijlstra wrote:
> On Mon, Sep 13, 2021 at 11:36:24PM +0200, Thomas Gleixner wrote:
> Here you rely on the UNLOCK+LOCK pattern because we have two adjacent
> critical sections (or rather, the same twice), which provides RCtso
> ordering, which is sufficient to make the below store:
>
>> 
>>         intx_soft_enabled = true;
>
> a RELEASE. still, I would suggest writing it at least using
> WRITE_ONCE() with a comment on.

Right. forgot about that.

> 	disable_irq();
> 	/*
> 	 * The above disable_irq() provides TSO ordering and as such
> 	 * promotes the below store to store-release.
> 	 */
> 	WRITE_ONCE(intx_soft_enabled, true);
> 	enable_irq();
>
>> In this case synchronize_irq() prevents the subsequent store to
>> intx_soft_enabled to leak into the __disable_irq(desc) section which in
>> turn makes it impossible for an interrupt handler to observe
>> intx_soft_enabled == true before the prerequisites which preceed the
>> call to disable_irq() are visible.
>> 
>> Of course the memory ordering wizards might disagree, but if they do,
>> then we have a massive chase of ordering problems vs. similar constructs
>> all over the tree ahead of us.
>
> Your case, UNLOCK s + LOCK s, is fully documented to provide RCtso
> ordering. The more general case of: UNLOCK r + LOCK s, will shortly
> appear in documentation near you. Meaning we can forget about the
> details an blanket state that any UNLOCK followed by a LOCK (on the same
> CPU) will provide TSO ordering.

I think we also should document the disable/synchronize_irq() scheme
somewhere.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
                   ` (8 preceding siblings ...)
  2021-09-13  5:53 ` [PATCH 9/9] virtio_ring: validate used buffer length Jason Wang
@ 2021-10-05  7:42 ` Michael S. Tsirkin
  2021-10-11  7:36   ` Jason Wang
  9 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-10-05  7:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> Hi All:
> 
> This series treis to do more hardening for virito.
> 
> patch 1 validates the num_queues for virio-blk device.
> patch 2-4 validates max_nr_ports for virito-console device.
> patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> interrupt handler is tiggered. If this makes sense we can do similar
> things in other transport drivers.
> patch 8-9 validate used ring length.
> 
> Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> 
> Please review.
> 
> Thanks

So I poked at console at least, and I think I see
an issue: if interrupt handler queues a work/bh,
then it can still run while reset is in progress.

I sent a patch to fix it for console removal specifically,
but I suspect it's not enough e.g. freeze is still broken.
And note this has been reported without any TDX things -
it's not a malicious device issue, can be triggered just
by module unload.

I am vaguely thinking about new APIs to disable/enable callbacks.
An alternative:

1. adding new remove_nocb/freeze_nocb calls
2. disabling/enabling interrupts automatically around these
3. gradually moving devices to using these
4. once/if all device move, removing the old callbacks

the advantage here is that we'll be sure calls are always
paired correctly.





> Jason Wang (9):
>   virtio-blk: validate num_queues during probe
>   virtio: add doc for validate() method
>   virtio-console: switch to use .validate()
>   virtio_console: validate max_nr_ports before trying to use it
>   virtio_config: introduce a new ready method
>   virtio_pci: harden MSI-X interrupts
>   virtio-pci: harden INTX interrupts
>   virtio_ring: fix typos in vring_desc_extra
>   virtio_ring: validate used buffer length
> 
>  drivers/block/virtio_blk.c         |  3 +-
>  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
>  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
>  drivers/virtio/virtio_pci_common.h |  7 ++--
>  drivers/virtio/virtio_pci_legacy.c |  5 +--
>  drivers/virtio/virtio_pci_modern.c |  6 ++--
>  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
>  include/linux/virtio.h             |  1 +
>  include/linux/virtio_config.h      |  6 ++++
>  9 files changed, 118 insertions(+), 31 deletions(-)
> 
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-05  7:42 ` [PATCH 0/9] More virtio hardening Michael S. Tsirkin
@ 2021-10-11  7:36   ` Jason Wang
  2021-10-11 12:36     ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-10-11  7:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > Hi All:
> >
> > This series treis to do more hardening for virito.
> >
> > patch 1 validates the num_queues for virio-blk device.
> > patch 2-4 validates max_nr_ports for virito-console device.
> > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > interrupt handler is tiggered. If this makes sense we can do similar
> > things in other transport drivers.
> > patch 8-9 validate used ring length.
> >
> > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> >
> > Please review.
> >
> > Thanks
>
> So I poked at console at least, and I think I see
> an issue: if interrupt handler queues a work/bh,
> then it can still run while reset is in progress.

Looks like a bug which is unrelated to the hardening? E.g the driver
should sync with work/bh before reset.

>
> I sent a patch to fix it for console removal specifically,
> but I suspect it's not enough e.g. freeze is still broken.
> And note this has been reported without any TDX things -
> it's not a malicious device issue, can be triggered just
> by module unload.
>
> I am vaguely thinking about new APIs to disable/enable callbacks.
> An alternative:
>
> 1. adding new remove_nocb/freeze_nocb calls
> 2. disabling/enabling interrupts automatically around these
> 3. gradually moving devices to using these
> 4. once/if all device move, removing the old callbacks
>
> the advantage here is that we'll be sure calls are always
> paired correctly.

I'm not sure I get the idea, but my feeling is that it doesn't
conflict with the interrupt hardening here (or at least the same
method is required e.g NO_AUTO_EN).

Thanks

>
>
>
>
>
> > Jason Wang (9):
> >   virtio-blk: validate num_queues during probe
> >   virtio: add doc for validate() method
> >   virtio-console: switch to use .validate()
> >   virtio_console: validate max_nr_ports before trying to use it
> >   virtio_config: introduce a new ready method
> >   virtio_pci: harden MSI-X interrupts
> >   virtio-pci: harden INTX interrupts
> >   virtio_ring: fix typos in vring_desc_extra
> >   virtio_ring: validate used buffer length
> >
> >  drivers/block/virtio_blk.c         |  3 +-
> >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> >  drivers/virtio/virtio_pci_common.h |  7 ++--
> >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> >  include/linux/virtio.h             |  1 +
> >  include/linux/virtio_config.h      |  6 ++++
> >  9 files changed, 118 insertions(+), 31 deletions(-)
> >
> > --
> > 2.25.1
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-11  7:36   ` Jason Wang
@ 2021-10-11 12:36     ` Michael S. Tsirkin
  2021-10-12  2:43       ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-10-11 12:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > Hi All:
> > >
> > > This series treis to do more hardening for virito.
> > >
> > > patch 1 validates the num_queues for virio-blk device.
> > > patch 2-4 validates max_nr_ports for virito-console device.
> > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > interrupt handler is tiggered. If this makes sense we can do similar
> > > things in other transport drivers.
> > > patch 8-9 validate used ring length.
> > >
> > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > >
> > > Please review.
> > >
> > > Thanks
> >
> > So I poked at console at least, and I think I see
> > an issue: if interrupt handler queues a work/bh,
> > then it can still run while reset is in progress.
> 
> Looks like a bug which is unrelated to the hardening?

Won't preventing use after free be relevant?
I frankly don't know what does hardening means then.
> E.g the driver
> should sync with work/bh before reset.

No, there's no way to fix it ATM without extra locks and state which I
think we should strive to avoid or make it generic, not per-driver,
since sync before reset is useless, new interrupts will just arrive and
queue more work. And a sync after reset is too late since driver will
try to add buffers.

Maybe we can break device. Two issues with that
- drivers might not be ready to handle add_buf failures
- restore needs to unbreak then and we don't have a way to do that yet

So .. careful reading of all device drivers and hoping we don't mess
things up even more ... here we come.

> >
> > I sent a patch to fix it for console removal specifically,
> > but I suspect it's not enough e.g. freeze is still broken.
> > And note this has been reported without any TDX things -
> > it's not a malicious device issue, can be triggered just
> > by module unload.
> >
> > I am vaguely thinking about new APIs to disable/enable callbacks.
> > An alternative:
> >
> > 1. adding new remove_nocb/freeze_nocb calls
> > 2. disabling/enabling interrupts automatically around these
> > 3. gradually moving devices to using these
> > 4. once/if all device move, removing the old callbacks
> >
> > the advantage here is that we'll be sure calls are always
> > paired correctly.
> 
> I'm not sure I get the idea, but my feeling is that it doesn't
> conflict with the interrupt hardening here (or at least the same
> method is required e.g NO_AUTO_EN).
> 
> Thanks

Right.  It's not that it conflicts, it's that I was hoping that
since you are working on hardening you can take up fixing that.
Let me know whether you have the time. Thanks!

> >
> >
> >
> >
> >
> > > Jason Wang (9):
> > >   virtio-blk: validate num_queues during probe
> > >   virtio: add doc for validate() method
> > >   virtio-console: switch to use .validate()
> > >   virtio_console: validate max_nr_ports before trying to use it
> > >   virtio_config: introduce a new ready method
> > >   virtio_pci: harden MSI-X interrupts
> > >   virtio-pci: harden INTX interrupts
> > >   virtio_ring: fix typos in vring_desc_extra
> > >   virtio_ring: validate used buffer length
> > >
> > >  drivers/block/virtio_blk.c         |  3 +-
> > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > >  include/linux/virtio.h             |  1 +
> > >  include/linux/virtio_config.h      |  6 ++++
> > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > >
> > > --
> > > 2.25.1
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-11 12:36     ` Michael S. Tsirkin
@ 2021-10-12  2:43       ` Jason Wang
  2021-10-12  5:44         ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-10-12  2:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > Hi All:
> > > >
> > > > This series treis to do more hardening for virito.
> > > >
> > > > patch 1 validates the num_queues for virio-blk device.
> > > > patch 2-4 validates max_nr_ports for virito-console device.
> > > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > > interrupt handler is tiggered. If this makes sense we can do similar
> > > > things in other transport drivers.
> > > > patch 8-9 validate used ring length.
> > > >
> > > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > > >
> > > > Please review.
> > > >
> > > > Thanks
> > >
> > > So I poked at console at least, and I think I see
> > > an issue: if interrupt handler queues a work/bh,
> > > then it can still run while reset is in progress.
> >
> > Looks like a bug which is unrelated to the hardening?
>
> Won't preventing use after free be relevant?

Oh right.

> I frankly don't know what does hardening means then.
> > E.g the driver
> > should sync with work/bh before reset.
>
> No, there's no way to fix it ATM without extra locks and state which I
> think we should strive to avoid or make it generic, not per-driver,
> since sync before reset is useless, new interrupts will just arrive and
> queue more work. And a sync after reset is too late since driver will
> try to add buffers.

Can we do something like

1) disable interrupt
2) sync bh

Or I guess this is somehow you meant in the following steps.

>
> Maybe we can break device. Two issues with that
> - drivers might not be ready to handle add_buf failures
> - restore needs to unbreak then and we don't have a way to do that yet
>
> So .. careful reading of all device drivers and hoping we don't mess
> things up even more ... here we come.

Yes.

>
> > >
> > > I sent a patch to fix it for console removal specifically,
> > > but I suspect it's not enough e.g. freeze is still broken.
> > > And note this has been reported without any TDX things -
> > > it's not a malicious device issue, can be triggered just
> > > by module unload.
> > >
> > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > An alternative:
> > >
> > > 1. adding new remove_nocb/freeze_nocb calls
> > > 2. disabling/enabling interrupts automatically around these
> > > 3. gradually moving devices to using these
> > > 4. once/if all device move, removing the old callbacks
> > >
> > > the advantage here is that we'll be sure calls are always
> > > paired correctly.
> >
> > I'm not sure I get the idea, but my feeling is that it doesn't
> > conflict with the interrupt hardening here (or at least the same
> > method is required e.g NO_AUTO_EN).
> >
> > Thanks
>
> Right.  It's not that it conflicts, it's that I was hoping that
> since you are working on hardening you can take up fixing that.
> Let me know whether you have the time. Thanks!

I can do that.

Thanks

>
> > >
> > >
> > >
> > >
> > >
> > > > Jason Wang (9):
> > > >   virtio-blk: validate num_queues during probe
> > > >   virtio: add doc for validate() method
> > > >   virtio-console: switch to use .validate()
> > > >   virtio_console: validate max_nr_ports before trying to use it
> > > >   virtio_config: introduce a new ready method
> > > >   virtio_pci: harden MSI-X interrupts
> > > >   virtio-pci: harden INTX interrupts
> > > >   virtio_ring: fix typos in vring_desc_extra
> > > >   virtio_ring: validate used buffer length
> > > >
> > > >  drivers/block/virtio_blk.c         |  3 +-
> > > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > > >  include/linux/virtio.h             |  1 +
> > > >  include/linux/virtio_config.h      |  6 ++++
> > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > >
> > > > --
> > > > 2.25.1
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-12  2:43       ` Jason Wang
@ 2021-10-12  5:44         ` Michael S. Tsirkin
  2021-10-12  6:11           ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-10-12  5:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > > Hi All:
> > > > >
> > > > > This series treis to do more hardening for virito.
> > > > >
> > > > > patch 1 validates the num_queues for virio-blk device.
> > > > > patch 2-4 validates max_nr_ports for virito-console device.
> > > > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > > > interrupt handler is tiggered. If this makes sense we can do similar
> > > > > things in other transport drivers.
> > > > > patch 8-9 validate used ring length.
> > > > >
> > > > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > > > >
> > > > > Please review.
> > > > >
> > > > > Thanks
> > > >
> > > > So I poked at console at least, and I think I see
> > > > an issue: if interrupt handler queues a work/bh,
> > > > then it can still run while reset is in progress.
> > >
> > > Looks like a bug which is unrelated to the hardening?
> >
> > Won't preventing use after free be relevant?
> 
> Oh right.
> 
> > I frankly don't know what does hardening means then.
> > > E.g the driver
> > > should sync with work/bh before reset.
> >
> > No, there's no way to fix it ATM without extra locks and state which I
> > think we should strive to avoid or make it generic, not per-driver,
> > since sync before reset is useless, new interrupts will just arrive and
> > queue more work. And a sync after reset is too late since driver will
> > try to add buffers.
> 
> Can we do something like
> 
> 1) disable interrupt
> 2) sync bh
> 
> Or I guess this is somehow you meant in the following steps.

So that would mean a new API to disable vq interrupts.
reset will re-enable.
E.g. virtqueue_cancel_cb_before_reset()?

Then drivers can sync, then reset.
This means maintaining more state though, which I don't like.

An alternative is something like this:

static void (*virtio_flush_device)(struct virtio_device *dev);

void virtio_reset_device(struct virtio_device *dev, virtio_flush_device flush)
{
	might_sleep();
	if (flush) {
		dev->config->disable_interrupts(dev);
		flush(dev);
		dev->config->reset(dev);
		dev->config->enable_interrupts(dev);
	} else {
		dev->config->reset(dev);
	}
}

I have patches wrapping all reset calls in virtio_reset_device
(without the flush parameter but that's easy to tweak).


> >
> > Maybe we can break device. Two issues with that
> > - drivers might not be ready to handle add_buf failures
> > - restore needs to unbreak then and we don't have a way to do that yet
> >
> > So .. careful reading of all device drivers and hoping we don't mess
> > things up even more ... here we come.
> 
> Yes.

The biggest issue with this trick is drivers not handling add_buf
errors, adding a failure path here risks creating memory leaks.
OTOH with e.g. bounce buffers maybe it's possible for add buf to
fail anyway?

> >
> > > >
> > > > I sent a patch to fix it for console removal specifically,
> > > > but I suspect it's not enough e.g. freeze is still broken.
> > > > And note this has been reported without any TDX things -
> > > > it's not a malicious device issue, can be triggered just
> > > > by module unload.
> > > >
> > > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > > An alternative:
> > > >
> > > > 1. adding new remove_nocb/freeze_nocb calls
> > > > 2. disabling/enabling interrupts automatically around these
> > > > 3. gradually moving devices to using these
> > > > 4. once/if all device move, removing the old callbacks
> > > >
> > > > the advantage here is that we'll be sure calls are always
> > > > paired correctly.
> > >
> > > I'm not sure I get the idea, but my feeling is that it doesn't
> > > conflict with the interrupt hardening here (or at least the same
> > > method is required e.g NO_AUTO_EN).
> > >
> > > Thanks
> >
> > Right.  It's not that it conflicts, it's that I was hoping that
> > since you are working on hardening you can take up fixing that.
> > Let me know whether you have the time. Thanks!
> 
> I can do that.
> 
> Thanks
> 
> >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > Jason Wang (9):
> > > > >   virtio-blk: validate num_queues during probe
> > > > >   virtio: add doc for validate() method
> > > > >   virtio-console: switch to use .validate()
> > > > >   virtio_console: validate max_nr_ports before trying to use it
> > > > >   virtio_config: introduce a new ready method
> > > > >   virtio_pci: harden MSI-X interrupts
> > > > >   virtio-pci: harden INTX interrupts
> > > > >   virtio_ring: fix typos in vring_desc_extra
> > > > >   virtio_ring: validate used buffer length
> > > > >
> > > > >  drivers/block/virtio_blk.c         |  3 +-
> > > > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > > > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > > > >  include/linux/virtio.h             |  1 +
> > > > >  include/linux/virtio_config.h      |  6 ++++
> > > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > > >
> > > > > --
> > > > > 2.25.1
> > > >
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-12  5:44         ` Michael S. Tsirkin
@ 2021-10-12  6:11           ` Jason Wang
  2021-10-12  6:35             ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-10-12  6:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Tue, Oct 12, 2021 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> > On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > > > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > > > Hi All:
> > > > > >
> > > > > > This series treis to do more hardening for virito.
> > > > > >
> > > > > > patch 1 validates the num_queues for virio-blk device.
> > > > > > patch 2-4 validates max_nr_ports for virito-console device.
> > > > > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > > > > interrupt handler is tiggered. If this makes sense we can do similar
> > > > > > things in other transport drivers.
> > > > > > patch 8-9 validate used ring length.
> > > > > >
> > > > > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > > > > >
> > > > > > Please review.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > So I poked at console at least, and I think I see
> > > > > an issue: if interrupt handler queues a work/bh,
> > > > > then it can still run while reset is in progress.
> > > >
> > > > Looks like a bug which is unrelated to the hardening?
> > >
> > > Won't preventing use after free be relevant?
> >
> > Oh right.
> >
> > > I frankly don't know what does hardening means then.
> > > > E.g the driver
> > > > should sync with work/bh before reset.
> > >
> > > No, there's no way to fix it ATM without extra locks and state which I
> > > think we should strive to avoid or make it generic, not per-driver,
> > > since sync before reset is useless, new interrupts will just arrive and
> > > queue more work. And a sync after reset is too late since driver will
> > > try to add buffers.
> >
> > Can we do something like
> >
> > 1) disable interrupt
> > 2) sync bh
> >
> > Or I guess this is somehow you meant in the following steps.
>
> So that would mean a new API to disable vq interrupts.
> reset will re-enable.
> E.g. virtqueue_cancel_cb_before_reset()?
>
> Then drivers can sync, then reset.
> This means maintaining more state though, which I don't like.
>
> An alternative is something like this:
>
> static void (*virtio_flush_device)(struct virtio_device *dev);
>
> void virtio_reset_device(struct virtio_device *dev, virtio_flush_device flush)
> {
>         might_sleep();
>         if (flush) {
>                 dev->config->disable_interrupts(dev);
>                 flush(dev);
>                 dev->config->reset(dev);
>                 dev->config->enable_interrupts(dev);

I wonder whether this is needed. As done in this series,
enable_interrupt should be done in virtio_device_ready().

Others should work.

>         } else {
>                 dev->config->reset(dev);
>         }
> }
>
> I have patches wrapping all reset calls in virtio_reset_device
> (without the flush parameter but that's easy to tweak).

Does it work if I post V2 and you post those patches on top?

>
>
> > >
> > > Maybe we can break device. Two issues with that
> > > - drivers might not be ready to handle add_buf failures
> > > - restore needs to unbreak then and we don't have a way to do that yet
> > >
> > > So .. careful reading of all device drivers and hoping we don't mess
> > > things up even more ... here we come.
> >
> > Yes.
>
> The biggest issue with this trick is drivers not handling add_buf
> errors, adding a failure path here risks creating memory leaks.
> OTOH with e.g. bounce buffers maybe it's possible for add buf to
> fail anyway?

I'm not sure I get this, a simple git grep told me at least the return
value of add_inbuf() were all checked.

Thanks

>
> > >
> > > > >
> > > > > I sent a patch to fix it for console removal specifically,
> > > > > but I suspect it's not enough e.g. freeze is still broken.
> > > > > And note this has been reported without any TDX things -
> > > > > it's not a malicious device issue, can be triggered just
> > > > > by module unload.
> > > > >
> > > > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > > > An alternative:
> > > > >
> > > > > 1. adding new remove_nocb/freeze_nocb calls
> > > > > 2. disabling/enabling interrupts automatically around these
> > > > > 3. gradually moving devices to using these
> > > > > 4. once/if all device move, removing the old callbacks
> > > > >
> > > > > the advantage here is that we'll be sure calls are always
> > > > > paired correctly.
> > > >
> > > > I'm not sure I get the idea, but my feeling is that it doesn't
> > > > conflict with the interrupt hardening here (or at least the same
> > > > method is required e.g NO_AUTO_EN).
> > > >
> > > > Thanks
> > >
> > > Right.  It's not that it conflicts, it's that I was hoping that
> > > since you are working on hardening you can take up fixing that.
> > > Let me know whether you have the time. Thanks!
> >
> > I can do that.
> >
> > Thanks
> >
> > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > Jason Wang (9):
> > > > > >   virtio-blk: validate num_queues during probe
> > > > > >   virtio: add doc for validate() method
> > > > > >   virtio-console: switch to use .validate()
> > > > > >   virtio_console: validate max_nr_ports before trying to use it
> > > > > >   virtio_config: introduce a new ready method
> > > > > >   virtio_pci: harden MSI-X interrupts
> > > > > >   virtio-pci: harden INTX interrupts
> > > > > >   virtio_ring: fix typos in vring_desc_extra
> > > > > >   virtio_ring: validate used buffer length
> > > > > >
> > > > > >  drivers/block/virtio_blk.c         |  3 +-
> > > > > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > > > > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > > > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > > > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > > > > >  include/linux/virtio.h             |  1 +
> > > > > >  include/linux/virtio_config.h      |  6 ++++
> > > > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-12  6:11           ` Jason Wang
@ 2021-10-12  6:35             ` Michael S. Tsirkin
  2021-10-12  6:43               ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-10-12  6:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Tue, Oct 12, 2021 at 02:11:10PM +0800, Jason Wang wrote:
> On Tue, Oct 12, 2021 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> > > On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > > > > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > > > > Hi All:
> > > > > > >
> > > > > > > This series treis to do more hardening for virito.
> > > > > > >
> > > > > > > patch 1 validates the num_queues for virio-blk device.
> > > > > > > patch 2-4 validates max_nr_ports for virito-console device.
> > > > > > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > > > > > interrupt handler is tiggered. If this makes sense we can do similar
> > > > > > > things in other transport drivers.
> > > > > > > patch 8-9 validate used ring length.
> > > > > > >
> > > > > > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > > > > > >
> > > > > > > Please review.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > So I poked at console at least, and I think I see
> > > > > > an issue: if interrupt handler queues a work/bh,
> > > > > > then it can still run while reset is in progress.
> > > > >
> > > > > Looks like a bug which is unrelated to the hardening?
> > > >
> > > > Won't preventing use after free be relevant?
> > >
> > > Oh right.
> > >
> > > > I frankly don't know what does hardening means then.
> > > > > E.g the driver
> > > > > should sync with work/bh before reset.
> > > >
> > > > No, there's no way to fix it ATM without extra locks and state which I
> > > > think we should strive to avoid or make it generic, not per-driver,
> > > > since sync before reset is useless, new interrupts will just arrive and
> > > > queue more work. And a sync after reset is too late since driver will
> > > > try to add buffers.
> > >
> > > Can we do something like
> > >
> > > 1) disable interrupt
> > > 2) sync bh
> > >
> > > Or I guess this is somehow you meant in the following steps.
> >
> > So that would mean a new API to disable vq interrupts.
> > reset will re-enable.
> > E.g. virtqueue_cancel_cb_before_reset()?
> >
> > Then drivers can sync, then reset.
> > This means maintaining more state though, which I don't like.
> >
> > An alternative is something like this:
> >
> > static void (*virtio_flush_device)(struct virtio_device *dev);
> >
> > void virtio_reset_device(struct virtio_device *dev, virtio_flush_device flush)
> > {
> >         might_sleep();
> >         if (flush) {
> >                 dev->config->disable_interrupts(dev);
> >                 flush(dev);
> >                 dev->config->reset(dev);
> >                 dev->config->enable_interrupts(dev);
> 
> I wonder whether this is needed. As done in this series,
> enable_interrupt should be done in virtio_device_ready().
> 
> Others should work.
> 
> >         } else {
> >                 dev->config->reset(dev);
> >         }
> > }
> >
> > I have patches wrapping all reset calls in virtio_reset_device
> > (without the flush parameter but that's easy to tweak).
> 
> Does it work if I post V2 and you post those patches on top?

The reset things? Sure.

> >
> >
> > > >
> > > > Maybe we can break device. Two issues with that
> > > > - drivers might not be ready to handle add_buf failures
> > > > - restore needs to unbreak then and we don't have a way to do that yet
> > > >
> > > > So .. careful reading of all device drivers and hoping we don't mess
> > > > things up even more ... here we come.
> > >
> > > Yes.
> >
> > The biggest issue with this trick is drivers not handling add_buf
> > errors, adding a failure path here risks creating memory leaks.
> > OTOH with e.g. bounce buffers maybe it's possible for add buf to
> > fail anyway?
> 
> I'm not sure I get this, a simple git grep told me at least the return
> value of add_inbuf() were all checked.
> 
> Thanks

Checked locally, but not always error is handled all the way
to the top. E.g. add_port in console returns an error code
but that is never checked. Well, console is a mess generally.

> >
> > > >
> > > > > >
> > > > > > I sent a patch to fix it for console removal specifically,
> > > > > > but I suspect it's not enough e.g. freeze is still broken.
> > > > > > And note this has been reported without any TDX things -
> > > > > > it's not a malicious device issue, can be triggered just
> > > > > > by module unload.
> > > > > >
> > > > > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > > > > An alternative:
> > > > > >
> > > > > > 1. adding new remove_nocb/freeze_nocb calls
> > > > > > 2. disabling/enabling interrupts automatically around these
> > > > > > 3. gradually moving devices to using these
> > > > > > 4. once/if all device move, removing the old callbacks
> > > > > >
> > > > > > the advantage here is that we'll be sure calls are always
> > > > > > paired correctly.
> > > > >
> > > > > I'm not sure I get the idea, but my feeling is that it doesn't
> > > > > conflict with the interrupt hardening here (or at least the same
> > > > > method is required e.g NO_AUTO_EN).
> > > > >
> > > > > Thanks
> > > >
> > > > Right.  It's not that it conflicts, it's that I was hoping that
> > > > since you are working on hardening you can take up fixing that.
> > > > Let me know whether you have the time. Thanks!
> > >
> > > I can do that.
> > >
> > > Thanks
> > >
> > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Jason Wang (9):
> > > > > > >   virtio-blk: validate num_queues during probe
> > > > > > >   virtio: add doc for validate() method
> > > > > > >   virtio-console: switch to use .validate()
> > > > > > >   virtio_console: validate max_nr_ports before trying to use it
> > > > > > >   virtio_config: introduce a new ready method
> > > > > > >   virtio_pci: harden MSI-X interrupts
> > > > > > >   virtio-pci: harden INTX interrupts
> > > > > > >   virtio_ring: fix typos in vring_desc_extra
> > > > > > >   virtio_ring: validate used buffer length
> > > > > > >
> > > > > > >  drivers/block/virtio_blk.c         |  3 +-
> > > > > > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > > > > > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > > > > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > > > > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > > > > > >  include/linux/virtio.h             |  1 +
> > > > > > >  include/linux/virtio_config.h      |  6 ++++
> > > > > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > > > > >
> > > > > > > --
> > > > > > > 2.25.1
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-12  6:35             ` Michael S. Tsirkin
@ 2021-10-12  6:43               ` Jason Wang
  2021-10-12  7:03                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Wang @ 2021-10-12  6:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Tue, Oct 12, 2021 at 2:35 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 02:11:10PM +0800, Jason Wang wrote:
> > On Tue, Oct 12, 2021 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> > > > On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > > > > > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > > > > > Hi All:
> > > > > > > >
> > > > > > > > This series treis to do more hardening for virito.
> > > > > > > >
> > > > > > > > patch 1 validates the num_queues for virio-blk device.
> > > > > > > > patch 2-4 validates max_nr_ports for virito-console device.
> > > > > > > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > > > > > > interrupt handler is tiggered. If this makes sense we can do similar
> > > > > > > > things in other transport drivers.
> > > > > > > > patch 8-9 validate used ring length.
> > > > > > > >
> > > > > > > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > > > > > > >
> > > > > > > > Please review.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > So I poked at console at least, and I think I see
> > > > > > > an issue: if interrupt handler queues a work/bh,
> > > > > > > then it can still run while reset is in progress.
> > > > > >
> > > > > > Looks like a bug which is unrelated to the hardening?
> > > > >
> > > > > Won't preventing use after free be relevant?
> > > >
> > > > Oh right.
> > > >
> > > > > I frankly don't know what does hardening means then.
> > > > > > E.g the driver
> > > > > > should sync with work/bh before reset.
> > > > >
> > > > > No, there's no way to fix it ATM without extra locks and state which I
> > > > > think we should strive to avoid or make it generic, not per-driver,
> > > > > since sync before reset is useless, new interrupts will just arrive and
> > > > > queue more work. And a sync after reset is too late since driver will
> > > > > try to add buffers.
> > > >
> > > > Can we do something like
> > > >
> > > > 1) disable interrupt
> > > > 2) sync bh
> > > >
> > > > Or I guess this is somehow you meant in the following steps.
> > >
> > > So that would mean a new API to disable vq interrupts.
> > > reset will re-enable.
> > > E.g. virtqueue_cancel_cb_before_reset()?
> > >
> > > Then drivers can sync, then reset.
> > > This means maintaining more state though, which I don't like.
> > >
> > > An alternative is something like this:
> > >
> > > static void (*virtio_flush_device)(struct virtio_device *dev);
> > >
> > > void virtio_reset_device(struct virtio_device *dev, virtio_flush_device flush)
> > > {
> > >         might_sleep();
> > >         if (flush) {
> > >                 dev->config->disable_interrupts(dev);
> > >                 flush(dev);
> > >                 dev->config->reset(dev);
> > >                 dev->config->enable_interrupts(dev);
> >
> > I wonder whether this is needed. As done in this series,
> > enable_interrupt should be done in virtio_device_ready().
> >
> > Others should work.
> >
> > >         } else {
> > >                 dev->config->reset(dev);
> > >         }
> > > }
> > >
> > > I have patches wrapping all reset calls in virtio_reset_device
> > > (without the flush parameter but that's easy to tweak).
> >
> > Does it work if I post V2 and you post those patches on top?
>
> The reset things? Sure.

Ok.

>
> > >
> > >
> > > > >
> > > > > Maybe we can break device. Two issues with that
> > > > > - drivers might not be ready to handle add_buf failures
> > > > > - restore needs to unbreak then and we don't have a way to do that yet
> > > > >
> > > > > So .. careful reading of all device drivers and hoping we don't mess
> > > > > things up even more ... here we come.
> > > >
> > > > Yes.
> > >
> > > The biggest issue with this trick is drivers not handling add_buf
> > > errors, adding a failure path here risks creating memory leaks.
> > > OTOH with e.g. bounce buffers maybe it's possible for add buf to
> > > fail anyway?
> >
> > I'm not sure I get this, a simple git grep told me at least the return
> > value of add_inbuf() were all checked.
> >
> > Thanks
>
> Checked locally, but not always error is handled all the way
> to the top. E.g. add_port in console returns an error code
> but that is never checked. Well, console is a mess generally.

I see. I can try to audit all virtio drivers for the add_inbuf() case.

Thanks

>
> > >
> > > > >
> > > > > > >
> > > > > > > I sent a patch to fix it for console removal specifically,
> > > > > > > but I suspect it's not enough e.g. freeze is still broken.
> > > > > > > And note this has been reported without any TDX things -
> > > > > > > it's not a malicious device issue, can be triggered just
> > > > > > > by module unload.
> > > > > > >
> > > > > > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > > > > > An alternative:
> > > > > > >
> > > > > > > 1. adding new remove_nocb/freeze_nocb calls
> > > > > > > 2. disabling/enabling interrupts automatically around these
> > > > > > > 3. gradually moving devices to using these
> > > > > > > 4. once/if all device move, removing the old callbacks
> > > > > > >
> > > > > > > the advantage here is that we'll be sure calls are always
> > > > > > > paired correctly.
> > > > > >
> > > > > > I'm not sure I get the idea, but my feeling is that it doesn't
> > > > > > conflict with the interrupt hardening here (or at least the same
> > > > > > method is required e.g NO_AUTO_EN).
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Right.  It's not that it conflicts, it's that I was hoping that
> > > > > since you are working on hardening you can take up fixing that.
> > > > > Let me know whether you have the time. Thanks!
> > > >
> > > > I can do that.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Jason Wang (9):
> > > > > > > >   virtio-blk: validate num_queues during probe
> > > > > > > >   virtio: add doc for validate() method
> > > > > > > >   virtio-console: switch to use .validate()
> > > > > > > >   virtio_console: validate max_nr_ports before trying to use it
> > > > > > > >   virtio_config: introduce a new ready method
> > > > > > > >   virtio_pci: harden MSI-X interrupts
> > > > > > > >   virtio-pci: harden INTX interrupts
> > > > > > > >   virtio_ring: fix typos in vring_desc_extra
> > > > > > > >   virtio_ring: validate used buffer length
> > > > > > > >
> > > > > > > >  drivers/block/virtio_blk.c         |  3 +-
> > > > > > > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > > > > > > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > > > > > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > > > > > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > > > > > > >  include/linux/virtio.h             |  1 +
> > > > > > > >  include/linux/virtio_config.h      |  6 ++++
> > > > > > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-12  6:43               ` Jason Wang
@ 2021-10-12  7:03                 ` Michael S. Tsirkin
  2021-10-12  8:46                   ` Jason Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2021-10-12  7:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Tue, Oct 12, 2021 at 02:43:13PM +0800, Jason Wang wrote:
> On Tue, Oct 12, 2021 at 2:35 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 12, 2021 at 02:11:10PM +0800, Jason Wang wrote:
> > > On Tue, Oct 12, 2021 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> > > > > On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > > > > > > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > > > > > > Hi All:
> > > > > > > > >
> > > > > > > > > This series treis to do more hardening for virito.
> > > > > > > > >
> > > > > > > > > patch 1 validates the num_queues for virio-blk device.
> > > > > > > > > patch 2-4 validates max_nr_ports for virito-console device.
> > > > > > > > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > > > > > > > interrupt handler is tiggered. If this makes sense we can do similar
> > > > > > > > > things in other transport drivers.
> > > > > > > > > patch 8-9 validate used ring length.
> > > > > > > > >
> > > > > > > > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > > > > > > > >
> > > > > > > > > Please review.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > So I poked at console at least, and I think I see
> > > > > > > > an issue: if interrupt handler queues a work/bh,
> > > > > > > > then it can still run while reset is in progress.
> > > > > > >
> > > > > > > Looks like a bug which is unrelated to the hardening?
> > > > > >
> > > > > > Won't preventing use after free be relevant?
> > > > >
> > > > > Oh right.
> > > > >
> > > > > > I frankly don't know what does hardening means then.
> > > > > > > E.g the driver
> > > > > > > should sync with work/bh before reset.
> > > > > >
> > > > > > No, there's no way to fix it ATM without extra locks and state which I
> > > > > > think we should strive to avoid or make it generic, not per-driver,
> > > > > > since sync before reset is useless, new interrupts will just arrive and
> > > > > > queue more work. And a sync after reset is too late since driver will
> > > > > > try to add buffers.
> > > > >
> > > > > Can we do something like
> > > > >
> > > > > 1) disable interrupt
> > > > > 2) sync bh
> > > > >
> > > > > Or I guess this is somehow you meant in the following steps.
> > > >
> > > > So that would mean a new API to disable vq interrupts.
> > > > reset will re-enable.
> > > > E.g. virtqueue_cancel_cb_before_reset()?
> > > >
> > > > Then drivers can sync, then reset.
> > > > This means maintaining more state though, which I don't like.
> > > >
> > > > An alternative is something like this:
> > > >
> > > > static void (*virtio_flush_device)(struct virtio_device *dev);
> > > >
> > > > void virtio_reset_device(struct virtio_device *dev, virtio_flush_device flush)
> > > > {
> > > >         might_sleep();
> > > >         if (flush) {
> > > >                 dev->config->disable_interrupts(dev);
> > > >                 flush(dev);
> > > >                 dev->config->reset(dev);
> > > >                 dev->config->enable_interrupts(dev);
> > >
> > > I wonder whether this is needed. As done in this series,
> > > enable_interrupt should be done in virtio_device_ready().
> > >
> > > Others should work.
> > >
> > > >         } else {
> > > >                 dev->config->reset(dev);
> > > >         }
> > > > }
> > > >
> > > > I have patches wrapping all reset calls in virtio_reset_device
> > > > (without the flush parameter but that's easy to tweak).
> > >
> > > Does it work if I post V2 and you post those patches on top?
> >
> > The reset things? Sure.
> 
> Ok.
> 
> >
> > > >
> > > >
> > > > > >
> > > > > > Maybe we can break device. Two issues with that
> > > > > > - drivers might not be ready to handle add_buf failures
> > > > > > - restore needs to unbreak then and we don't have a way to do that yet
> > > > > >
> > > > > > So .. careful reading of all device drivers and hoping we don't mess
> > > > > > things up even more ... here we come.
> > > > >
> > > > > Yes.
> > > >
> > > > The biggest issue with this trick is drivers not handling add_buf
> > > > errors, adding a failure path here risks creating memory leaks.
> > > > OTOH with e.g. bounce buffers maybe it's possible for add buf to
> > > > fail anyway?
> > >
> > > I'm not sure I get this, a simple git grep told me at least the return
> > > value of add_inbuf() were all checked.
> > >
> > > Thanks
> >
> > Checked locally, but not always error is handled all the way
> > to the top. E.g. add_port in console returns an error code
> > but that is never checked. Well, console is a mess generally.
> 
> I see. I can try to audit all virtio drivers for the add_inbuf() case.
> 
> Thanks

Why inbuf specifically?
I mean, re-reading code often finds bugs, sure ;)

But I don't think just to fix remove we need to audit them all
as such, as long as we are not modifying core, whatever
driver remove we are poking for, that driver needs to be
audited.


> >
> > > >
> > > > > >
> > > > > > > >
> > > > > > > > I sent a patch to fix it for console removal specifically,
> > > > > > > > but I suspect it's not enough e.g. freeze is still broken.
> > > > > > > > And note this has been reported without any TDX things -
> > > > > > > > it's not a malicious device issue, can be triggered just
> > > > > > > > by module unload.
> > > > > > > >
> > > > > > > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > > > > > > An alternative:
> > > > > > > >
> > > > > > > > 1. adding new remove_nocb/freeze_nocb calls
> > > > > > > > 2. disabling/enabling interrupts automatically around these
> > > > > > > > 3. gradually moving devices to using these
> > > > > > > > 4. once/if all device move, removing the old callbacks
> > > > > > > >
> > > > > > > > the advantage here is that we'll be sure calls are always
> > > > > > > > paired correctly.
> > > > > > >
> > > > > > > I'm not sure I get the idea, but my feeling is that it doesn't
> > > > > > > conflict with the interrupt hardening here (or at least the same
> > > > > > > method is required e.g NO_AUTO_EN).
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Right.  It's not that it conflicts, it's that I was hoping that
> > > > > > since you are working on hardening you can take up fixing that.
> > > > > > Let me know whether you have the time. Thanks!
> > > > >
> > > > > I can do that.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Jason Wang (9):
> > > > > > > > >   virtio-blk: validate num_queues during probe
> > > > > > > > >   virtio: add doc for validate() method
> > > > > > > > >   virtio-console: switch to use .validate()
> > > > > > > > >   virtio_console: validate max_nr_ports before trying to use it
> > > > > > > > >   virtio_config: introduce a new ready method
> > > > > > > > >   virtio_pci: harden MSI-X interrupts
> > > > > > > > >   virtio-pci: harden INTX interrupts
> > > > > > > > >   virtio_ring: fix typos in vring_desc_extra
> > > > > > > > >   virtio_ring: validate used buffer length
> > > > > > > > >
> > > > > > > > >  drivers/block/virtio_blk.c         |  3 +-
> > > > > > > > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > > > > > > > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > > > > > > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > > > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > > > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > > > > > > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > > > > > > > >  include/linux/virtio.h             |  1 +
> > > > > > > > >  include/linux/virtio_config.h      |  6 ++++
> > > > > > > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/9] More virtio hardening
  2021-10-12  7:03                 ` Michael S. Tsirkin
@ 2021-10-12  8:46                   ` Jason Wang
  0 siblings, 0 replies; 55+ messages in thread
From: Jason Wang @ 2021-10-12  8:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Hetzelt, Felicitas, kaplan, david,
	Konrad Rzeszutek Wilk

On Tue, Oct 12, 2021 at 3:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 12, 2021 at 02:43:13PM +0800, Jason Wang wrote:
> > On Tue, Oct 12, 2021 at 2:35 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 12, 2021 at 02:11:10PM +0800, Jason Wang wrote:
> > > > On Tue, Oct 12, 2021 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> > > > > > On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > > > > > > > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > > > > > > > Hi All:
> > > > > > > > > >
> > > > > > > > > > This series treis to do more hardening for virito.
> > > > > > > > > >
> > > > > > > > > > patch 1 validates the num_queues for virio-blk device.
> > > > > > > > > > patch 2-4 validates max_nr_ports for virito-console device.
> > > > > > > > > > patch 5-7 harden virtio-pci interrupts to make sure no exepcted
> > > > > > > > > > interrupt handler is tiggered. If this makes sense we can do similar
> > > > > > > > > > things in other transport drivers.
> > > > > > > > > > patch 8-9 validate used ring length.
> > > > > > > > > >
> > > > > > > > > > Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> > > > > > > > > >
> > > > > > > > > > Please review.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > So I poked at console at least, and I think I see
> > > > > > > > > an issue: if interrupt handler queues a work/bh,
> > > > > > > > > then it can still run while reset is in progress.
> > > > > > > >
> > > > > > > > Looks like a bug which is unrelated to the hardening?
> > > > > > >
> > > > > > > Won't preventing use after free be relevant?
> > > > > >
> > > > > > Oh right.
> > > > > >
> > > > > > > I frankly don't know what does hardening means then.
> > > > > > > > E.g the driver
> > > > > > > > should sync with work/bh before reset.
> > > > > > >
> > > > > > > No, there's no way to fix it ATM without extra locks and state which I
> > > > > > > think we should strive to avoid or make it generic, not per-driver,
> > > > > > > since sync before reset is useless, new interrupts will just arrive and
> > > > > > > queue more work. And a sync after reset is too late since driver will
> > > > > > > try to add buffers.
> > > > > >
> > > > > > Can we do something like
> > > > > >
> > > > > > 1) disable interrupt
> > > > > > 2) sync bh
> > > > > >
> > > > > > Or I guess this is somehow you meant in the following steps.
> > > > >
> > > > > So that would mean a new API to disable vq interrupts.
> > > > > reset will re-enable.
> > > > > E.g. virtqueue_cancel_cb_before_reset()?
> > > > >
> > > > > Then drivers can sync, then reset.
> > > > > This means maintaining more state though, which I don't like.
> > > > >
> > > > > An alternative is something like this:
> > > > >
> > > > > static void (*virtio_flush_device)(struct virtio_device *dev);
> > > > >
> > > > > void virtio_reset_device(struct virtio_device *dev, virtio_flush_device flush)
> > > > > {
> > > > >         might_sleep();
> > > > >         if (flush) {
> > > > >                 dev->config->disable_interrupts(dev);
> > > > >                 flush(dev);
> > > > >                 dev->config->reset(dev);
> > > > >                 dev->config->enable_interrupts(dev);
> > > >
> > > > I wonder whether this is needed. As done in this series,
> > > > enable_interrupt should be done in virtio_device_ready().
> > > >
> > > > Others should work.
> > > >
> > > > >         } else {
> > > > >                 dev->config->reset(dev);
> > > > >         }
> > > > > }
> > > > >
> > > > > I have patches wrapping all reset calls in virtio_reset_device
> > > > > (without the flush parameter but that's easy to tweak).
> > > >
> > > > Does it work if I post V2 and you post those patches on top?
> > >
> > > The reset things? Sure.
> >
> > Ok.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > Maybe we can break device. Two issues with that
> > > > > > > - drivers might not be ready to handle add_buf failures
> > > > > > > - restore needs to unbreak then and we don't have a way to do that yet
> > > > > > >
> > > > > > > So .. careful reading of all device drivers and hoping we don't mess
> > > > > > > things up even more ... here we come.
> > > > > >
> > > > > > Yes.
> > > > >
> > > > > The biggest issue with this trick is drivers not handling add_buf
> > > > > errors, adding a failure path here risks creating memory leaks.
> > > > > OTOH with e.g. bounce buffers maybe it's possible for add buf to
> > > > > fail anyway?
> > > >
> > > > I'm not sure I get this, a simple git grep told me at least the return
> > > > value of add_inbuf() were all checked.
> > > >
> > > > Thanks
> > >
> > > Checked locally, but not always error is handled all the way
> > > to the top. E.g. add_port in console returns an error code
> > > but that is never checked. Well, console is a mess generally.
> >
> > I see. I can try to audit all virtio drivers for the add_inbuf() case.
> >
> > Thanks
>
> Why inbuf specifically?

Typo :(

> I mean, re-reading code often finds bugs, sure ;)

Yes.

>
> But I don't think just to fix remove we need to audit them all
> as such, as long as we are not modifying core, whatever
> driver remove we are poking for, that driver needs to be
> audited.

Right.

Thanks

>
>
> > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > I sent a patch to fix it for console removal specifically,
> > > > > > > > > but I suspect it's not enough e.g. freeze is still broken.
> > > > > > > > > And note this has been reported without any TDX things -
> > > > > > > > > it's not a malicious device issue, can be triggered just
> > > > > > > > > by module unload.
> > > > > > > > >
> > > > > > > > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > > > > > > > An alternative:
> > > > > > > > >
> > > > > > > > > 1. adding new remove_nocb/freeze_nocb calls
> > > > > > > > > 2. disabling/enabling interrupts automatically around these
> > > > > > > > > 3. gradually moving devices to using these
> > > > > > > > > 4. once/if all device move, removing the old callbacks
> > > > > > > > >
> > > > > > > > > the advantage here is that we'll be sure calls are always
> > > > > > > > > paired correctly.
> > > > > > > >
> > > > > > > > I'm not sure I get the idea, but my feeling is that it doesn't
> > > > > > > > conflict with the interrupt hardening here (or at least the same
> > > > > > > > method is required e.g NO_AUTO_EN).
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > Right.  It's not that it conflicts, it's that I was hoping that
> > > > > > > since you are working on hardening you can take up fixing that.
> > > > > > > Let me know whether you have the time. Thanks!
> > > > > >
> > > > > > I can do that.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Jason Wang (9):
> > > > > > > > > >   virtio-blk: validate num_queues during probe
> > > > > > > > > >   virtio: add doc for validate() method
> > > > > > > > > >   virtio-console: switch to use .validate()
> > > > > > > > > >   virtio_console: validate max_nr_ports before trying to use it
> > > > > > > > > >   virtio_config: introduce a new ready method
> > > > > > > > > >   virtio_pci: harden MSI-X interrupts
> > > > > > > > > >   virtio-pci: harden INTX interrupts
> > > > > > > > > >   virtio_ring: fix typos in vring_desc_extra
> > > > > > > > > >   virtio_ring: validate used buffer length
> > > > > > > > > >
> > > > > > > > > >  drivers/block/virtio_blk.c         |  3 +-
> > > > > > > > > >  drivers/char/virtio_console.c      | 51 +++++++++++++++++++++---------
> > > > > > > > > >  drivers/virtio/virtio_pci_common.c | 43 +++++++++++++++++++++----
> > > > > > > > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > > > > > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > > > > > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > > > > > > > >  drivers/virtio/virtio_ring.c       | 27 ++++++++++++++--
> > > > > > > > > >  include/linux/virtio.h             |  1 +
> > > > > > > > > >  include/linux/virtio_config.h      |  6 ++++
> > > > > > > > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2021-10-12  8:46 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-13  5:53 [PATCH 0/9] More virtio hardening Jason Wang
2021-09-13  5:53 ` [PATCH 1/9] virtio-blk: validate num_queues during probe Jason Wang
2021-09-13  7:48   ` Stefano Garzarella
2021-09-14  2:29     ` Jason Wang
2021-09-13 12:05   ` Stefan Hajnoczi
2021-09-13  5:53 ` [PATCH 2/9] virtio: add doc for validate() method Jason Wang
2021-09-13  5:53 ` [PATCH 3/9] virtio-console: switch to use .validate() Jason Wang
2021-09-13  5:53 ` [PATCH 4/9] virtio_console: validate max_nr_ports before trying to use it Jason Wang
2021-09-13  5:53 ` [PATCH 5/9] virtio_config: introduce a new ready method Jason Wang
2021-09-13  5:53 ` [PATCH 6/9] virtio_pci: harden MSI-X interrupts Jason Wang
2021-09-13  6:03   ` Michael S. Tsirkin
2021-09-13  6:08     ` Jason Wang
2021-09-13  6:28       ` Michael S. Tsirkin
2021-09-13  6:34         ` Jason Wang
2021-09-13  6:37           ` Michael S. Tsirkin
2021-09-13  6:43             ` Jason Wang
2021-09-13  7:01               ` Michael S. Tsirkin
2021-09-13  7:15                 ` Jason Wang
2021-09-13  6:50             ` Michael S. Tsirkin
2021-09-13  7:07               ` Jason Wang
2021-09-13 19:38                 ` Thomas Gleixner
2021-09-13 20:54                   ` Michael S. Tsirkin
2021-09-13 22:31                     ` Thomas Gleixner
2021-09-14  2:20                       ` Jason Wang
2021-09-14  8:29                     ` Thomas Gleixner
2021-09-13  5:53 ` [PATCH 7/9] virtio-pci: harden INTX interrupts Jason Wang
2021-09-13  6:33   ` Michael S. Tsirkin
2021-09-13  6:36     ` Jason Wang
2021-09-13  6:41       ` Michael S. Tsirkin
2021-09-13  6:45         ` Jason Wang
2021-09-13  7:02           ` Michael S. Tsirkin
2021-09-13  7:17             ` Jason Wang
2021-09-13 21:36   ` Thomas Gleixner
2021-09-13 22:01     ` Michael S. Tsirkin
2021-09-13 22:20       ` Thomas Gleixner
2021-09-14  2:50     ` Jason Wang
2021-09-14  9:34     ` Boqun Feng
2021-09-14 11:03     ` Peter Zijlstra
2021-09-14 11:09       ` Thomas Gleixner
2021-09-13  5:53 ` [PATCH 8/9] virtio_ring: fix typos in vring_desc_extra Jason Wang
2021-09-13  5:53 ` [PATCH 9/9] virtio_ring: validate used buffer length Jason Wang
2021-09-13  6:36   ` Michael S. Tsirkin
2021-09-13  6:40     ` Jason Wang
2021-09-13  6:57       ` Michael S. Tsirkin
2021-09-13  7:13         ` Jason Wang
2021-10-05  7:42 ` [PATCH 0/9] More virtio hardening Michael S. Tsirkin
2021-10-11  7:36   ` Jason Wang
2021-10-11 12:36     ` Michael S. Tsirkin
2021-10-12  2:43       ` Jason Wang
2021-10-12  5:44         ` Michael S. Tsirkin
2021-10-12  6:11           ` Jason Wang
2021-10-12  6:35             ` Michael S. Tsirkin
2021-10-12  6:43               ` Jason Wang
2021-10-12  7:03                 ` Michael S. Tsirkin
2021-10-12  8:46                   ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).