LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC v5 0/5] virtio: support packed ring
@ 2018-05-22  8:16 Tiwei Bie
  2018-05-22  8:16 ` [RFC v5 1/5] virtio: add packed ring definitions Tiwei Bie
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Tiwei Bie @ 2018-05-22  8:16 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie

Hello everyone,

This RFC implements packed ring support in virtio driver.

Some simple functional tests have been done with Jason's
packed ring implementation in vhost (RFC v4):

https://lkml.org/lkml/2018/5/16/501

Both of ping and netperf worked as expected w/ EVENT_IDX
disabled. Ping worked as expected w/ EVENT_IDX enabled,
but netperf didn't (A hack has been added in the driver
to make netperf test pass in this case. The hack can be
found by `grep -rw XXX` in the code).

TODO:
- Refinements (for code and commit log);
- More tests;
- Bug fixes;

RFC v4 -> RFC v5:
- Save DMA addr, etc in desc state (Jason);
- Track used wrap counter;

RFC v3 -> RFC v4:
- Make ID allocation support out-of-order (Jason);
- Various fixes for EVENT_IDX support;

RFC v2 -> RFC v3:
- Split into small patches (Jason);
- Add helper virtqueue_use_indirect() (Jason);
- Just set id for the last descriptor of a list (Jason);
- Calculate the prev in virtqueue_add_packed() (Jason);
- Fix/improve desc suppression code (Jason/MST);
- Refine the code layout for XXX_split/packed and wrappers (MST);
- Fix the comments and API in uapi (MST);
- Remove the BUG_ON() for indirect (Jason);
- Some other refinements and bug fixes;

RFC v1 -> RFC v2:
- Add indirect descriptor support - compile test only;
- Add event suppression supprt - compile test only;
- Move vring_packed_init() out of uapi (Jason, MST);
- Merge two loops into one in virtqueue_add_packed() (Jason);
- Split vring_unmap_one() for packed ring and split ring (Jason);
- Avoid using '%' operator (Jason);
- Rename free_head -> next_avail_idx (Jason);
- Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
- Some other refinements and bug fixes;

Thanks!

Tiwei Bie (5):
  virtio: add packed ring definitions
  virtio_ring: support creating packed ring
  virtio_ring: add packed ring support
  virtio_ring: add event idx support in packed ring
  virtio_ring: enable packed ring

 drivers/virtio/virtio_ring.c       | 1354 ++++++++++++++++++++++------
 include/linux/virtio_ring.h        |    8 +-
 include/uapi/linux/virtio_config.h |    5 +-
 include/uapi/linux/virtio_ring.h   |   36 +
 4 files changed, 1125 insertions(+), 278 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC v5 1/5] virtio: add packed ring definitions
  2018-05-22  8:16 [RFC v5 0/5] virtio: support packed ring Tiwei Bie
@ 2018-05-22  8:16 ` Tiwei Bie
  2018-05-22  8:16 ` [RFC v5 2/5] virtio_ring: support creating packed ring Tiwei Bie
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Tiwei Bie @ 2018-05-22  8:16 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 include/uapi/linux/virtio_config.h |  5 ++++-
 include/uapi/linux/virtio_ring.h   | 36 ++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..932a6ecc8e46 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		34
+#define VIRTIO_TRANSPORT_F_END		35
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,7 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED		34
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..3932cb80c347 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT	4
 
+#define VRING_DESC_F_AVAIL(b)	((b) << 7)
+#define VRING_DESC_F_USED(b)	((b) << 15)
+
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
  * will still kick if it's out of buffers. */
@@ -53,6 +56,10 @@
  * optimization.  */
 #define VRING_AVAIL_F_NO_INTERRUPT	1
 
+#define VRING_EVENT_F_ENABLE	0x0
+#define VRING_EVENT_F_DISABLE	0x1
+#define VRING_EVENT_F_DESC	0x2
+
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC	28
 
@@ -171,4 +178,33 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
 	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
 }
 
+struct vring_packed_desc_event {
+	/* __virtio16 off  : 15; // Descriptor Event Offset
+	 * __virtio16 wrap : 1;  // Descriptor Event Wrap Counter */
+	__virtio16 off_wrap;
+	/* __virtio16 flags : 2; // Descriptor Event Flags */
+	__virtio16 flags;
+};
+
+struct vring_packed_desc {
+	/* Buffer Address. */
+	__virtio64 addr;
+	/* Buffer Length. */
+	__virtio32 len;
+	/* Buffer ID. */
+	__virtio16 id;
+	/* The flags depending on descriptor type. */
+	__virtio16 flags;
+};
+
+struct vring_packed {
+	unsigned int num;
+
+	struct vring_packed_desc *desc;
+
+	struct vring_packed_desc_event *driver;
+
+	struct vring_packed_desc_event *device;
+};
+
 #endif /* _UAPI_LINUX_VIRTIO_RING_H */
-- 
2.17.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC v5 2/5] virtio_ring: support creating packed ring
  2018-05-22  8:16 [RFC v5 0/5] virtio: support packed ring Tiwei Bie
  2018-05-22  8:16 ` [RFC v5 1/5] virtio: add packed ring definitions Tiwei Bie
@ 2018-05-22  8:16 ` Tiwei Bie
  2018-05-29  2:49   ` Jason Wang
  2018-05-22  8:16 ` [RFC v5 3/5] virtio_ring: add packed ring support Tiwei Bie
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Tiwei Bie @ 2018-05-22  8:16 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie

This commit introduces the support for creating packed ring.
All split ring specific functions are added _split suffix.
Some necessary stubs for packed ring are also added.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
 include/linux/virtio_ring.h  |   8 +-
 2 files changed, 546 insertions(+), 263 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71458f493cf8..f5ef5f42a7cf 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -61,11 +61,15 @@ struct vring_desc_state {
 	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
 };
 
+struct vring_desc_state_packed {
+	int next;			/* The next desc state. */
+};
+
 struct vring_virtqueue {
 	struct virtqueue vq;
 
-	/* Actual memory layout for this queue */
-	struct vring vring;
+	/* Is this a packed ring? */
+	bool packed;
 
 	/* Can we use weak barriers? */
 	bool weak_barriers;
@@ -87,11 +91,39 @@ struct vring_virtqueue {
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
-	/* Last written value to avail->flags */
-	u16 avail_flags_shadow;
+	union {
+		/* Available for split ring */
+		struct {
+			/* Actual memory layout for this queue. */
+			struct vring vring;
 
-	/* Last written value to avail->idx in guest byte order */
-	u16 avail_idx_shadow;
+			/* Last written value to avail->flags */
+			u16 avail_flags_shadow;
+
+			/* Last written value to avail->idx in
+			 * guest byte order. */
+			u16 avail_idx_shadow;
+		};
+
+		/* Available for packed ring */
+		struct {
+			/* Actual memory layout for this queue. */
+			struct vring_packed vring_packed;
+
+			/* Driver ring wrap counter. */
+			u8 avail_wrap_counter;
+
+			/* Device ring wrap counter. */
+			u8 used_wrap_counter;
+
+			/* Index of the next avail descriptor. */
+			u16 next_avail_idx;
+
+			/* Last written value to driver->flags in
+			 * guest byte order. */
+			u16 event_flags_shadow;
+		};
+	};
 
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	bool (*notify)(struct virtqueue *vq);
@@ -111,11 +143,24 @@ struct vring_virtqueue {
 #endif
 
 	/* Per-descriptor state. */
-	struct vring_desc_state desc_state[];
+	union {
+		struct vring_desc_state desc_state[1];
+		struct vring_desc_state_packed desc_state_packed[1];
+	};
 };
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+					  unsigned int total_sg)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	/* If the host supports indirect descriptor tables, and we have multiple
+	 * buffers, then go indirect. FIXME: tune this threshold */
+	return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
 /*
  * Modern virtio devices have feature bits to specify whether they need a
  * quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -201,8 +246,17 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
 			      cpu_addr, size, direction);
 }
 
-static void vring_unmap_one(const struct vring_virtqueue *vq,
-			    struct vring_desc *desc)
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+			       dma_addr_t addr)
+{
+	if (!vring_use_dma_api(vq->vq.vdev))
+		return 0;
+
+	return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+
+static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+				  struct vring_desc *desc)
 {
 	u16 flags;
 
@@ -226,17 +280,9 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
 	}
 }
 
-static int vring_mapping_error(const struct vring_virtqueue *vq,
-			       dma_addr_t addr)
-{
-	if (!vring_use_dma_api(vq->vq.vdev))
-		return 0;
-
-	return dma_mapping_error(vring_dma_dev(vq), addr);
-}
-
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
-					 unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+					       unsigned int total_sg,
+					       gfp_t gfp)
 {
 	struct vring_desc *desc;
 	unsigned int i;
@@ -257,14 +303,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
 	return desc;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-				struct scatterlist *sgs[],
-				unsigned int total_sg,
-				unsigned int out_sgs,
-				unsigned int in_sgs,
-				void *data,
-				void *ctx,
-				gfp_t gfp)
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+				      struct scatterlist *sgs[],
+				      unsigned int total_sg,
+				      unsigned int out_sgs,
+				      unsigned int in_sgs,
+				      void *data,
+				      void *ctx,
+				      gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct scatterlist *sg;
@@ -300,10 +346,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
 	head = vq->free_head;
 
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
-		desc = alloc_indirect(_vq, total_sg, gfp);
+	if (virtqueue_use_indirect(_vq, total_sg))
+		desc = alloc_indirect_split(_vq, total_sg, gfp);
 	else {
 		desc = NULL;
 		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
@@ -424,7 +468,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
 			break;
-		vring_unmap_one(vq, &desc[i]);
+		vring_unmap_one_split(vq, &desc[i]);
 		i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
 	}
 
@@ -435,6 +479,355 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	return -EIO;
 }
 
+static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u16 new, old;
+	bool needs_kick;
+
+	START_USE(vq);
+	/* We need to expose available array entries before checking avail
+	 * event. */
+	virtio_mb(vq->weak_barriers);
+
+	old = vq->avail_idx_shadow - vq->num_added;
+	new = vq->avail_idx_shadow;
+	vq->num_added = 0;
+
+#ifdef DEBUG
+	if (vq->last_add_time_valid) {
+		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+					      vq->last_add_time)) > 100);
+	}
+	vq->last_add_time_valid = false;
+#endif
+
+	if (vq->event) {
+		needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
+					      new, old);
+	} else {
+		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
+	}
+	END_USE(vq);
+	return needs_kick;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+			     void **ctx)
+{
+	unsigned int i, j;
+	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+
+	/* Clear data ptr. */
+	vq->desc_state[head].data = NULL;
+
+	/* Put back on free list: unmap first-level descriptors and find end */
+	i = head;
+
+	while (vq->vring.desc[i].flags & nextflag) {
+		vring_unmap_one_split(vq, &vq->vring.desc[i]);
+		i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
+		vq->vq.num_free++;
+	}
+
+	vring_unmap_one_split(vq, &vq->vring.desc[i]);
+	vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
+	vq->free_head = head;
+
+	/* Plus final descriptor */
+	vq->vq.num_free++;
+
+	if (vq->indirect) {
+		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
+		u32 len;
+
+		/* Free the indirect table, if any, now that it's unmapped. */
+		if (!indir_desc)
+			return;
+
+		len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
+
+		BUG_ON(!(vq->vring.desc[head].flags &
+			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
+		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
+
+		for (j = 0; j < len / sizeof(struct vring_desc); j++)
+			vring_unmap_one_split(vq, &indir_desc[j]);
+
+		kfree(indir_desc);
+		vq->desc_state[head].indir_desc = NULL;
+	} else if (ctx) {
+		*ctx = vq->desc_state[head].indir_desc;
+	}
+}
+
+static inline bool more_used_split(const struct vring_virtqueue *vq)
+{
+	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
+}
+
+static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
+					 unsigned int *len,
+					 void **ctx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	void *ret;
+	unsigned int i;
+	u16 last_used;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	if (!more_used_split(vq)) {
+		pr_debug("No more buffers in queue\n");
+		END_USE(vq);
+		return NULL;
+	}
+
+	/* Only get used array entries after they have been exposed by host. */
+	virtio_rmb(vq->weak_barriers);
+
+	last_used = (vq->last_used_idx & (vq->vring.num - 1));
+	i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
+	*len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
+
+	if (unlikely(i >= vq->vring.num)) {
+		BAD_RING(vq, "id %u out of range\n", i);
+		return NULL;
+	}
+	if (unlikely(!vq->desc_state[i].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", i);
+		return NULL;
+	}
+
+	/* detach_buf_split clears data, so grab it now. */
+	ret = vq->desc_state[i].data;
+	detach_buf_split(vq, i, ctx);
+	vq->last_used_idx++;
+	/* If we expect an interrupt for the next entry, tell host
+	 * by writing event index and flush out the write before
+	 * the read in the next get_buf call. */
+	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+		virtio_store_mb(vq->weak_barriers,
+				&vring_used_event(&vq->vring),
+				cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
+
+#ifdef DEBUG
+	vq->last_add_time_valid = false;
+#endif
+
+	END_USE(vq);
+	return ret;
+}
+
+static void virtqueue_disable_cb_split(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
+		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+		if (!vq->event)
+			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+	}
+}
+
+static unsigned virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u16 last_used_idx;
+
+	START_USE(vq);
+
+	/* We optimistically turn back on interrupts, then check if there was
+	 * more to do. */
+	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+	 * either clear the flags bit or point the event index at the next
+	 * entry. Always do both to keep code simple. */
+	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
+		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
+		if (!vq->event)
+			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+	}
+	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
+	END_USE(vq);
+	return last_used_idx;
+}
+
+static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	virtio_mb(vq->weak_barriers);
+	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
+}
+
+static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u16 bufs;
+
+	START_USE(vq);
+
+	/* We optimistically turn back on interrupts, then check if there was
+	 * more to do. */
+	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+	 * either clear the flags bit or point the event index at the next
+	 * entry. Always update the event index to keep code simple. */
+	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
+		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
+		if (!vq->event)
+			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+	}
+	/* TODO: tune this threshold */
+	bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
+
+	virtio_store_mb(vq->weak_barriers,
+			&vring_used_event(&vq->vring),
+			cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+
+	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
+		END_USE(vq);
+		return false;
+	}
+
+	END_USE(vq);
+	return true;
+}
+
+static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int i;
+	void *buf;
+
+	START_USE(vq);
+
+	for (i = 0; i < vq->vring.num; i++) {
+		if (!vq->desc_state[i].data)
+			continue;
+		/* detach_buf clears data, so grab it now. */
+		buf = vq->desc_state[i].data;
+		detach_buf_split(vq, i, NULL);
+		vq->avail_idx_shadow--;
+		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
+		END_USE(vq);
+		return buf;
+	}
+	/* That should have freed everything. */
+	BUG_ON(vq->vq.num_free != vq->vring.num);
+
+	END_USE(vq);
+	return NULL;
+}
+
+/*
+ * The layout for the packed ring is a continuous chunk of memory
+ * which looks like this.
+ *
+ * struct vring_packed {
+ *	// The actual descriptors (16 bytes each)
+ *	struct vring_packed_desc desc[num];
+ *
+ *	// Padding to the next align boundary.
+ *	char pad[];
+ *
+ *	// Driver Event Suppression
+ *	struct vring_packed_desc_event driver;
+ *
+ *	// Device Event Suppression
+ *	struct vring_packed_desc_event device;
+ * };
+ */
+static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
+				     void *p, unsigned long align)
+{
+	vr->num = num;
+	vr->desc = p;
+	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
+		* num + align - 1) & ~(align - 1));
+	vr->device = vr->driver + 1;
+}
+
+static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
+{
+	return ((sizeof(struct vring_packed_desc) * num + align - 1)
+		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
+}
+
+static inline int virtqueue_add_packed(struct virtqueue *_vq,
+				       struct scatterlist *sgs[],
+				       unsigned int total_sg,
+				       unsigned int out_sgs,
+				       unsigned int in_sgs,
+				       void *data,
+				       void *ctx,
+				       gfp_t gfp)
+{
+	return -EIO;
+}
+
+static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
+{
+	return false;
+}
+
+static inline bool more_used_packed(const struct vring_virtqueue *vq)
+{
+	return false;
+}
+
+static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
+					  unsigned int *len,
+					  void **ctx)
+{
+	return NULL;
+}
+
+static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
+{
+}
+
+static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
+{
+	return 0;
+}
+
+static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
+{
+	return false;
+}
+
+static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
+{
+	return false;
+}
+
+static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+{
+	return NULL;
+}
+
+static inline int virtqueue_add(struct virtqueue *_vq,
+				struct scatterlist *sgs[],
+				unsigned int total_sg,
+				unsigned int out_sgs,
+				unsigned int in_sgs,
+				void *data,
+				void *ctx,
+				gfp_t gfp)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
+						 in_sgs, data, ctx, gfp) :
+			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
+						in_sgs, data, ctx, gfp);
+}
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @vq: the struct virtqueue we're talking about.
@@ -551,34 +944,9 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 bool virtqueue_kick_prepare(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	u16 new, old;
-	bool needs_kick;
 
-	START_USE(vq);
-	/* We need to expose available array entries before checking avail
-	 * event. */
-	virtio_mb(vq->weak_barriers);
-
-	old = vq->avail_idx_shadow - vq->num_added;
-	new = vq->avail_idx_shadow;
-	vq->num_added = 0;
-
-#ifdef DEBUG
-	if (vq->last_add_time_valid) {
-		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
-					      vq->last_add_time)) > 100);
-	}
-	vq->last_add_time_valid = false;
-#endif
-
-	if (vq->event) {
-		needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
-					      new, old);
-	} else {
-		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
-	}
-	END_USE(vq);
-	return needs_kick;
+	return vq->packed ? virtqueue_kick_prepare_packed(_vq) :
+			    virtqueue_kick_prepare_split(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
 
@@ -626,58 +994,9 @@ bool virtqueue_kick(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
-static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
-		       void **ctx)
-{
-	unsigned int i, j;
-	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
-
-	/* Clear data ptr. */
-	vq->desc_state[head].data = NULL;
-
-	/* Put back on free list: unmap first-level descriptors and find end */
-	i = head;
-
-	while (vq->vring.desc[i].flags & nextflag) {
-		vring_unmap_one(vq, &vq->vring.desc[i]);
-		i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
-		vq->vq.num_free++;
-	}
-
-	vring_unmap_one(vq, &vq->vring.desc[i]);
-	vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
-	vq->free_head = head;
-
-	/* Plus final descriptor */
-	vq->vq.num_free++;
-
-	if (vq->indirect) {
-		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
-		u32 len;
-
-		/* Free the indirect table, if any, now that it's unmapped. */
-		if (!indir_desc)
-			return;
-
-		len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
-
-		BUG_ON(!(vq->vring.desc[head].flags &
-			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
-		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
-
-		for (j = 0; j < len / sizeof(struct vring_desc); j++)
-			vring_unmap_one(vq, &indir_desc[j]);
-
-		kfree(indir_desc);
-		vq->desc_state[head].indir_desc = NULL;
-	} else if (ctx) {
-		*ctx = vq->desc_state[head].indir_desc;
-	}
-}
-
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
-	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
+	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
 }
 
 /**
@@ -700,57 +1019,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 			    void **ctx)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	void *ret;
-	unsigned int i;
-	u16 last_used;
 
-	START_USE(vq);
-
-	if (unlikely(vq->broken)) {
-		END_USE(vq);
-		return NULL;
-	}
-
-	if (!more_used(vq)) {
-		pr_debug("No more buffers in queue\n");
-		END_USE(vq);
-		return NULL;
-	}
-
-	/* Only get used array entries after they have been exposed by host. */
-	virtio_rmb(vq->weak_barriers);
-
-	last_used = (vq->last_used_idx & (vq->vring.num - 1));
-	i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
-	*len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
-
-	if (unlikely(i >= vq->vring.num)) {
-		BAD_RING(vq, "id %u out of range\n", i);
-		return NULL;
-	}
-	if (unlikely(!vq->desc_state[i].data)) {
-		BAD_RING(vq, "id %u is not a head!\n", i);
-		return NULL;
-	}
-
-	/* detach_buf clears data, so grab it now. */
-	ret = vq->desc_state[i].data;
-	detach_buf(vq, i, ctx);
-	vq->last_used_idx++;
-	/* If we expect an interrupt for the next entry, tell host
-	 * by writing event index and flush out the write before
-	 * the read in the next get_buf call. */
-	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
-		virtio_store_mb(vq->weak_barriers,
-				&vring_used_event(&vq->vring),
-				cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
-
-#ifdef DEBUG
-	vq->last_add_time_valid = false;
-#endif
-
-	END_USE(vq);
-	return ret;
+	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
+			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
 
@@ -772,12 +1043,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
-		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
-	}
-
+	if (vq->packed)
+		virtqueue_disable_cb_packed(_vq);
+	else
+		virtqueue_disable_cb_split(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 
@@ -796,23 +1065,9 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	u16 last_used_idx;
 
-	START_USE(vq);
-
-	/* We optimistically turn back on interrupts, then check if there was
-	 * more to do. */
-	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
-	 * either clear the flags bit or point the event index at the next
-	 * entry. Always do both to keep code simple. */
-	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
-		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
-	}
-	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
-	END_USE(vq);
-	return last_used_idx;
+	return vq->packed ? virtqueue_enable_cb_prepare_packed(_vq) :
+			    virtqueue_enable_cb_prepare_split(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
@@ -829,8 +1084,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	virtio_mb(vq->weak_barriers);
-	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
+	return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
+			    virtqueue_poll_split(_vq, last_used_idx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_poll);
 
@@ -868,34 +1123,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
 bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	u16 bufs;
 
-	START_USE(vq);
-
-	/* We optimistically turn back on interrupts, then check if there was
-	 * more to do. */
-	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
-	 * either clear the flags bit or point the event index at the next
-	 * entry. Always update the event index to keep code simple. */
-	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
-		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
-	}
-	/* TODO: tune this threshold */
-	bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
-
-	virtio_store_mb(vq->weak_barriers,
-			&vring_used_event(&vq->vring),
-			cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
-
-	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
-		END_USE(vq);
-		return false;
-	}
-
-	END_USE(vq);
-	return true;
+	return vq->packed ? virtqueue_enable_cb_delayed_packed(_vq) :
+			    virtqueue_enable_cb_delayed_split(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
 
@@ -910,27 +1140,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
 void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	unsigned int i;
-	void *buf;
 
-	START_USE(vq);
-
-	for (i = 0; i < vq->vring.num; i++) {
-		if (!vq->desc_state[i].data)
-			continue;
-		/* detach_buf clears data, so grab it now. */
-		buf = vq->desc_state[i].data;
-		detach_buf(vq, i, NULL);
-		vq->avail_idx_shadow--;
-		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
-		END_USE(vq);
-		return buf;
-	}
-	/* That should have freed everything. */
-	BUG_ON(vq->vq.num_free != vq->vring.num);
-
-	END_USE(vq);
-	return NULL;
+	return vq->packed ? virtqueue_detach_unused_buf_packed(_vq) :
+			    virtqueue_detach_unused_buf_split(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
@@ -955,7 +1167,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 EXPORT_SYMBOL_GPL(vring_interrupt);
 
 struct virtqueue *__vring_new_virtqueue(unsigned int index,
-					struct vring vring,
+					union vring_union vring,
+					bool packed,
 					struct virtio_device *vdev,
 					bool weak_barriers,
 					bool context,
@@ -963,19 +1176,22 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					void (*callback)(struct virtqueue *),
 					const char *name)
 {
-	unsigned int i;
 	struct vring_virtqueue *vq;
+	unsigned int num, i;
+	size_t size;
 
-	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
-		     GFP_KERNEL);
+	num = packed ? vring.vring_packed.num : vring.vring_split.num;
+	size = packed ? num * sizeof(struct vring_desc_state_packed) :
+			num * sizeof(struct vring_desc_state);
+
+	vq = kmalloc(sizeof(*vq) + size, GFP_KERNEL);
 	if (!vq)
 		return NULL;
 
-	vq->vring = vring;
 	vq->vq.callback = callback;
 	vq->vq.vdev = vdev;
 	vq->vq.name = name;
-	vq->vq.num_free = vring.num;
+	vq->vq.num_free = num;
 	vq->vq.index = index;
 	vq->we_own_ring = false;
 	vq->queue_dma_addr = 0;
@@ -984,9 +1200,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
-	vq->avail_flags_shadow = 0;
-	vq->avail_idx_shadow = 0;
 	vq->num_added = 0;
+	vq->packed = packed;
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
 	vq->in_use = false;
@@ -997,19 +1212,48 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
+	if (vq->packed) {
+		vq->vring_packed = vring.vring_packed;
+		vq->next_avail_idx = 0;
+		vq->avail_wrap_counter = 1;
+		vq->used_wrap_counter = 1;
+		vq->event_flags_shadow = 0;
+
+		memset(vq->desc_state_packed, 0,
+			num * sizeof(struct vring_desc_state_packed));
+
+		/* Put everything in free lists. */
+		vq->free_head = 0;
+		for (i = 0; i < num-1; i++)
+			vq->desc_state_packed[i].next = i + 1;
+	} else {
+		vq->vring = vring.vring_split;
+		vq->avail_flags_shadow = 0;
+		vq->avail_idx_shadow = 0;
+
+		/* Put everything in free lists. */
+		vq->free_head = 0;
+		for (i = 0; i < num-1; i++)
+			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
+
+		memset(vq->desc_state, 0,
+			num * sizeof(struct vring_desc_state));
+	}
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback) {
-		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
+		if (packed) {
+			vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
+			vq->vring_packed.driver->flags = cpu_to_virtio16(vdev,
+						vq->event_flags_shadow);
+		} else {
+			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+			if (!vq->event)
+				vq->vring.avail->flags = cpu_to_virtio16(vdev,
+						vq->avail_flags_shadow);
+		}
 	}
 
-	/* Put everything in free lists. */
-	vq->free_head = 0;
-	for (i = 0; i < vring.num-1; i++)
-		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
-	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
-
 	return &vq->vq;
 }
 EXPORT_SYMBOL_GPL(__vring_new_virtqueue);
@@ -1056,6 +1300,12 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
 	}
 }
 
+static inline int
+__vring_size(unsigned int num, unsigned long align, bool packed)
+{
+	return packed ? vring_size_packed(num, align) : vring_size(num, align);
+}
+
 struct virtqueue *vring_create_virtqueue(
 	unsigned int index,
 	unsigned int num,
@@ -1072,7 +1322,8 @@ struct virtqueue *vring_create_virtqueue(
 	void *queue = NULL;
 	dma_addr_t dma_addr;
 	size_t queue_size_in_bytes;
-	struct vring vring;
+	union vring_union vring;
+	bool packed;
 
 	/* We assume num is a power of 2. */
 	if (num & (num - 1)) {
@@ -1080,9 +1331,13 @@ struct virtqueue *vring_create_virtqueue(
 		return NULL;
 	}
 
+	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+
 	/* TODO: allocate each queue chunk individually */
-	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
-		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
+			num /= 2) {
+		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+							     packed),
 					  &dma_addr,
 					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
 		if (queue)
@@ -1094,17 +1349,21 @@ struct virtqueue *vring_create_virtqueue(
 
 	if (!queue) {
 		/* Try to get a single page. You are my only hope! */
-		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+							     packed),
 					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
 	}
 	if (!queue)
 		return NULL;
 
-	queue_size_in_bytes = vring_size(num, vring_align);
-	vring_init(&vring, num, queue, vring_align);
+	queue_size_in_bytes = __vring_size(num, vring_align, packed);
+	if (packed)
+		vring_init_packed(&vring.vring_packed, num, queue, vring_align);
+	else
+		vring_init(&vring.vring_split, num, queue, vring_align);
 
-	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
-				   notify, callback, name);
+	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+				   context, notify, callback, name);
 	if (!vq) {
 		vring_free_queue(vdev, queue_size_in_bytes, queue,
 				 dma_addr);
@@ -1130,10 +1389,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      void (*callback)(struct virtqueue *vq),
 				      const char *name)
 {
-	struct vring vring;
-	vring_init(&vring, num, pages, vring_align);
-	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
-				     notify, callback, name);
+	union vring_union vring;
+	bool packed;
+
+	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+	if (packed)
+		vring_init_packed(&vring.vring_packed, num, pages, vring_align);
+	else
+		vring_init(&vring.vring_split, num, pages, vring_align);
+
+	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+				     context, notify, callback, name);
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
 
@@ -1143,7 +1409,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 
 	if (vq->we_own_ring) {
 		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
-				 vq->vring.desc, vq->queue_dma_addr);
+				 vq->packed ? (void *)vq->vring_packed.desc :
+					      (void *)vq->vring.desc,
+				 vq->queue_dma_addr);
 	}
 	list_del(&_vq->list);
 	kfree(vq);
@@ -1185,7 +1453,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
 
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->vring.num;
+	return vq->packed ? vq->vring_packed.num : vq->vring.num;
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
 
@@ -1228,6 +1496,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
 
 	BUG_ON(!vq->we_own_ring);
 
+	if (vq->packed)
+		return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
+				(char *)vq->vring_packed.desc);
+
 	return vq->queue_dma_addr +
 		((char *)vq->vring.avail - (char *)vq->vring.desc);
 }
@@ -1239,11 +1511,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
 
 	BUG_ON(!vq->we_own_ring);
 
+	if (vq->packed)
+		return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
+				(char *)vq->vring_packed.desc);
+
 	return vq->queue_dma_addr +
 		((char *)vq->vring.used - (char *)vq->vring.desc);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
 
+/* Only available for split ring */
 const struct vring *virtqueue_get_vring(struct virtqueue *vq)
 {
 	return &to_vvq(vq)->vring;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index bbf32524ab27..a0075894ad16 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
 struct virtio_device;
 struct virtqueue;
 
+union vring_union {
+	struct vring vring_split;
+	struct vring_packed vring_packed;
+};
+
 /*
  * Creates a virtqueue and allocates the descriptor ring.  If
  * may_reduce_num is set, then this may allocate a smaller ring than
@@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
 
 /* Creates a virtqueue with a custom layout. */
 struct virtqueue *__vring_new_virtqueue(unsigned int index,
-					struct vring vring,
+					union vring_union vring,
+					bool packed,
 					struct virtio_device *vdev,
 					bool weak_barriers,
 					bool ctx,
-- 
2.17.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC v5 3/5] virtio_ring: add packed ring support
  2018-05-22  8:16 [RFC v5 0/5] virtio: support packed ring Tiwei Bie
  2018-05-22  8:16 ` [RFC v5 1/5] virtio: add packed ring definitions Tiwei Bie
  2018-05-22  8:16 ` [RFC v5 2/5] virtio_ring: support creating packed ring Tiwei Bie
@ 2018-05-22  8:16 ` Tiwei Bie
  2018-05-29  3:18   ` Jason Wang
  2018-05-22  8:16 ` [RFC v5 4/5] virtio_ring: add event idx support in packed ring Tiwei Bie
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Tiwei Bie @ 2018-05-22  8:16 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie

This commit introduces the support (without EVENT_IDX) for
packed ring.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/virtio/virtio_ring.c | 474 ++++++++++++++++++++++++++++++++++-
 1 file changed, 468 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f5ef5f42a7cf..eb9fd5207a68 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -62,6 +62,12 @@ struct vring_desc_state {
 };
 
 struct vring_desc_state_packed {
+	void *data;			/* Data for callback. */
+	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
+	int num;			/* Descriptor list length. */
+	dma_addr_t addr;		/* Buffer DMA addr. */
+	u32 len;			/* Buffer length. */
+	u16 flags;			/* Descriptor flags. */
 	int next;			/* The next desc state. */
 };
 
@@ -758,6 +764,72 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
 		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
 }
 
+static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
+				     struct vring_desc_state_packed *state)
+{
+	u16 flags;
+
+	if (!vring_use_dma_api(vq->vq.vdev))
+		return;
+
+	flags = state->flags;
+
+	if (flags & VRING_DESC_F_INDIRECT) {
+		dma_unmap_single(vring_dma_dev(vq),
+				 state->addr, state->len,
+				 (flags & VRING_DESC_F_WRITE) ?
+				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	} else {
+		dma_unmap_page(vring_dma_dev(vq),
+			       state->addr, state->len,
+			       (flags & VRING_DESC_F_WRITE) ?
+			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	}
+}
+
+static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
+				   struct vring_packed_desc *desc)
+{
+	u16 flags;
+
+	if (!vring_use_dma_api(vq->vq.vdev))
+		return;
+
+	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+	if (flags & VRING_DESC_F_INDIRECT) {
+		dma_unmap_single(vring_dma_dev(vq),
+				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
+				 virtio32_to_cpu(vq->vq.vdev, desc->len),
+				 (flags & VRING_DESC_F_WRITE) ?
+				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	} else {
+		dma_unmap_page(vring_dma_dev(vq),
+			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
+			       virtio32_to_cpu(vq->vq.vdev, desc->len),
+			       (flags & VRING_DESC_F_WRITE) ?
+			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	}
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+						       unsigned int total_sg,
+						       gfp_t gfp)
+{
+	struct vring_packed_desc *desc;
+
+	/*
+	 * We require lowmem mappings for the descriptors because
+	 * otherwise virt_to_phys will give us bogus addresses in the
+	 * virtqueue.
+	 */
+	gfp &= ~__GFP_HIGHMEM;
+
+	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+	return desc;
+}
+
 static inline int virtqueue_add_packed(struct virtqueue *_vq,
 				       struct scatterlist *sgs[],
 				       unsigned int total_sg,
@@ -767,47 +839,437 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 				       void *ctx,
 				       gfp_t gfp)
 {
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_packed_desc *desc;
+	struct scatterlist *sg;
+	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+	__virtio16 uninitialized_var(head_flags), flags;
+	u16 head, avail_wrap_counter, id, cur;
+	bool indirect;
+
+	START_USE(vq);
+
+	BUG_ON(data == NULL);
+	BUG_ON(ctx && vq->indirect);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
+#ifdef DEBUG
+	{
+		ktime_t now = ktime_get();
+
+		/* No kick or get, with .1 second between?  Warn. */
+		if (vq->last_add_time_valid)
+			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+					    > 100);
+		vq->last_add_time = now;
+		vq->last_add_time_valid = true;
+	}
+#endif
+
+	BUG_ON(total_sg == 0);
+
+	head = vq->next_avail_idx;
+	avail_wrap_counter = vq->avail_wrap_counter;
+
+	if (virtqueue_use_indirect(_vq, total_sg))
+		desc = alloc_indirect_packed(_vq, total_sg, gfp);
+	else {
+		desc = NULL;
+		WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
+	}
+
+	if (desc) {
+		/* Use a single buffer which doesn't continue */
+		indirect = true;
+		/* Set up rest to use this indirect table. */
+		i = 0;
+		descs_used = 1;
+	} else {
+		indirect = false;
+		desc = vq->vring_packed.desc;
+		i = head;
+		descs_used = total_sg;
+	}
+
+	if (vq->vq.num_free < descs_used) {
+		pr_debug("Can't add buf len %i - avail = %i\n",
+			 descs_used, vq->vq.num_free);
+		/* FIXME: for historical reasons, we force a notify here if
+		 * there are outgoing parts to the buffer.  Presumably the
+		 * host should service the ring ASAP. */
+		if (out_sgs)
+			vq->notify(&vq->vq);
+		if (indirect)
+			kfree(desc);
+		END_USE(vq);
+		return -ENOSPC;
+	}
+
+	id = vq->free_head;
+	BUG_ON(id == vq->vring_packed.num);
+
+	cur = id;
+	for (n = 0; n < out_sgs + in_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
+					       DMA_TO_DEVICE : DMA_FROM_DEVICE);
+			if (vring_mapping_error(vq, addr))
+				goto unmap_release;
+
+			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+				  (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
+				  VRING_DESC_F_AVAIL(vq->avail_wrap_counter) |
+				  VRING_DESC_F_USED(!vq->avail_wrap_counter));
+			if (!indirect && i == head)
+				head_flags = flags;
+			else
+				desc[i].flags = flags;
+
+			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+			i++;
+			if (!indirect) {
+				vq->desc_state_packed[cur].addr = addr;
+				vq->desc_state_packed[cur].len = sg->length;
+				vq->desc_state_packed[cur].flags =
+					virtio16_to_cpu(_vq->vdev, flags);
+				cur = vq->desc_state_packed[cur].next;
+
+				if (i >= vq->vring_packed.num) {
+					i = 0;
+					vq->avail_wrap_counter ^= 1;
+				}
+			}
+		}
+	}
+
+	prev = (i > 0 ? i : vq->vring_packed.num) - 1;
+	desc[prev].id = cpu_to_virtio16(_vq->vdev, id);
+
+	/* Last one doesn't continue. */
+	if (total_sg == 1)
+		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+	else
+		desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
+						~VRING_DESC_F_NEXT);
+
+	if (indirect) {
+		/* Now that the indirect table is filled in, map it. */
+		dma_addr_t addr = vring_map_single(
+			vq, desc, total_sg * sizeof(struct vring_packed_desc),
+			DMA_TO_DEVICE);
+		if (vring_mapping_error(vq, addr))
+			goto unmap_release;
+
+		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
+				       VRING_DESC_F_AVAIL(avail_wrap_counter) |
+				       VRING_DESC_F_USED(!avail_wrap_counter));
+		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev,
+								   addr);
+		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
+				total_sg * sizeof(struct vring_packed_desc));
+		vq->vring_packed.desc[head].id = cpu_to_virtio16(_vq->vdev, id);
+
+		vq->desc_state_packed[id].addr = addr;
+		vq->desc_state_packed[id].len = total_sg *
+				sizeof(struct vring_packed_desc);
+		vq->desc_state_packed[id].flags =
+				virtio16_to_cpu(_vq->vdev, head_flags);
+	}
+
+	/* We're using some buffers from the free list. */
+	vq->vq.num_free -= descs_used;
+
+	/* Update free pointer */
+	if (indirect) {
+		n = head + 1;
+		if (n >= vq->vring_packed.num) {
+			n = 0;
+			vq->avail_wrap_counter ^= 1;
+		}
+		vq->next_avail_idx = n;
+		vq->free_head = vq->desc_state_packed[id].next;
+	} else {
+		vq->next_avail_idx = i;
+		vq->free_head = cur;
+	}
+
+	/* Store token and indirect buffer state. */
+	vq->desc_state_packed[id].num = descs_used;
+	vq->desc_state_packed[id].data = data;
+	if (indirect)
+		vq->desc_state_packed[id].indir_desc = desc;
+	else
+		vq->desc_state_packed[id].indir_desc = ctx;
+
+	/* A driver MUST NOT make the first descriptor in the list
+	 * available before all subsequent descriptors comprising
+	 * the list are made available. */
+	virtio_wmb(vq->weak_barriers);
+	vq->vring_packed.desc[head].flags = head_flags;
+	vq->num_added += descs_used;
+
+	pr_debug("Added buffer head %i to %p\n", head, vq);
+	END_USE(vq);
+
+	return 0;
+
+unmap_release:
+	err_idx = i;
+	i = head;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_idx)
+			break;
+		vring_unmap_desc_packed(vq, &desc[i]);
+		i++;
+		if (!indirect && i >= vq->vring_packed.num)
+			i = 0;
+	}
+
+	vq->avail_wrap_counter = avail_wrap_counter;
+
+	if (indirect)
+		kfree(desc);
+
+	END_USE(vq);
 	return -EIO;
 }
 
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 {
-	return false;
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u16 flags;
+	bool needs_kick;
+	u32 snapshot;
+
+	START_USE(vq);
+	/* We need to expose the new flags value before checking notification
+	 * suppressions. */
+	virtio_mb(vq->weak_barriers);
+
+	snapshot = *(u32 *)vq->vring_packed.device;
+	flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
+
+#ifdef DEBUG
+	if (vq->last_add_time_valid) {
+		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+					      vq->last_add_time)) > 100);
+	}
+	vq->last_add_time_valid = false;
+#endif
+
+	needs_kick = (flags != VRING_EVENT_F_DISABLE);
+	END_USE(vq);
+	return needs_kick;
+}
+
+static void detach_buf_packed(struct vring_virtqueue *vq,
+			      unsigned int id, void **ctx)
+{
+	struct vring_desc_state_packed *state;
+	struct vring_packed_desc *desc;
+	unsigned int i;
+	int *next;
+
+	/* Clear data ptr. */
+	vq->desc_state_packed[id].data = NULL;
+
+	next = &id;
+	for (i = 0; i < vq->desc_state_packed[id].num; i++) {
+		state = &vq->desc_state_packed[*next];
+		vring_unmap_state_packed(vq, state);
+		next = &vq->desc_state_packed[*next].next;
+	}
+
+	vq->vq.num_free += vq->desc_state_packed[id].num;
+	*next = vq->free_head;
+	vq->free_head = id;
+
+	if (vq->indirect) {
+		u32 len;
+
+		/* Free the indirect table, if any, now that it's unmapped. */
+		desc = vq->desc_state_packed[id].indir_desc;
+		if (!desc)
+			return;
+
+		len = vq->desc_state_packed[id].len;
+		for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
+			vring_unmap_desc_packed(vq, &desc[i]);
+
+		kfree(desc);
+		vq->desc_state_packed[id].indir_desc = NULL;
+	} else if (ctx) {
+		*ctx = vq->desc_state_packed[id].indir_desc;
+	}
 }
 
 static inline bool more_used_packed(const struct vring_virtqueue *vq)
 {
-	return false;
+	u16 last_used, flags;
+	u8 avail, used;
+
+	last_used = vq->last_used_idx;
+	flags = virtio16_to_cpu(vq->vq.vdev,
+				vq->vring_packed.desc[last_used].flags);
+	avail = !!(flags & VRING_DESC_F_AVAIL(1));
+	used = !!(flags & VRING_DESC_F_USED(1));
+
+	return avail == used && used == vq->used_wrap_counter;
 }
 
 static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 					  unsigned int *len,
 					  void **ctx)
 {
-	return NULL;
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u16 last_used, id;
+	void *ret;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	if (!more_used_packed(vq)) {
+		pr_debug("No more buffers in queue\n");
+		END_USE(vq);
+		return NULL;
+	}
+
+	/* Only get used elements after they have been exposed by host. */
+	virtio_rmb(vq->weak_barriers);
+
+	last_used = vq->last_used_idx;
+	id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
+	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
+
+	if (unlikely(id >= vq->vring_packed.num)) {
+		BAD_RING(vq, "id %u out of range\n", id);
+		return NULL;
+	}
+	if (unlikely(!vq->desc_state_packed[id].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", id);
+		return NULL;
+	}
+
+	vq->last_used_idx += vq->desc_state_packed[id].num;
+	if (vq->last_used_idx >= vq->vring_packed.num) {
+		vq->last_used_idx -= vq->vring_packed.num;
+		vq->used_wrap_counter ^= 1;
+	}
+
+	/* detach_buf_packed clears data, so grab it now. */
+	ret = vq->desc_state_packed[id].data;
+	detach_buf_packed(vq, id, ctx);
+
+#ifdef DEBUG
+	vq->last_add_time_valid = false;
+#endif
+
+	END_USE(vq);
+	return ret;
 }
 
 static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
 {
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
+		vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
+		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+							vq->event_flags_shadow);
+	}
 }
 
 static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
 {
-	return 0;
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	START_USE(vq);
+
+	/* We optimistically turn back on interrupts, then check if there was
+	 * more to do. */
+
+	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
+		virtio_wmb(vq->weak_barriers);
+		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+							vq->event_flags_shadow);
+	}
+
+	END_USE(vq);
+	return vq->last_used_idx;
 }
 
 static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
 {
-	return false;
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u8 avail, used;
+	u16 flags;
+
+	virtio_mb(vq->weak_barriers);
+	flags = virtio16_to_cpu(vq->vq.vdev,
+			vq->vring_packed.desc[last_used_idx].flags);
+	avail = !!(flags & VRING_DESC_F_AVAIL(1));
+	used = !!(flags & VRING_DESC_F_USED(1));
+
+	return avail == used && used == vq->used_wrap_counter;
 }
 
 static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 {
-	return false;
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	START_USE(vq);
+
+	/* We optimistically turn back on interrupts, then check if there was
+	 * more to do. */
+
+	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
+		virtio_wmb(vq->weak_barriers);
+		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+							vq->event_flags_shadow);
+	}
+
+	if (more_used_packed(vq)) {
+		END_USE(vq);
+		return false;
+	}
+
+	END_USE(vq);
+	return true;
 }
 
 static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
 {
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int i;
+	void *buf;
+
+	START_USE(vq);
+
+	for (i = 0; i < vq->vring_packed.num; i++) {
+		if (!vq->desc_state_packed[i].data)
+			continue;
+		/* detach_buf clears data, so grab it now. */
+		buf = vq->desc_state_packed[i].data;
+		detach_buf_packed(vq, i, NULL);
+		END_USE(vq);
+		return buf;
+	}
+	/* That should have freed everything. */
+	BUG_ON(vq->vq.num_free != vq->vring_packed.num);
+
+	END_USE(vq);
 	return NULL;
 }
 
-- 
2.17.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC v5 4/5] virtio_ring: add event idx support in packed ring
  2018-05-22  8:16 [RFC v5 0/5] virtio: support packed ring Tiwei Bie
                   ` (2 preceding siblings ...)
  2018-05-22  8:16 ` [RFC v5 3/5] virtio_ring: add packed ring support Tiwei Bie
@ 2018-05-22  8:16 ` Tiwei Bie
  2018-05-22  8:16 ` [RFC v5 5/5] virtio_ring: enable " Tiwei Bie
  2018-05-25  2:31 ` [RFC v5 0/5] virtio: support " Jason Wang
  5 siblings, 0 replies; 14+ messages in thread
From: Tiwei Bie @ 2018-05-22  8:16 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie

This commit introduces the EVENT_IDX support in packed ring.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/virtio/virtio_ring.c | 67 ++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb9fd5207a68..1ee52a89cb04 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1043,7 +1043,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	u16 flags;
+	u16 new, old, off_wrap, flags, wrap_counter, event_idx;
 	bool needs_kick;
 	u32 snapshot;
 
@@ -1052,9 +1052,19 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 	 * suppressions. */
 	virtio_mb(vq->weak_barriers);
 
+	old = vq->next_avail_idx - vq->num_added;
+	new = vq->next_avail_idx;
+	vq->num_added = 0;
+
 	snapshot = *(u32 *)vq->vring_packed.device;
+	off_wrap = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot & 0xffff));
 	flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
 
+	wrap_counter = off_wrap >> 15;
+	event_idx = off_wrap & ~(1<<15);
+	if (wrap_counter != vq->avail_wrap_counter)
+		event_idx -= vq->vring_packed.num;
+
 #ifdef DEBUG
 	if (vq->last_add_time_valid) {
 		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
@@ -1063,7 +1073,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 	vq->last_add_time_valid = false;
 #endif
 
-	needs_kick = (flags != VRING_EVENT_F_DISABLE);
+	if (flags == VRING_EVENT_F_DESC)
+		needs_kick = vring_need_event(event_idx, new, old);
+	else
+		needs_kick = (flags != VRING_EVENT_F_DISABLE);
 	END_USE(vq);
 	return needs_kick;
 }
@@ -1170,6 +1183,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 	ret = vq->desc_state_packed[id].data;
 	detach_buf_packed(vq, id, ctx);
 
+	/* If we expect an interrupt for the next entry, tell host
+	 * by writing event index and flush out the write before
+	 * the read in the next get_buf call. */
+	if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
+		virtio_store_mb(vq->weak_barriers,
+				&vq->vring_packed.driver->off_wrap,
+				cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
+					((u16)vq->used_wrap_counter << 15)));
+
 #ifdef DEBUG
 	vq->last_add_time_valid = false;
 #endif
@@ -1197,10 +1219,26 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
 
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
+	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+	 * either clear the flags bit or point the event index at the next
+	 * entry. Always update the event index to keep code simple. */
+
+	vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+			vq->last_used_idx |
+			((u16)vq->used_wrap_counter << 15));
 
 	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
 		virtio_wmb(vq->weak_barriers);
+		// XXX XXX XXX
+		// Setting VRING_EVENT_F_DESC in this function
+		// will break netperf test for now.
+		// Will need to look into this.
+#if 0
+		vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+						     VRING_EVENT_F_ENABLE;
+#else
 		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+#endif
 		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
 							vq->event_flags_shadow);
 	}
@@ -1227,15 +1265,38 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
 static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
+	u16 bufs, used_idx, wrap_counter;
 
 	START_USE(vq);
 
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
+	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+	 * either clear the flags bit or point the event index at the next
+	 * entry. Always update the event index to keep code simple. */
+
+	/* TODO: tune this threshold */
+	if (vq->next_avail_idx < vq->last_used_idx)
+		bufs = (vq->vring_packed.num + vq->next_avail_idx -
+				vq->last_used_idx) * 3 / 4;
+	else
+		bufs = (vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
+
+	wrap_counter = vq->used_wrap_counter;
+
+	used_idx = vq->last_used_idx + bufs;
+	if (used_idx >= vq->vring_packed.num) {
+		used_idx -= vq->vring_packed.num;
+		wrap_counter ^= 1;
+	}
+
+	vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+			used_idx | (wrap_counter << 15));
 
 	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
 		virtio_wmb(vq->weak_barriers);
-		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+		vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+						     VRING_EVENT_F_ENABLE;
 		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
 							vq->event_flags_shadow);
 	}
-- 
2.17.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC v5 5/5] virtio_ring: enable packed ring
  2018-05-22  8:16 [RFC v5 0/5] virtio: support packed ring Tiwei Bie
                   ` (3 preceding siblings ...)
  2018-05-22  8:16 ` [RFC v5 4/5] virtio_ring: add event idx support in packed ring Tiwei Bie
@ 2018-05-22  8:16 ` Tiwei Bie
  2018-05-25  2:31 ` [RFC v5 0/5] virtio: support " Jason Wang
  5 siblings, 0 replies; 14+ messages in thread
From: Tiwei Bie @ 2018-05-22  8:16 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/virtio/virtio_ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1ee52a89cb04..355dfef4f1eb 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1956,6 +1956,8 @@ void vring_transport_features(struct virtio_device *vdev)
 			break;
 		case VIRTIO_F_IOMMU_PLATFORM:
 			break;
+		case VIRTIO_F_RING_PACKED:
+			break;
 		default:
 			/* We don't understand this bit. */
 			__virtio_clear_bit(vdev, i);
-- 
2.17.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 0/5] virtio: support packed ring
  2018-05-22  8:16 [RFC v5 0/5] virtio: support packed ring Tiwei Bie
                   ` (4 preceding siblings ...)
  2018-05-22  8:16 ` [RFC v5 5/5] virtio_ring: enable " Tiwei Bie
@ 2018-05-25  2:31 ` Jason Wang
  2018-05-25  3:07   ` Tiwei Bie
  5 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2018-05-25  2:31 UTC (permalink / raw)
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann



On 2018年05月22日 16:16, Tiwei Bie wrote:
> Hello everyone,
>
> This RFC implements packed ring support in virtio driver.
>
> Some simple functional tests have been done with Jason's
> packed ring implementation in vhost (RFC v4):
>
> https://lkml.org/lkml/2018/5/16/501
>
> Both of ping and netperf worked as expected w/ EVENT_IDX
> disabled. Ping worked as expected w/ EVENT_IDX enabled,
> but netperf didn't (A hack has been added in the driver
> to make netperf test pass in this case. The hack can be
> found by `grep -rw XXX` in the code).

Looks like this is because a bug in vhost which wrongly track 
signalled_used and may miss an interrupt. After fixing that, both side 
works like a charm.

I'm testing vIOMMU and zerocopy, and will post a new version shortly. 
(Hope it would be the last RFC version).

Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 0/5] virtio: support packed ring
  2018-05-25  2:31 ` [RFC v5 0/5] virtio: support " Jason Wang
@ 2018-05-25  3:07   ` Tiwei Bie
  0 siblings, 0 replies; 14+ messages in thread
From: Tiwei Bie @ 2018-05-25  3:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Fri, May 25, 2018 at 10:31:26AM +0800, Jason Wang wrote:
> On 2018年05月22日 16:16, Tiwei Bie wrote:
> > Hello everyone,
> > 
> > This RFC implements packed ring support in virtio driver.
> > 
> > Some simple functional tests have been done with Jason's
> > packed ring implementation in vhost (RFC v4):
> > 
> > https://lkml.org/lkml/2018/5/16/501
> > 
> > Both of ping and netperf worked as expected w/ EVENT_IDX
> > disabled. Ping worked as expected w/ EVENT_IDX enabled,
> > but netperf didn't (A hack has been added in the driver
> > to make netperf test pass in this case. The hack can be
> > found by `grep -rw XXX` in the code).
> 
> Looks like this is because a bug in vhost which wrongly track signalled_used
> and may miss an interrupt. After fixing that, both side works like a charm.
> 
> I'm testing vIOMMU and zerocopy, and will post a new version shortly. (Hope
> it would be the last RFC version).

Great to know that. Thanks a lot! :)

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 2/5] virtio_ring: support creating packed ring
  2018-05-22  8:16 ` [RFC v5 2/5] virtio_ring: support creating packed ring Tiwei Bie
@ 2018-05-29  2:49   ` Jason Wang
  2018-05-29  5:24     ` Tiwei Bie
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2018-05-29  2:49 UTC (permalink / raw)
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, Cornelia Huck



On 2018年05月22日 16:16, Tiwei Bie wrote:
> This commit introduces the support for creating packed ring.
> All split ring specific functions are added _split suffix.
> Some necessary stubs for packed ring are also added.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
>   include/linux/virtio_ring.h  |   8 +-
>   2 files changed, 546 insertions(+), 263 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71458f493cf8..f5ef5f42a7cf 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -61,11 +61,15 @@ struct vring_desc_state {
>   	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
>   };
>   
> +struct vring_desc_state_packed {
> +	int next;			/* The next desc state. */
> +};
> +
>   struct vring_virtqueue {
>   	struct virtqueue vq;
>   
> -	/* Actual memory layout for this queue */
> -	struct vring vring;
> +	/* Is this a packed ring? */
> +	bool packed;
>   
>   	/* Can we use weak barriers? */
>   	bool weak_barriers;
> @@ -87,11 +91,39 @@ struct vring_virtqueue {
>   	/* Last used index we've seen. */
>   	u16 last_used_idx;
>   
> -	/* Last written value to avail->flags */
> -	u16 avail_flags_shadow;
> +	union {
> +		/* Available for split ring */
> +		struct {
> +			/* Actual memory layout for this queue. */
> +			struct vring vring;
>   
> -	/* Last written value to avail->idx in guest byte order */
> -	u16 avail_idx_shadow;
> +			/* Last written value to avail->flags */
> +			u16 avail_flags_shadow;
> +
> +			/* Last written value to avail->idx in
> +			 * guest byte order. */
> +			u16 avail_idx_shadow;
> +		};
> +
> +		/* Available for packed ring */
> +		struct {
> +			/* Actual memory layout for this queue. */
> +			struct vring_packed vring_packed;
> +
> +			/* Driver ring wrap counter. */
> +			u8 avail_wrap_counter;
> +
> +			/* Device ring wrap counter. */
> +			u8 used_wrap_counter;

How about just use boolean?

> +
> +			/* Index of the next avail descriptor. */
> +			u16 next_avail_idx;
> +
> +			/* Last written value to driver->flags in
> +			 * guest byte order. */
> +			u16 event_flags_shadow;
> +		};
> +	};
>   
>   	/* How to notify other side. FIXME: commonalize hcalls! */
>   	bool (*notify)(struct virtqueue *vq);
> @@ -111,11 +143,24 @@ struct vring_virtqueue {
>   #endif
>   
>   	/* Per-descriptor state. */
> -	struct vring_desc_state desc_state[];
> +	union {
> +		struct vring_desc_state desc_state[1];
> +		struct vring_desc_state_packed desc_state_packed[1];
> +	};
>   };
>   
>   #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>   
> +static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
> +					  unsigned int total_sg)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	/* If the host supports indirect descriptor tables, and we have multiple
> +	 * buffers, then go indirect. FIXME: tune this threshold */
> +	return (vq->indirect && total_sg > 1 && vq->vq.num_free);
> +}
> +
>   /*
>    * Modern virtio devices have feature bits to specify whether they need a
>    * quirk and bypass the IOMMU. If not there, just use the DMA API.
> @@ -201,8 +246,17 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
>   			      cpu_addr, size, direction);
>   }
>   
> -static void vring_unmap_one(const struct vring_virtqueue *vq,
> -			    struct vring_desc *desc)
> +static int vring_mapping_error(const struct vring_virtqueue *vq,
> +			       dma_addr_t addr)
> +{
> +	if (!vring_use_dma_api(vq->vq.vdev))
> +		return 0;
> +
> +	return dma_mapping_error(vring_dma_dev(vq), addr);
> +}
> +
> +static void vring_unmap_one_split(const struct vring_virtqueue *vq,
> +				  struct vring_desc *desc)
>   {
>   	u16 flags;
>   
> @@ -226,17 +280,9 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
>   	}
>   }
>   
> -static int vring_mapping_error(const struct vring_virtqueue *vq,
> -			       dma_addr_t addr)
> -{
> -	if (!vring_use_dma_api(vq->vq.vdev))
> -		return 0;
> -
> -	return dma_mapping_error(vring_dma_dev(vq), addr);
> -}

It looks to me if you keep vring_mapping_error behind 
vring_unmap_one_split(), lots of changes were unncessary.

> -
> -static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> -					 unsigned int total_sg, gfp_t gfp)
> +static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +					       unsigned int total_sg,
> +					       gfp_t gfp)
>   {
>   	struct vring_desc *desc;
>   	unsigned int i;
> @@ -257,14 +303,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
>   	return desc;
>   }
>   
> -static inline int virtqueue_add(struct virtqueue *_vq,
> -				struct scatterlist *sgs[],
> -				unsigned int total_sg,
> -				unsigned int out_sgs,
> -				unsigned int in_sgs,
> -				void *data,
> -				void *ctx,
> -				gfp_t gfp)
> +static inline int virtqueue_add_split(struct virtqueue *_vq,
> +				      struct scatterlist *sgs[],
> +				      unsigned int total_sg,
> +				      unsigned int out_sgs,
> +				      unsigned int in_sgs,
> +				      void *data,
> +				      void *ctx,
> +				      gfp_t gfp)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   	struct scatterlist *sg;
> @@ -300,10 +346,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   
>   	head = vq->free_head;
>   
> -	/* If the host supports indirect descriptor tables, and we have multiple
> -	 * buffers, then go indirect. FIXME: tune this threshold */
> -	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> -		desc = alloc_indirect(_vq, total_sg, gfp);
> +	if (virtqueue_use_indirect(_vq, total_sg))
> +		desc = alloc_indirect_split(_vq, total_sg, gfp);
>   	else {
>   		desc = NULL;
>   		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> @@ -424,7 +468,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   	for (n = 0; n < total_sg; n++) {
>   		if (i == err_idx)
>   			break;
> -		vring_unmap_one(vq, &desc[i]);
> +		vring_unmap_one_split(vq, &desc[i]);
>   		i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
>   	}
>   
> @@ -435,6 +479,355 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   	return -EIO;
>   }
>   
> +static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 new, old;
> +	bool needs_kick;
> +
> +	START_USE(vq);
> +	/* We need to expose available array entries before checking avail
> +	 * event. */
> +	virtio_mb(vq->weak_barriers);
> +
> +	old = vq->avail_idx_shadow - vq->num_added;
> +	new = vq->avail_idx_shadow;
> +	vq->num_added = 0;
> +
> +#ifdef DEBUG
> +	if (vq->last_add_time_valid) {
> +		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> +					      vq->last_add_time)) > 100);
> +	}
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	if (vq->event) {
> +		needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
> +					      new, old);
> +	} else {
> +		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
> +	}
> +	END_USE(vq);
> +	return needs_kick;
> +}
> +
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> +			     void **ctx)
> +{
> +	unsigned int i, j;
> +	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> +
> +	/* Clear data ptr. */
> +	vq->desc_state[head].data = NULL;
> +
> +	/* Put back on free list: unmap first-level descriptors and find end */
> +	i = head;
> +
> +	while (vq->vring.desc[i].flags & nextflag) {
> +		vring_unmap_one_split(vq, &vq->vring.desc[i]);
> +		i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
> +		vq->vq.num_free++;
> +	}
> +
> +	vring_unmap_one_split(vq, &vq->vring.desc[i]);
> +	vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
> +	vq->free_head = head;
> +
> +	/* Plus final descriptor */
> +	vq->vq.num_free++;
> +
> +	if (vq->indirect) {
> +		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
> +		u32 len;
> +
> +		/* Free the indirect table, if any, now that it's unmapped. */
> +		if (!indir_desc)
> +			return;
> +
> +		len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
> +
> +		BUG_ON(!(vq->vring.desc[head].flags &
> +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> +		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> +
> +		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +			vring_unmap_one_split(vq, &indir_desc[j]);
> +
> +		kfree(indir_desc);
> +		vq->desc_state[head].indir_desc = NULL;
> +	} else if (ctx) {
> +		*ctx = vq->desc_state[head].indir_desc;
> +	}
> +}
> +
> +static inline bool more_used_split(const struct vring_virtqueue *vq)
> +{
> +	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> +}
> +
> +static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> +					 unsigned int *len,
> +					 void **ctx)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	void *ret;
> +	unsigned int i;
> +	u16 last_used;
> +
> +	START_USE(vq);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	if (!more_used_split(vq)) {
> +		pr_debug("No more buffers in queue\n");
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	/* Only get used array entries after they have been exposed by host. */
> +	virtio_rmb(vq->weak_barriers);
> +
> +	last_used = (vq->last_used_idx & (vq->vring.num - 1));
> +	i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
> +	*len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
> +
> +	if (unlikely(i >= vq->vring.num)) {
> +		BAD_RING(vq, "id %u out of range\n", i);
> +		return NULL;
> +	}
> +	if (unlikely(!vq->desc_state[i].data)) {
> +		BAD_RING(vq, "id %u is not a head!\n", i);
> +		return NULL;
> +	}
> +
> +	/* detach_buf_split clears data, so grab it now. */
> +	ret = vq->desc_state[i].data;
> +	detach_buf_split(vq, i, ctx);
> +	vq->last_used_idx++;
> +	/* If we expect an interrupt for the next entry, tell host
> +	 * by writing event index and flush out the write before
> +	 * the read in the next get_buf call. */
> +	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> +		virtio_store_mb(vq->weak_barriers,
> +				&vring_used_event(&vq->vring),
> +				cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
> +
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	END_USE(vq);
> +	return ret;
> +}
> +
> +static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> +		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> +		if (!vq->event)
> +			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> +	}
> +}
> +
> +static unsigned virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 last_used_idx;
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> +	 * either clear the flags bit or point the event index at the next
> +	 * entry. Always do both to keep code simple. */
> +	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> +		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> +		if (!vq->event)
> +			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> +	}
> +	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> +	END_USE(vq);
> +	return last_used_idx;
> +}
> +
> +static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	virtio_mb(vq->weak_barriers);
> +	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> +}
> +
> +static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 bufs;
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> +	 * either clear the flags bit or point the event index at the next
> +	 * entry. Always update the event index to keep code simple. */
> +	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> +		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> +		if (!vq->event)
> +			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> +	}
> +	/* TODO: tune this threshold */
> +	bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
> +
> +	virtio_store_mb(vq->weak_barriers,
> +			&vring_used_event(&vq->vring),
> +			cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
> +
> +	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> +		END_USE(vq);
> +		return false;
> +	}
> +
> +	END_USE(vq);
> +	return true;
> +}
> +
> +static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	unsigned int i;
> +	void *buf;
> +
> +	START_USE(vq);
> +
> +	for (i = 0; i < vq->vring.num; i++) {
> +		if (!vq->desc_state[i].data)
> +			continue;
> +		/* detach_buf clears data, so grab it now. */
> +		buf = vq->desc_state[i].data;
> +		detach_buf_split(vq, i, NULL);
> +		vq->avail_idx_shadow--;
> +		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> +		END_USE(vq);
> +		return buf;
> +	}
> +	/* That should have freed everything. */
> +	BUG_ON(vq->vq.num_free != vq->vring.num);
> +
> +	END_USE(vq);
> +	return NULL;
> +}

I think the those copy-and-paste hunks could be avoided and the diff 
should only contains renaming of the function. If yes, it would be very 
welcomed since it requires to compare the changes verbatim otherwise.

> +
> +/*
> + * The layout for the packed ring is a continuous chunk of memory
> + * which looks like this.
> + *
> + * struct vring_packed {
> + *	// The actual descriptors (16 bytes each)
> + *	struct vring_packed_desc desc[num];
> + *
> + *	// Padding to the next align boundary.
> + *	char pad[];
> + *
> + *	// Driver Event Suppression
> + *	struct vring_packed_desc_event driver;
> + *
> + *	// Device Event Suppression
> + *	struct vring_packed_desc_event device;
> + * };
> + */
> +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> +				     void *p, unsigned long align)
> +{
> +	vr->num = num;
> +	vr->desc = p;
> +	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
> +		* num + align - 1) & ~(align - 1));

If we choose not to go uapi, maybe we can use ALIGN() macro here?

> +	vr->device = vr->driver + 1;
> +}
> +
> +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> +{
> +	return ((sizeof(struct vring_packed_desc) * num + align - 1)
> +		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> +}
> +
> +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> +				       struct scatterlist *sgs[],
> +				       unsigned int total_sg,
> +				       unsigned int out_sgs,
> +				       unsigned int in_sgs,
> +				       void *data,
> +				       void *ctx,
> +				       gfp_t gfp)
> +{
> +	return -EIO;
> +}
> +
> +static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> +{
> +	return false;
> +}
> +
> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> +{
> +	return false;
> +}
> +
> +static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> +					  unsigned int *len,
> +					  void **ctx)
> +{
> +	return NULL;
> +}
> +
> +static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> +{
> +}
> +
> +static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> +{
> +	return 0;
> +}
> +
> +static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> +{
> +	return false;
> +}
> +
> +static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> +{
> +	return false;
> +}
> +
> +static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
> +{
> +	return NULL;
> +}
> +
> +static inline int virtqueue_add(struct virtqueue *_vq,
> +				struct scatterlist *sgs[],
> +				unsigned int total_sg,
> +				unsigned int out_sgs,
> +				unsigned int in_sgs,
> +				void *data,
> +				void *ctx,
> +				gfp_t gfp)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
> +						 in_sgs, data, ctx, gfp) :
> +			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
> +						in_sgs, data, ctx, gfp);
> +}
> +
>   /**
>    * virtqueue_add_sgs - expose buffers to other end
>    * @vq: the struct virtqueue we're talking about.
> @@ -551,34 +944,9 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
>   bool virtqueue_kick_prepare(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> -	u16 new, old;
> -	bool needs_kick;
>   
> -	START_USE(vq);
> -	/* We need to expose available array entries before checking avail
> -	 * event. */
> -	virtio_mb(vq->weak_barriers);
> -
> -	old = vq->avail_idx_shadow - vq->num_added;
> -	new = vq->avail_idx_shadow;
> -	vq->num_added = 0;
> -
> -#ifdef DEBUG
> -	if (vq->last_add_time_valid) {
> -		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> -					      vq->last_add_time)) > 100);
> -	}
> -	vq->last_add_time_valid = false;
> -#endif
> -
> -	if (vq->event) {
> -		needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
> -					      new, old);
> -	} else {
> -		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
> -	}
> -	END_USE(vq);
> -	return needs_kick;
> +	return vq->packed ? virtqueue_kick_prepare_packed(_vq) :
> +			    virtqueue_kick_prepare_split(_vq);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
>   
> @@ -626,58 +994,9 @@ bool virtqueue_kick(struct virtqueue *vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_kick);
>   
> -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> -		       void **ctx)
> -{
> -	unsigned int i, j;
> -	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> -
> -	/* Clear data ptr. */
> -	vq->desc_state[head].data = NULL;
> -
> -	/* Put back on free list: unmap first-level descriptors and find end */
> -	i = head;
> -
> -	while (vq->vring.desc[i].flags & nextflag) {
> -		vring_unmap_one(vq, &vq->vring.desc[i]);
> -		i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
> -		vq->vq.num_free++;
> -	}
> -
> -	vring_unmap_one(vq, &vq->vring.desc[i]);
> -	vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
> -	vq->free_head = head;
> -
> -	/* Plus final descriptor */
> -	vq->vq.num_free++;
> -
> -	if (vq->indirect) {
> -		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
> -		u32 len;
> -
> -		/* Free the indirect table, if any, now that it's unmapped. */
> -		if (!indir_desc)
> -			return;
> -
> -		len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
> -
> -		BUG_ON(!(vq->vring.desc[head].flags &
> -			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> -		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> -
> -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> -			vring_unmap_one(vq, &indir_desc[j]);
> -
> -		kfree(indir_desc);
> -		vq->desc_state[head].indir_desc = NULL;
> -	} else if (ctx) {
> -		*ctx = vq->desc_state[head].indir_desc;
> -	}
> -}
> -
>   static inline bool more_used(const struct vring_virtqueue *vq)
>   {
> -	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> +	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
>   }
>   
>   /**
> @@ -700,57 +1019,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>   			    void **ctx)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> -	void *ret;
> -	unsigned int i;
> -	u16 last_used;
>   
> -	START_USE(vq);
> -
> -	if (unlikely(vq->broken)) {
> -		END_USE(vq);
> -		return NULL;
> -	}
> -
> -	if (!more_used(vq)) {
> -		pr_debug("No more buffers in queue\n");
> -		END_USE(vq);
> -		return NULL;
> -	}
> -
> -	/* Only get used array entries after they have been exposed by host. */
> -	virtio_rmb(vq->weak_barriers);
> -
> -	last_used = (vq->last_used_idx & (vq->vring.num - 1));
> -	i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
> -	*len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
> -
> -	if (unlikely(i >= vq->vring.num)) {
> -		BAD_RING(vq, "id %u out of range\n", i);
> -		return NULL;
> -	}
> -	if (unlikely(!vq->desc_state[i].data)) {
> -		BAD_RING(vq, "id %u is not a head!\n", i);
> -		return NULL;
> -	}
> -
> -	/* detach_buf clears data, so grab it now. */
> -	ret = vq->desc_state[i].data;
> -	detach_buf(vq, i, ctx);
> -	vq->last_used_idx++;
> -	/* If we expect an interrupt for the next entry, tell host
> -	 * by writing event index and flush out the write before
> -	 * the read in the next get_buf call. */
> -	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> -		virtio_store_mb(vq->weak_barriers,
> -				&vring_used_event(&vq->vring),
> -				cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
> -
> -#ifdef DEBUG
> -	vq->last_add_time_valid = false;
> -#endif
> -
> -	END_USE(vq);
> -	return ret;
> +	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> +			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>   
> @@ -772,12 +1043,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> -	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> -	}
> -
> +	if (vq->packed)
> +		virtqueue_disable_cb_packed(_vq);
> +	else
> +		virtqueue_disable_cb_split(_vq);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>   
> @@ -796,23 +1065,9 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>   unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> -	u16 last_used_idx;
>   
> -	START_USE(vq);
> -
> -	/* We optimistically turn back on interrupts, then check if there was
> -	 * more to do. */
> -	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> -	 * either clear the flags bit or point the event index at the next
> -	 * entry. Always do both to keep code simple. */
> -	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> -		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> -	}
> -	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> -	END_USE(vq);
> -	return last_used_idx;
> +	return vq->packed ? virtqueue_enable_cb_prepare_packed(_vq) :
> +			    virtqueue_enable_cb_prepare_split(_vq);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>   
> @@ -829,8 +1084,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> -	virtio_mb(vq->weak_barriers);
> -	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> +	return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
> +			    virtqueue_poll_split(_vq, last_used_idx);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_poll);
>   
> @@ -868,34 +1123,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
>   bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> -	u16 bufs;
>   
> -	START_USE(vq);
> -
> -	/* We optimistically turn back on interrupts, then check if there was
> -	 * more to do. */
> -	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> -	 * either clear the flags bit or point the event index at the next
> -	 * entry. Always update the event index to keep code simple. */
> -	if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> -		vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> -	}
> -	/* TODO: tune this threshold */
> -	bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
> -
> -	virtio_store_mb(vq->weak_barriers,
> -			&vring_used_event(&vq->vring),
> -			cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
> -
> -	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> -		END_USE(vq);
> -		return false;
> -	}
> -
> -	END_USE(vq);
> -	return true;
> +	return vq->packed ? virtqueue_enable_cb_delayed_packed(_vq) :
> +			    virtqueue_enable_cb_delayed_split(_vq);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
>   
> @@ -910,27 +1140,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
>   void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> -	unsigned int i;
> -	void *buf;
>   
> -	START_USE(vq);
> -
> -	for (i = 0; i < vq->vring.num; i++) {
> -		if (!vq->desc_state[i].data)
> -			continue;
> -		/* detach_buf clears data, so grab it now. */
> -		buf = vq->desc_state[i].data;
> -		detach_buf(vq, i, NULL);
> -		vq->avail_idx_shadow--;
> -		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> -		END_USE(vq);
> -		return buf;
> -	}
> -	/* That should have freed everything. */
> -	BUG_ON(vq->vq.num_free != vq->vring.num);
> -
> -	END_USE(vq);
> -	return NULL;
> +	return vq->packed ? virtqueue_detach_unused_buf_packed(_vq) :
> +			    virtqueue_detach_unused_buf_split(_vq);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>   
> @@ -955,7 +1167,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   EXPORT_SYMBOL_GPL(vring_interrupt);
>   
>   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> -					struct vring vring,
> +					union vring_union vring,
> +					bool packed,
>   					struct virtio_device *vdev,
>   					bool weak_barriers,
>   					bool context,
> @@ -963,19 +1176,22 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   					void (*callback)(struct virtqueue *),
>   					const char *name)
>   {
> -	unsigned int i;
>   	struct vring_virtqueue *vq;
> +	unsigned int num, i;
> +	size_t size;
>   
> -	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
> -		     GFP_KERNEL);
> +	num = packed ? vring.vring_packed.num : vring.vring_split.num;
> +	size = packed ? num * sizeof(struct vring_desc_state_packed) :
> +			num * sizeof(struct vring_desc_state);
> +
> +	vq = kmalloc(sizeof(*vq) + size, GFP_KERNEL);
>   	if (!vq)
>   		return NULL;
>   
> -	vq->vring = vring;
>   	vq->vq.callback = callback;
>   	vq->vq.vdev = vdev;
>   	vq->vq.name = name;
> -	vq->vq.num_free = vring.num;
> +	vq->vq.num_free = num;
>   	vq->vq.index = index;
>   	vq->we_own_ring = false;
>   	vq->queue_dma_addr = 0;
> @@ -984,9 +1200,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   	vq->weak_barriers = weak_barriers;
>   	vq->broken = false;
>   	vq->last_used_idx = 0;
> -	vq->avail_flags_shadow = 0;
> -	vq->avail_idx_shadow = 0;
>   	vq->num_added = 0;
> +	vq->packed = packed;
>   	list_add_tail(&vq->vq.list, &vdev->vqs);
>   #ifdef DEBUG
>   	vq->in_use = false;
> @@ -997,19 +1212,48 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   		!context;
>   	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
>   
> +	if (vq->packed) {
> +		vq->vring_packed = vring.vring_packed;
> +		vq->next_avail_idx = 0;
> +		vq->avail_wrap_counter = 1;
> +		vq->used_wrap_counter = 1;
> +		vq->event_flags_shadow = 0;
> +
> +		memset(vq->desc_state_packed, 0,
> +			num * sizeof(struct vring_desc_state_packed));
> +
> x
> +		vq->free_head = 0;
> +		for (i = 0; i < num-1; i++)
> +			vq->desc_state_packed[i].next = i + 1;
> +	} else {
> +		vq->vring = vring.vring_split;
> +		vq->avail_flags_shadow = 0;
> +		vq->avail_idx_shadow = 0;
> +
> +		/* Put everything in free lists. */
> +		vq->free_head = 0;
> +		for (i = 0; i < num-1; i++)
> +			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> +
> +		memset(vq->desc_state, 0,
> +			num * sizeof(struct vring_desc_state));
> +	}
> +
>   	/* No callback?  Tell other side not to bother us. */
>   	if (!callback) {
> -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
> +		if (packed) {
> +			vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> +			vq->vring_packed.driver->flags = cpu_to_virtio16(vdev,
> +						vq->event_flags_shadow);
> +		} else {
> +			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> +			if (!vq->event)
> +				vq->vring.avail->flags = cpu_to_virtio16(vdev,
> +						vq->avail_flags_shadow);
> +		}
>   	}
>   
> -	/* Put everything in free lists. */
> -	vq->free_head = 0;
> -	for (i = 0; i < vring.num-1; i++)
> -		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> -	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
> -
>   	return &vq->vq;
>   }
>   EXPORT_SYMBOL_GPL(__vring_new_virtqueue);
> @@ -1056,6 +1300,12 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
>   	}
>   }
>   
> +static inline int
> +__vring_size(unsigned int num, unsigned long align, bool packed)
> +{
> +	return packed ? vring_size_packed(num, align) : vring_size(num, align);
> +}
> +
>   struct virtqueue *vring_create_virtqueue(
>   	unsigned int index,
>   	unsigned int num,
> @@ -1072,7 +1322,8 @@ struct virtqueue *vring_create_virtqueue(
>   	void *queue = NULL;
>   	dma_addr_t dma_addr;
>   	size_t queue_size_in_bytes;
> -	struct vring vring;
> +	union vring_union vring;
> +	bool packed;
>   
>   	/* We assume num is a power of 2. */
>   	if (num & (num - 1)) {
> @@ -1080,9 +1331,13 @@ struct virtqueue *vring_create_virtqueue(
>   		return NULL;
>   	}
>   
> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +
>   	/* TODO: allocate each queue chunk individually */
> -	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> +	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
> +			num /= 2) {
> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> +							     packed),
>   					  &dma_addr,
>   					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
>   		if (queue)
> @@ -1094,17 +1349,21 @@ struct virtqueue *vring_create_virtqueue(
>   
>   	if (!queue) {
>   		/* Try to get a single page. You are my only hope! */
> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> +							     packed),
>   					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>   	}
>   	if (!queue)
>   		return NULL;
>   
> -	queue_size_in_bytes = vring_size(num, vring_align);
> -	vring_init(&vring, num, queue, vring_align);
> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> +	if (packed)
> +		vring_init_packed(&vring.vring_packed, num, queue, vring_align);
> +	else
> +		vring_init(&vring.vring_split, num, queue, vring_align);
>   
> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> -				   notify, callback, name);
> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> +				   context, notify, callback, name);
>   	if (!vq) {
>   		vring_free_queue(vdev, queue_size_in_bytes, queue,
>   				 dma_addr);
> @@ -1130,10 +1389,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
>   				      void (*callback)(struct virtqueue *vq),
>   				      const char *name)
>   {
> -	struct vring vring;
> -	vring_init(&vring, num, pages, vring_align);
> -	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> -				     notify, callback, name);
> +	union vring_union vring;
> +	bool packed;
> +
> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +	if (packed)
> +		vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> +	else
> +		vring_init(&vring.vring_split, num, pages, vring_align);
> +
> +	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> +				     context, notify, callback, name);
>   }
>   EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>   
> @@ -1143,7 +1409,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>   
>   	if (vq->we_own_ring) {
>   		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> -				 vq->vring.desc, vq->queue_dma_addr);
> +				 vq->packed ? (void *)vq->vring_packed.desc :
> +					      (void *)vq->vring.desc,
> +				 vq->queue_dma_addr);
>   	}
>   	list_del(&_vq->list);
>   	kfree(vq);
> @@ -1185,7 +1453,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
>   
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> -	return vq->vring.num;
> +	return vq->packed ? vq->vring_packed.num : vq->vring.num;
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
>   
> @@ -1228,6 +1496,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>   
>   	BUG_ON(!vq->we_own_ring);
>   
> +	if (vq->packed)
> +		return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> +				(char *)vq->vring_packed.desc);
> +
>   	return vq->queue_dma_addr +
>   		((char *)vq->vring.avail - (char *)vq->vring.desc);
>   }
> @@ -1239,11 +1511,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
>   
>   	BUG_ON(!vq->we_own_ring);
>   
> +	if (vq->packed)
> +		return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> +				(char *)vq->vring_packed.desc);
> +
>   	return vq->queue_dma_addr +
>   		((char *)vq->vring.used - (char *)vq->vring.desc);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
>   
> +/* Only available for split ring */
>   const struct vring *virtqueue_get_vring(struct virtqueue *vq)
>   {

A possible issue with this is:

After commit d4674240f31f8c4289abba07d64291c6ddce51bc ("KVM: s390: 
virtio-ccw revision 1 SET_VQ"). CCW tries to use 
virtqueue_get_avail()/virtqueue_get_used(). Looks like a bug either here 
or ccw code.

Thanks

>   	return &to_vvq(vq)->vring;
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index bbf32524ab27..a0075894ad16 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
>   struct virtio_device;
>   struct virtqueue;
>   
> +union vring_union {
> +	struct vring vring_split;
> +	struct vring_packed vring_packed;
> +};
> +
>   /*
>    * Creates a virtqueue and allocates the descriptor ring.  If
>    * may_reduce_num is set, then this may allocate a smaller ring than
> @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
>   
>   /* Creates a virtqueue with a custom layout. */
>   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> -					struct vring vring,
> +					union vring_union vring,
> +					bool packed,
>   					struct virtio_device *vdev,
>   					bool weak_barriers,
>   					bool ctx,

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 3/5] virtio_ring: add packed ring support
  2018-05-22  8:16 ` [RFC v5 3/5] virtio_ring: add packed ring support Tiwei Bie
@ 2018-05-29  3:18   ` Jason Wang
  2018-05-29  5:11     ` Tiwei Bie
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2018-05-29  3:18 UTC (permalink / raw)
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann



On 2018年05月22日 16:16, Tiwei Bie wrote:
> This commit introduces the support (without EVENT_IDX) for
> packed ring.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 474 ++++++++++++++++++++++++++++++++++-
>   1 file changed, 468 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index f5ef5f42a7cf..eb9fd5207a68 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -62,6 +62,12 @@ struct vring_desc_state {
>   };
>   
>   struct vring_desc_state_packed {
> +	void *data;			/* Data for callback. */
> +	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> +	int num;			/* Descriptor list length. */
> +	dma_addr_t addr;		/* Buffer DMA addr. */
> +	u32 len;			/* Buffer length. */
> +	u16 flags;			/* Descriptor flags. */
>   	int next;			/* The next desc state. */
>   };
>   
> @@ -758,6 +764,72 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
>   		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
>   }
>   
> +static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
> +				     struct vring_desc_state_packed *state)
> +{
> +	u16 flags;
> +
> +	if (!vring_use_dma_api(vq->vq.vdev))
> +		return;
> +
> +	flags = state->flags;
> +
> +	if (flags & VRING_DESC_F_INDIRECT) {
> +		dma_unmap_single(vring_dma_dev(vq),
> +				 state->addr, state->len,
> +				 (flags & VRING_DESC_F_WRITE) ?
> +				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	} else {
> +		dma_unmap_page(vring_dma_dev(vq),
> +			       state->addr, state->len,
> +			       (flags & VRING_DESC_F_WRITE) ?
> +			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	}
> +}
> +
> +static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> +				   struct vring_packed_desc *desc)
> +{
> +	u16 flags;
> +
> +	if (!vring_use_dma_api(vq->vq.vdev))
> +		return;
> +
> +	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +
> +	if (flags & VRING_DESC_F_INDIRECT) {
> +		dma_unmap_single(vring_dma_dev(vq),
> +				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
> +				 virtio32_to_cpu(vq->vq.vdev, desc->len),
> +				 (flags & VRING_DESC_F_WRITE) ?
> +				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	} else {
> +		dma_unmap_page(vring_dma_dev(vq),
> +			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
> +			       virtio32_to_cpu(vq->vq.vdev, desc->len),
> +			       (flags & VRING_DESC_F_WRITE) ?
> +			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	}
> +}
> +
> +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> +						       unsigned int total_sg,
> +						       gfp_t gfp)
> +{
> +	struct vring_packed_desc *desc;
> +
> +	/*
> +	 * We require lowmem mappings for the descriptors because
> +	 * otherwise virt_to_phys will give us bogus addresses in the
> +	 * virtqueue.
> +	 */
> +	gfp &= ~__GFP_HIGHMEM;
> +
> +	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> +
> +	return desc;
> +}
> +
>   static inline int virtqueue_add_packed(struct virtqueue *_vq,
>   				       struct scatterlist *sgs[],
>   				       unsigned int total_sg,
> @@ -767,47 +839,437 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>   				       void *ctx,
>   				       gfp_t gfp)
>   {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	struct vring_packed_desc *desc;
> +	struct scatterlist *sg;
> +	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> +	__virtio16 uninitialized_var(head_flags), flags;
> +	u16 head, avail_wrap_counter, id, cur;
> +	bool indirect;
> +
> +	START_USE(vq);
> +
> +	BUG_ON(data == NULL);
> +	BUG_ON(ctx && vq->indirect);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return -EIO;
> +	}
> +
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif
> +
> +	BUG_ON(total_sg == 0);
> +
> +	head = vq->next_avail_idx;
> +	avail_wrap_counter = vq->avail_wrap_counter;
> +
> +	if (virtqueue_use_indirect(_vq, total_sg))
> +		desc = alloc_indirect_packed(_vq, total_sg, gfp);
> +	else {
> +		desc = NULL;
> +		WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> +	}
> +
> +	if (desc) {
> +		/* Use a single buffer which doesn't continue */
> +		indirect = true;
> +		/* Set up rest to use this indirect table. */
> +		i = 0;
> +		descs_used = 1;
> +	} else {
> +		indirect = false;
> +		desc = vq->vring_packed.desc;
> +		i = head;
> +		descs_used = total_sg;
> +	}
> +
> +	if (vq->vq.num_free < descs_used) {
> +		pr_debug("Can't add buf len %i - avail = %i\n",
> +			 descs_used, vq->vq.num_free);
> +		/* FIXME: for historical reasons, we force a notify here if
> +		 * there are outgoing parts to the buffer.  Presumably the
> +		 * host should service the ring ASAP. */
> +		if (out_sgs)
> +			vq->notify(&vq->vq);
> +		if (indirect)
> +			kfree(desc);
> +		END_USE(vq);
> +		return -ENOSPC;
> +	}
> +
> +	id = vq->free_head;
> +	BUG_ON(id == vq->vring_packed.num);
> +
> +	cur = id;
> +	for (n = 0; n < out_sgs + in_sgs; n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> +					       DMA_TO_DEVICE : DMA_FROM_DEVICE);
> +			if (vring_mapping_error(vq, addr))
> +				goto unmap_release;
> +
> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> +				  (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
> +				  VRING_DESC_F_AVAIL(vq->avail_wrap_counter) |
> +				  VRING_DESC_F_USED(!vq->avail_wrap_counter));
> +			if (!indirect && i == head)
> +				head_flags = flags;
> +			else
> +				desc[i].flags = flags;
> +
> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> +			i++;
> +			if (!indirect) {
> +				vq->desc_state_packed[cur].addr = addr;
> +				vq->desc_state_packed[cur].len = sg->length;
> +				vq->desc_state_packed[cur].flags =
> +					virtio16_to_cpu(_vq->vdev, flags);
> +				cur = vq->desc_state_packed[cur].next;
> +
> +				if (i >= vq->vring_packed.num) {
> +					i = 0;
> +					vq->avail_wrap_counter ^= 1;
> +				}
> +			}
> +		}
> +	}
> +
> +	prev = (i > 0 ? i : vq->vring_packed.num) - 1;
> +	desc[prev].id = cpu_to_virtio16(_vq->vdev, id);
> +
> +	/* Last one doesn't continue. */
> +	if (total_sg == 1)
> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> +	else
> +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
> +						~VRING_DESC_F_NEXT);
> +
> +	if (indirect) {
> +		/* Now that the indirect table is filled in, map it. */
> +		dma_addr_t addr = vring_map_single(
> +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> +			DMA_TO_DEVICE);
> +		if (vring_mapping_error(vq, addr))
> +			goto unmap_release;
> +
> +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> +				       VRING_DESC_F_AVAIL(avail_wrap_counter) |
> +				       VRING_DESC_F_USED(!avail_wrap_counter));
> +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev,
> +								   addr);
> +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> +				total_sg * sizeof(struct vring_packed_desc));
> +		vq->vring_packed.desc[head].id = cpu_to_virtio16(_vq->vdev, id);
> +
> +		vq->desc_state_packed[id].addr = addr;
> +		vq->desc_state_packed[id].len = total_sg *
> +				sizeof(struct vring_packed_desc);
> +		vq->desc_state_packed[id].flags =
> +				virtio16_to_cpu(_vq->vdev, head_flags);
> +	}
> +
> +	/* We're using some buffers from the free list. */
> +	vq->vq.num_free -= descs_used;
> +
> +	/* Update free pointer */
> +	if (indirect) {
> +		n = head + 1;
> +		if (n >= vq->vring_packed.num) {
> +			n = 0;
> +			vq->avail_wrap_counter ^= 1;
> +		}
> +		vq->next_avail_idx = n;
> +		vq->free_head = vq->desc_state_packed[id].next;
> +	} else {
> +		vq->next_avail_idx = i;
> +		vq->free_head = cur;
> +	}
> +
> +	/* Store token and indirect buffer state. */
> +	vq->desc_state_packed[id].num = descs_used;
> +	vq->desc_state_packed[id].data = data;
> +	if (indirect)
> +		vq->desc_state_packed[id].indir_desc = desc;
> +	else
> +		vq->desc_state_packed[id].indir_desc = ctx;
> +
> +	/* A driver MUST NOT make the first descriptor in the list
> +	 * available before all subsequent descriptors comprising
> +	 * the list are made available. */
> +	virtio_wmb(vq->weak_barriers);
> +	vq->vring_packed.desc[head].flags = head_flags;
> +	vq->num_added += descs_used;
> +
> +	pr_debug("Added buffer head %i to %p\n", head, vq);
> +	END_USE(vq);
> +
> +	return 0;
> +
> +unmap_release:
> +	err_idx = i;
> +	i = head;
> +
> +	for (n = 0; n < total_sg; n++) {
> +		if (i == err_idx)
> +			break;
> +		vring_unmap_desc_packed(vq, &desc[i]);
> +		i++;
> +		if (!indirect && i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->avail_wrap_counter = avail_wrap_counter;
> +
> +	if (indirect)
> +		kfree(desc);
> +
> +	END_USE(vq);
>   	return -EIO;
>   }
>   
>   static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
>   {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 flags;
> +	bool needs_kick;
> +	u32 snapshot;
> +
> +	START_USE(vq);
> +	/* We need to expose the new flags value before checking notification
> +	 * suppressions. */
> +	virtio_mb(vq->weak_barriers);
> +
> +	snapshot = *(u32 *)vq->vring_packed.device;
> +	flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
> +
> +#ifdef DEBUG
> +	if (vq->last_add_time_valid) {
> +		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> +					      vq->last_add_time)) > 100);
> +	}
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	needs_kick = (flags != VRING_EVENT_F_DISABLE);
> +	END_USE(vq);
> +	return needs_kick;
> +}
> +
> +static void detach_buf_packed(struct vring_virtqueue *vq,
> +			      unsigned int id, void **ctx)
> +{
> +	struct vring_desc_state_packed *state;
> +	struct vring_packed_desc *desc;
> +	unsigned int i;
> +	int *next;
> +
> +	/* Clear data ptr. */
> +	vq->desc_state_packed[id].data = NULL;
> +
> +	next = &id;
> +	for (i = 0; i < vq->desc_state_packed[id].num; i++) {
> +		state = &vq->desc_state_packed[*next];
> +		vring_unmap_state_packed(vq, state);
> +		next = &vq->desc_state_packed[*next].next;

Have a short discussion with Michael. It looks like the only thing we 
care is DMA address and length here.

So a thought is to avoid using desc_state_packed() is 
vring_use_dma_api() is false? Probably need another id allocator or just 
use desc_state_packed for free id list.

> +	}
> +
> +	vq->vq.num_free += vq->desc_state_packed[id].num;
> +	*next = vq->free_head;

Using pointer seems not elegant and unnecessary here. You can just track 
next in integer I think?

> +	vq->free_head = id;
> +
> +	if (vq->indirect) {
> +		u32 len;
> +
> +		/* Free the indirect table, if any, now that it's unmapped. */
> +		desc = vq->desc_state_packed[id].indir_desc;
> +		if (!desc)
> +			return;
> +
> +		len = vq->desc_state_packed[id].len;
> +		for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
> +			vring_unmap_desc_packed(vq, &desc[i]);
> +
> +		kfree(desc);
> +		vq->desc_state_packed[id].indir_desc = NULL;
> +	} else if (ctx) {
> +		*ctx = vq->desc_state_packed[id].indir_desc;
> +	}
>   }
>   
>   static inline bool more_used_packed(const struct vring_virtqueue *vq)
>   {
> -	return false;
> +	u16 last_used, flags;
> +	u8 avail, used;
> +
> +	last_used = vq->last_used_idx;
> +	flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[last_used].flags);
> +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
> +	used = !!(flags & VRING_DESC_F_USED(1));
> +
> +	return avail == used && used == vq->used_wrap_counter;

Spec does not check avail == used? So there's probably a bug in either 
of the two places.

In what condition that avail != used but used == vq_used_wrap_counter? A 
buggy device or device set the two bits in two writes?

>   }
>   
>   static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>   					  unsigned int *len,
>   					  void **ctx)
>   {
> -	return NULL;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 last_used, id;
> +	void *ret;
> +
> +	START_USE(vq);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	if (!more_used_packed(vq)) {
> +		pr_debug("No more buffers in queue\n");
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	/* Only get used elements after they have been exposed by host. */
> +	virtio_rmb(vq->weak_barriers);
> +
> +	last_used = vq->last_used_idx;
> +	id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> +	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> +
> +	if (unlikely(id >= vq->vring_packed.num)) {
> +		BAD_RING(vq, "id %u out of range\n", id);
> +		return NULL;
> +	}
> +	if (unlikely(!vq->desc_state_packed[id].data)) {
> +		BAD_RING(vq, "id %u is not a head!\n", id);
> +		return NULL;
> +	}
> +
> +	vq->last_used_idx += vq->desc_state_packed[id].num;
> +	if (vq->last_used_idx >= vq->vring_packed.num) {
> +		vq->last_used_idx -= vq->vring_packed.num;
> +		vq->used_wrap_counter ^= 1;
> +	}
> +
> +	/* detach_buf_packed clears data, so grab it now. */
> +	ret = vq->desc_state_packed[id].data;
> +	detach_buf_packed(vq, id, ctx);
> +
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	END_USE(vq);
> +	return ret;
>   }
>   
>   static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
>   {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
> +		vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
>   }
>   
>   static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>   {
> -	return 0;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +
> +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> +		virtio_wmb(vq->weak_barriers);

Missing comments for the barrier.

> +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
> +
> +	END_USE(vq);
> +	return vq->last_used_idx;
>   }
>   
>   static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
>   {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u8 avail, used;
> +	u16 flags;
> +
> +	virtio_mb(vq->weak_barriers);
> +	flags = virtio16_to_cpu(vq->vq.vdev,
> +			vq->vring_packed.desc[last_used_idx].flags);
> +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
> +	used = !!(flags & VRING_DESC_F_USED(1));
> +
> +	return avail == used && used == vq->used_wrap_counter;
>   }
>   
>   static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>   {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +
> +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> +		virtio_wmb(vq->weak_barriers);


Need comments for the barrier.

> +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
> +
> +	if (more_used_packed(vq)) {
> +		END_USE(vq);
> +		return false;
> +	}
> +
> +	END_USE(vq);
> +	return true;
>   }
>   
>   static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
>   {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	unsigned int i;
> +	void *buf;
> +
> +	START_USE(vq);
> +
> +	for (i = 0; i < vq->vring_packed.num; i++) {
> +		if (!vq->desc_state_packed[i].data)
> +			continue;
> +		/* detach_buf clears data, so grab it now. */
> +		buf = vq->desc_state_packed[i].data;
> +		detach_buf_packed(vq, i, NULL);
> +		END_USE(vq);
> +		return buf;
> +	}
> +	/* That should have freed everything. */
> +	BUG_ON(vq->vq.num_free != vq->vring_packed.num);
> +
> +	END_USE(vq);
>   	return NULL;
>   }
>   

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 3/5] virtio_ring: add packed ring support
  2018-05-29  3:18   ` Jason Wang
@ 2018-05-29  5:11     ` Tiwei Bie
  2018-05-29  6:16       ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Tiwei Bie @ 2018-05-29  5:11 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Tue, May 29, 2018 at 11:18:57AM +0800, Jason Wang wrote:
> On 2018年05月22日 16:16, Tiwei Bie wrote:
[...]
> > +static void detach_buf_packed(struct vring_virtqueue *vq,
> > +			      unsigned int id, void **ctx)
> > +{
> > +	struct vring_desc_state_packed *state;
> > +	struct vring_packed_desc *desc;
> > +	unsigned int i;
> > +	int *next;
> > +
> > +	/* Clear data ptr. */
> > +	vq->desc_state_packed[id].data = NULL;
> > +
> > +	next = &id;
> > +	for (i = 0; i < vq->desc_state_packed[id].num; i++) {
> > +		state = &vq->desc_state_packed[*next];
> > +		vring_unmap_state_packed(vq, state);
> > +		next = &vq->desc_state_packed[*next].next;
> 
> Have a short discussion with Michael. It looks like the only thing we care
> is DMA address and length here.

The length tracked by desc_state_packed[] is also necessary
when INDIRECT is used and the buffers are writable.

> 
> So a thought is to avoid using desc_state_packed() is vring_use_dma_api() is
> false? Probably need another id allocator or just use desc_state_packed for
> free id list.

I think it's a good suggestion. Thanks!

I won't track the addr/len/flags for non-indirect descs
when vring_use_dma_api() returns false.

> 
> > +	}
> > +
> > +	vq->vq.num_free += vq->desc_state_packed[id].num;
> > +	*next = vq->free_head;
> 
> Using pointer seems not elegant and unnecessary here. You can just track
> next in integer I think?

It's possible to use integer, but will need one more var
to track `prev` (desc_state_packed[prev].next == next in
this case).

> 
> > +	vq->free_head = id;
> > +
> > +	if (vq->indirect) {
> > +		u32 len;
> > +
> > +		/* Free the indirect table, if any, now that it's unmapped. */
> > +		desc = vq->desc_state_packed[id].indir_desc;
> > +		if (!desc)
> > +			return;
> > +
> > +		len = vq->desc_state_packed[id].len;
> > +		for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
> > +			vring_unmap_desc_packed(vq, &desc[i]);
> > +
> > +		kfree(desc);
> > +		vq->desc_state_packed[id].indir_desc = NULL;
> > +	} else if (ctx) {
> > +		*ctx = vq->desc_state_packed[id].indir_desc;
> > +	}
> >   }
> >   static inline bool more_used_packed(const struct vring_virtqueue *vq)
> >   {
> > -	return false;
> > +	u16 last_used, flags;
> > +	u8 avail, used;
> > +
> > +	last_used = vq->last_used_idx;
> > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > +				vq->vring_packed.desc[last_used].flags);
> > +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
> > +	used = !!(flags & VRING_DESC_F_USED(1));
> > +
> > +	return avail == used && used == vq->used_wrap_counter;
> 
> Spec does not check avail == used? So there's probably a bug in either of
> the two places.
> 
> In what condition that avail != used but used == vq_used_wrap_counter? A
> buggy device or device set the two bits in two writes?

Currently, spec doesn't forbid devices to set the status
bits as: avail != used but used == vq_used_wrap_counter.
So I think driver cannot assume this won't happen.

> 
> >   }
[...]
> >   static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> >   {
> > -	return 0;
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	START_USE(vq);
> > +
> > +	/* We optimistically turn back on interrupts, then check if there was
> > +	 * more to do. */
> > +
> > +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > +		virtio_wmb(vq->weak_barriers);
> 
> Missing comments for the barrier.

Will add some comments.

> 
> > +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > +							vq->event_flags_shadow);
> > +	}
> > +
> > +	END_USE(vq);
> > +	return vq->last_used_idx;
> >   }
> >   static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> >   {
> > -	return false;
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	u8 avail, used;
> > +	u16 flags;
> > +
> > +	virtio_mb(vq->weak_barriers);
> > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > +			vq->vring_packed.desc[last_used_idx].flags);
> > +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
> > +	used = !!(flags & VRING_DESC_F_USED(1));
> > +
> > +	return avail == used && used == vq->used_wrap_counter;
> >   }
> >   static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >   {
> > -	return false;
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	START_USE(vq);
> > +
> > +	/* We optimistically turn back on interrupts, then check if there was
> > +	 * more to do. */
> > +
> > +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > +		virtio_wmb(vq->weak_barriers);
> 
> Need comments for the barrier.

Will add some comments.

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 2/5] virtio_ring: support creating packed ring
  2018-05-29  2:49   ` Jason Wang
@ 2018-05-29  5:24     ` Tiwei Bie
  2018-05-29  6:13       ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Tiwei Bie @ 2018-05-29  5:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann,
	Cornelia Huck

On Tue, May 29, 2018 at 10:49:11AM +0800, Jason Wang wrote:
> On 2018年05月22日 16:16, Tiwei Bie wrote:
> > This commit introduces the support for creating packed ring.
> > All split ring specific functions are added _split suffix.
> > Some necessary stubs for packed ring are also added.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
> >   include/linux/virtio_ring.h  |   8 +-
> >   2 files changed, 546 insertions(+), 263 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71458f493cf8..f5ef5f42a7cf 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -61,11 +61,15 @@ struct vring_desc_state {
> >   	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> >   };
> > +struct vring_desc_state_packed {
> > +	int next;			/* The next desc state. */
> > +};
> > +
> >   struct vring_virtqueue {
> >   	struct virtqueue vq;
> > -	/* Actual memory layout for this queue */
> > -	struct vring vring;
> > +	/* Is this a packed ring? */
> > +	bool packed;
> >   	/* Can we use weak barriers? */
> >   	bool weak_barriers;
> > @@ -87,11 +91,39 @@ struct vring_virtqueue {
> >   	/* Last used index we've seen. */
> >   	u16 last_used_idx;
> > -	/* Last written value to avail->flags */
> > -	u16 avail_flags_shadow;
> > +	union {
> > +		/* Available for split ring */
> > +		struct {
> > +			/* Actual memory layout for this queue. */
> > +			struct vring vring;
> > -	/* Last written value to avail->idx in guest byte order */
> > -	u16 avail_idx_shadow;
> > +			/* Last written value to avail->flags */
> > +			u16 avail_flags_shadow;
> > +
> > +			/* Last written value to avail->idx in
> > +			 * guest byte order. */
> > +			u16 avail_idx_shadow;
> > +		};
> > +
> > +		/* Available for packed ring */
> > +		struct {
> > +			/* Actual memory layout for this queue. */
> > +			struct vring_packed vring_packed;
> > +
> > +			/* Driver ring wrap counter. */
> > +			u8 avail_wrap_counter;
> > +
> > +			/* Device ring wrap counter. */
> > +			u8 used_wrap_counter;
> 
> How about just use boolean?

I can change it to bool if you want.

> 
[...]
> > -static int vring_mapping_error(const struct vring_virtqueue *vq,
> > -			       dma_addr_t addr)
> > -{
> > -	if (!vring_use_dma_api(vq->vq.vdev))
> > -		return 0;
> > -
> > -	return dma_mapping_error(vring_dma_dev(vq), addr);
> > -}
> 
> It looks to me if you keep vring_mapping_error behind
> vring_unmap_one_split(), lots of changes were unncessary.
> 
[...]
> > +	}
> > +	/* That should have freed everything. */
> > +	BUG_ON(vq->vq.num_free != vq->vring.num);
> > +
> > +	END_USE(vq);
> > +	return NULL;
> > +}
> 
> I think the those copy-and-paste hunks could be avoided and the diff should
> only contains renaming of the function. If yes, it would be very welcomed
> since it requires to compare the changes verbatim otherwise.

Michael suggested to lay out the code as:

XXX_split

XXX_packed

XXX wrappers

https://lkml.org/lkml/2018/4/13/410

That's why I moved some code.

> 
> > +
> > +/*
> > + * The layout for the packed ring is a continuous chunk of memory
> > + * which looks like this.
> > + *
> > + * struct vring_packed {
> > + *	// The actual descriptors (16 bytes each)
> > + *	struct vring_packed_desc desc[num];
> > + *
> > + *	// Padding to the next align boundary.
> > + *	char pad[];
> > + *
> > + *	// Driver Event Suppression
> > + *	struct vring_packed_desc_event driver;
> > + *
> > + *	// Device Event Suppression
> > + *	struct vring_packed_desc_event device;
> > + * };
> > + */
> > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > +				     void *p, unsigned long align)
> > +{
> > +	vr->num = num;
> > +	vr->desc = p;
> > +	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
> > +		* num + align - 1) & ~(align - 1));
> 
> If we choose not to go uapi, maybe we can use ALIGN() macro here?

Okay.

> 
> > +	vr->device = vr->driver + 1;
> > +}
[...]
> > +/* Only available for split ring */
> >   const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> >   {
> 
> A possible issue with this is:
> 
> After commit d4674240f31f8c4289abba07d64291c6ddce51bc ("KVM: s390:
> virtio-ccw revision 1 SET_VQ"). CCW tries to use
> virtqueue_get_avail()/virtqueue_get_used(). Looks like a bug either here or
> ccw code.

Do we still need to support:

include/linux/virtio.h
/*
 * Legacy accessors -- in almost all cases, these are the wrong functions
 * to use.
 */
static inline void *virtqueue_get_desc(struct virtqueue *vq)
{
        return virtqueue_get_vring(vq)->desc;
}
static inline void *virtqueue_get_avail(struct virtqueue *vq)
{
        return virtqueue_get_vring(vq)->avail;
}
static inline void *virtqueue_get_used(struct virtqueue *vq)
{
        return virtqueue_get_vring(vq)->used;
}

in packed ring?

If so, I think maybe it's better to expose them as
symbols and implement them in virtio_ring.c.

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 2/5] virtio_ring: support creating packed ring
  2018-05-29  5:24     ` Tiwei Bie
@ 2018-05-29  6:13       ` Jason Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2018-05-29  6:13 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann,
	Cornelia Huck



On 2018年05月29日 13:24, Tiwei Bie wrote:
> On Tue, May 29, 2018 at 10:49:11AM +0800, Jason Wang wrote:
>> On 2018年05月22日 16:16, Tiwei Bie wrote:
>>> This commit introduces the support for creating packed ring.
>>> All split ring specific functions are added _split suffix.
>>> Some necessary stubs for packed ring are also added.
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>    drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
>>>    include/linux/virtio_ring.h  |   8 +-
>>>    2 files changed, 546 insertions(+), 263 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index 71458f493cf8..f5ef5f42a7cf 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -61,11 +61,15 @@ struct vring_desc_state {
>>>    	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
>>>    };
>>> +struct vring_desc_state_packed {
>>> +	int next;			/* The next desc state. */
>>> +};
>>> +
>>>    struct vring_virtqueue {
>>>    	struct virtqueue vq;
>>> -	/* Actual memory layout for this queue */
>>> -	struct vring vring;
>>> +	/* Is this a packed ring? */
>>> +	bool packed;
>>>    	/* Can we use weak barriers? */
>>>    	bool weak_barriers;
>>> @@ -87,11 +91,39 @@ struct vring_virtqueue {
>>>    	/* Last used index we've seen. */
>>>    	u16 last_used_idx;
>>> -	/* Last written value to avail->flags */
>>> -	u16 avail_flags_shadow;
>>> +	union {
>>> +		/* Available for split ring */
>>> +		struct {
>>> +			/* Actual memory layout for this queue. */
>>> +			struct vring vring;
>>> -	/* Last written value to avail->idx in guest byte order */
>>> -	u16 avail_idx_shadow;
>>> +			/* Last written value to avail->flags */
>>> +			u16 avail_flags_shadow;
>>> +
>>> +			/* Last written value to avail->idx in
>>> +			 * guest byte order. */
>>> +			u16 avail_idx_shadow;
>>> +		};
>>> +
>>> +		/* Available for packed ring */
>>> +		struct {
>>> +			/* Actual memory layout for this queue. */
>>> +			struct vring_packed vring_packed;
>>> +
>>> +			/* Driver ring wrap counter. */
>>> +			u8 avail_wrap_counter;
>>> +
>>> +			/* Device ring wrap counter. */
>>> +			u8 used_wrap_counter;
>> How about just use boolean?
> I can change it to bool if you want.

Yes, please.

>
> [...]
>>> -static int vring_mapping_error(const struct vring_virtqueue *vq,
>>> -			       dma_addr_t addr)
>>> -{
>>> -	if (!vring_use_dma_api(vq->vq.vdev))
>>> -		return 0;
>>> -
>>> -	return dma_mapping_error(vring_dma_dev(vq), addr);
>>> -}
>> It looks to me if you keep vring_mapping_error behind
>> vring_unmap_one_split(), lots of changes were unncessary.
>>
> [...]
>>> +	}
>>> +	/* That should have freed everything. */
>>> +	BUG_ON(vq->vq.num_free != vq->vring.num);
>>> +
>>> +	END_USE(vq);
>>> +	return NULL;
>>> +}
>> I think the those copy-and-paste hunks could be avoided and the diff should
>> only contains renaming of the function. If yes, it would be very welcomed
>> since it requires to compare the changes verbatim otherwise.
> Michael suggested to lay out the code as:
>
> XXX_split
>
> XXX_packed
>
> XXX wrappers
>
> https://lkml.org/lkml/2018/4/13/410
>
> That's why I moved some code.

I see, then no need to change but it still looks unnecessary.

>
>>> +
>>> +/*
>>> + * The layout for the packed ring is a continuous chunk of memory
>>> + * which looks like this.
>>> + *
>>> + * struct vring_packed {
>>> + *	// The actual descriptors (16 bytes each)
>>> + *	struct vring_packed_desc desc[num];
>>> + *
>>> + *	// Padding to the next align boundary.
>>> + *	char pad[];
>>> + *
>>> + *	// Driver Event Suppression
>>> + *	struct vring_packed_desc_event driver;
>>> + *
>>> + *	// Device Event Suppression
>>> + *	struct vring_packed_desc_event device;
>>> + * };
>>> + */
>>> +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
>>> +				     void *p, unsigned long align)
>>> +{
>>> +	vr->num = num;
>>> +	vr->desc = p;
>>> +	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
>>> +		* num + align - 1) & ~(align - 1));
>> If we choose not to go uapi, maybe we can use ALIGN() macro here?
> Okay.
>
>>> +	vr->device = vr->driver + 1;
>>> +}
> [...]
>>> +/* Only available for split ring */
>>>    const struct vring *virtqueue_get_vring(struct virtqueue *vq)
>>>    {
>> A possible issue with this is:
>>
>> After commit d4674240f31f8c4289abba07d64291c6ddce51bc ("KVM: s390:
>> virtio-ccw revision 1 SET_VQ"). CCW tries to use
>> virtqueue_get_avail()/virtqueue_get_used(). Looks like a bug either here or
>> ccw code.
> Do we still need to support:
>
> include/linux/virtio.h
> /*
>   * Legacy accessors -- in almost all cases, these are the wrong functions
>   * to use.
>   */
> static inline void *virtqueue_get_desc(struct virtqueue *vq)
> {
>          return virtqueue_get_vring(vq)->desc;
> }
> static inline void *virtqueue_get_avail(struct virtqueue *vq)
> {
>          return virtqueue_get_vring(vq)->avail;
> }
> static inline void *virtqueue_get_used(struct virtqueue *vq)
> {
>          return virtqueue_get_vring(vq)->used;
> }
>
> in packed ring?

I think it was probably a bug in ccw, they should use e.g 
virtqueue_get_desc_addr() instead.

Thanks

>
> If so, I think maybe it's better to expose them as
> symbols and implement them in virtio_ring.c.
>
> Best regards,
> Tiwei Bie

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v5 3/5] virtio_ring: add packed ring support
  2018-05-29  5:11     ` Tiwei Bie
@ 2018-05-29  6:16       ` Jason Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2018-05-29  6:16 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann



On 2018年05月29日 13:11, Tiwei Bie wrote:
> On Tue, May 29, 2018 at 11:18:57AM +0800, Jason Wang wrote:
>> On 2018年05月22日 16:16, Tiwei Bie wrote:
> [...]
>>> +static void detach_buf_packed(struct vring_virtqueue *vq,
>>> +			      unsigned int id, void **ctx)
>>> +{
>>> +	struct vring_desc_state_packed *state;
>>> +	struct vring_packed_desc *desc;
>>> +	unsigned int i;
>>> +	int *next;
>>> +
>>> +	/* Clear data ptr. */
>>> +	vq->desc_state_packed[id].data = NULL;
>>> +
>>> +	next = &id;
>>> +	for (i = 0; i < vq->desc_state_packed[id].num; i++) {
>>> +		state = &vq->desc_state_packed[*next];
>>> +		vring_unmap_state_packed(vq, state);
>>> +		next = &vq->desc_state_packed[*next].next;
>> Have a short discussion with Michael. It looks like the only thing we care
>> is DMA address and length here.
> The length tracked by desc_state_packed[] is also necessary
> when INDIRECT is used and the buffers are writable.
>
>> So a thought is to avoid using desc_state_packed() is vring_use_dma_api() is
>> false? Probably need another id allocator or just use desc_state_packed for
>> free id list.
> I think it's a good suggestion. Thanks!
>
> I won't track the addr/len/flags for non-indirect descs
> when vring_use_dma_api() returns false.
>
>>> +	}
>>> +
>>> +	vq->vq.num_free += vq->desc_state_packed[id].num;
>>> +	*next = vq->free_head;
>> Using pointer seems not elegant and unnecessary here. You can just track
>> next in integer I think?
> It's possible to use integer, but will need one more var
> to track `prev` (desc_state_packed[prev].next == next in
> this case).

Yes, please do this.

>
>>> +	vq->free_head = id;
>>> +
>>> +	if (vq->indirect) {
>>> +		u32 len;
>>> +
>>> +		/* Free the indirect table, if any, now that it's unmapped. */
>>> +		desc = vq->desc_state_packed[id].indir_desc;
>>> +		if (!desc)
>>> +			return;
>>> +
>>> +		len = vq->desc_state_packed[id].len;
>>> +		for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
>>> +			vring_unmap_desc_packed(vq, &desc[i]);
>>> +
>>> +		kfree(desc);
>>> +		vq->desc_state_packed[id].indir_desc = NULL;
>>> +	} else if (ctx) {
>>> +		*ctx = vq->desc_state_packed[id].indir_desc;
>>> +	}
>>>    }
>>>    static inline bool more_used_packed(const struct vring_virtqueue *vq)
>>>    {
>>> -	return false;
>>> +	u16 last_used, flags;
>>> +	u8 avail, used;
>>> +
>>> +	last_used = vq->last_used_idx;
>>> +	flags = virtio16_to_cpu(vq->vq.vdev,
>>> +				vq->vring_packed.desc[last_used].flags);
>>> +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
>>> +	used = !!(flags & VRING_DESC_F_USED(1));
>>> +
>>> +	return avail == used && used == vq->used_wrap_counter;
>> Spec does not check avail == used? So there's probably a bug in either of
>> the two places.
>>
>> In what condition that avail != used but used == vq_used_wrap_counter? A
>> buggy device or device set the two bits in two writes?
> Currently, spec doesn't forbid devices to set the status
> bits as: avail != used but used == vq_used_wrap_counter.
> So I think driver cannot assume this won't happen.

Right, but I could not find a situation that device need to something 
like this. We should either forbid this in the spec or change the 
example code in the spec.

Let's see how Michael think about this.

Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-05-29  6:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22  8:16 [RFC v5 0/5] virtio: support packed ring Tiwei Bie
2018-05-22  8:16 ` [RFC v5 1/5] virtio: add packed ring definitions Tiwei Bie
2018-05-22  8:16 ` [RFC v5 2/5] virtio_ring: support creating packed ring Tiwei Bie
2018-05-29  2:49   ` Jason Wang
2018-05-29  5:24     ` Tiwei Bie
2018-05-29  6:13       ` Jason Wang
2018-05-22  8:16 ` [RFC v5 3/5] virtio_ring: add packed ring support Tiwei Bie
2018-05-29  3:18   ` Jason Wang
2018-05-29  5:11     ` Tiwei Bie
2018-05-29  6:16       ` Jason Wang
2018-05-22  8:16 ` [RFC v5 4/5] virtio_ring: add event idx support in packed ring Tiwei Bie
2018-05-22  8:16 ` [RFC v5 5/5] virtio_ring: enable " Tiwei Bie
2018-05-25  2:31 ` [RFC v5 0/5] virtio: support " Jason Wang
2018-05-25  3:07   ` Tiwei Bie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).