LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
@ 2021-11-15 14:19 Paul Cercueil
  2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
                   ` (13 more replies)
  0 siblings, 14 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Hi Jonathan,

This patchset introduces a new userspace interface based on DMABUF
objects, to complement the existing fileio based API.

The advantage of this DMABUF based interface vs. the fileio
interface, is that it avoids an extra copy of the data between the
kernel and userspace. This is particularly userful for high-speed
devices which produce several megabytes or even gigabytes of data per
second.

The first few patches [01/15] to [03/15] are not really related, but
allow to reduce the size of the patches that introduce the new API.

Patch [04/15] to [06/15] enables write() support to the buffer-dma
implementation of the buffer API, to continue the work done by
Mihail Chindris.

Patches [07/15] to [12/15] introduce the new DMABUF based API.

Patches [13/15] and [14/15] add support for cyclic buffers, only through
the new API. A cyclic buffer will be repeated on the output until the
buffer is disabled.

Patch [15/15] adds documentation about the new API.

For now, the API allows you to alloc DMABUF objects and mmap() them to
read or write the samples. It does not yet allow to import DMABUFs
parented to other subsystems, but that should eventually be possible
once it's wired.

This patchset is inspired by the "mmap interface" that was previously
submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
great if I could get a review from you guys. Alexandru's commit was
signed with his @analog.com address but he doesn't work at ADI anymore,
so I believe I'll need him to sign with a new email.

Cheers,
-Paul

Alexandru Ardelean (1):
  iio: buffer-dma: split iio_dma_buffer_fileio_free() function

Paul Cercueil (14):
  iio: buffer-dma: Get rid of incoming/outgoing queues
  iio: buffer-dma: Remove unused iio_buffer_block struct
  iio: buffer-dma: Use round_down() instead of rounddown()
  iio: buffer-dma: Enable buffer write support
  iio: buffer-dmaengine: Support specifying buffer direction
  iio: buffer-dmaengine: Enable write support
  iio: core: Add new DMABUF interface infrastructure
  iio: buffer-dma: Use DMABUFs instead of custom solution
  iio: buffer-dma: Implement new DMABUF based userspace API
  iio: buffer-dma: Boost performance using write-combine cache setting
  iio: buffer-dmaengine: Support new DMABUF based userspace API
  iio: core: Add support for cyclic buffers
  iio: buffer-dmaengine: Add support for cyclic buffers
  Documentation: iio: Document high-speed DMABUF based API

 Documentation/driver-api/dma-buf.rst          |   2 +
 Documentation/iio/dmabuf_api.rst              |  94 +++
 Documentation/iio/index.rst                   |   2 +
 drivers/iio/adc/adi-axi-adc.c                 |   3 +-
 drivers/iio/buffer/industrialio-buffer-dma.c  | 670 ++++++++++++++----
 .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
 drivers/iio/industrialio-buffer.c             |  49 ++
 include/linux/iio/buffer-dma.h                |  43 +-
 include/linux/iio/buffer-dmaengine.h          |   5 +-
 include/linux/iio/buffer_impl.h               |   8 +
 include/uapi/linux/iio/buffer.h               |  30 +
 11 files changed, 783 insertions(+), 165 deletions(-)
 create mode 100644 Documentation/iio/dmabuf_api.rst

-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-16  8:23   ` Alexandru Ardelean
                     ` (2 more replies)
  2021-11-15 14:19 ` [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct Paul Cercueil
                   ` (12 subsequent siblings)
  13 siblings, 3 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

The buffer-dma code was using two queues, incoming and outgoing, to
manage the state of the blocks in use.

While this totally works, it adds some complexity to the code,
especially since the code only manages 2 blocks. It is much easier to
just check each block's state manually, and keep a counter for the next
block to dequeue.

Since the new DMABUF based API wouldn't use these incoming and outgoing
queues anyway, getting rid of them now makes the upcoming changes
simpler.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 68 ++++++++++----------
 include/linux/iio/buffer-dma.h               |  7 +-
 2 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index d348af8b9705..abac88f20104 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 
 static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
 {
-	struct iio_dma_buffer_queue *queue = block->queue;
-
-	/*
-	 * The buffer has already been freed by the application, just drop the
-	 * reference.
-	 */
-	if (block->state != IIO_BLOCK_STATE_DEAD) {
+	if (block->state != IIO_BLOCK_STATE_DEAD)
 		block->state = IIO_BLOCK_STATE_DONE;
-		list_add_tail(&block->head, &queue->outgoing);
-	}
 }
 
 /**
@@ -317,11 +309,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
 	 * dead. This means we can reset the lists without having to fear
 	 * corrution.
 	 */
-	INIT_LIST_HEAD(&queue->outgoing);
 	spin_unlock_irq(&queue->list_lock);
 
-	INIT_LIST_HEAD(&queue->incoming);
-
 	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
 		if (queue->fileio.blocks[i]) {
 			block = queue->fileio.blocks[i];
@@ -346,7 +335,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
 		}
 
 		block->state = IIO_BLOCK_STATE_QUEUED;
-		list_add_tail(&block->head, &queue->incoming);
 	}
 
 out_unlock:
@@ -401,13 +389,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
 	struct iio_dev *indio_dev)
 {
 	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
-	struct iio_dma_buffer_block *block, *_block;
+	struct iio_dma_buffer_block *block;
+	unsigned int i;
 
 	mutex_lock(&queue->lock);
 	queue->active = true;
-	list_for_each_entry_safe(block, _block, &queue->incoming, head) {
-		list_del(&block->head);
-		iio_dma_buffer_submit_block(queue, block);
+	queue->fileio.next_dequeue = 0;
+
+	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+		block = queue->fileio.blocks[i];
+
+		if (block->state == IIO_BLOCK_STATE_QUEUED)
+			iio_dma_buffer_submit_block(queue, block);
 	}
 	mutex_unlock(&queue->lock);
 
@@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
 static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
 	struct iio_dma_buffer_block *block)
 {
-	if (block->state == IIO_BLOCK_STATE_DEAD) {
+	if (block->state == IIO_BLOCK_STATE_DEAD)
 		iio_buffer_block_put(block);
-	} else if (queue->active) {
+	else if (queue->active)
 		iio_dma_buffer_submit_block(queue, block);
-	} else {
+	else
 		block->state = IIO_BLOCK_STATE_QUEUED;
-		list_add_tail(&block->head, &queue->incoming);
-	}
 }
 
 static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
 	struct iio_dma_buffer_queue *queue)
 {
 	struct iio_dma_buffer_block *block;
+	unsigned int idx;
 
 	spin_lock_irq(&queue->list_lock);
-	block = list_first_entry_or_null(&queue->outgoing, struct
-		iio_dma_buffer_block, head);
-	if (block != NULL) {
-		list_del(&block->head);
+
+	idx = queue->fileio.next_dequeue;
+	block = queue->fileio.blocks[idx];
+
+	if (block->state == IIO_BLOCK_STATE_DONE) {
 		block->state = IIO_BLOCK_STATE_DEQUEUED;
+		idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
+		queue->fileio.next_dequeue = idx;
+	} else {
+		block = NULL;
 	}
+
 	spin_unlock_irq(&queue->list_lock);
 
 	return block;
@@ -539,6 +537,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
 	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf);
 	struct iio_dma_buffer_block *block;
 	size_t data_available = 0;
+	unsigned int i;
 
 	/*
 	 * For counting the available bytes we'll use the size of the block not
@@ -552,8 +551,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
 		data_available += queue->fileio.active_block->size;
 
 	spin_lock_irq(&queue->list_lock);
-	list_for_each_entry(block, &queue->outgoing, head)
-		data_available += block->size;
+
+	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+		block = queue->fileio.blocks[i];
+
+		if (block != queue->fileio.active_block
+		    && block->state == IIO_BLOCK_STATE_DONE)
+			data_available += block->size;
+	}
+
 	spin_unlock_irq(&queue->list_lock);
 	mutex_unlock(&queue->lock);
 
@@ -616,9 +622,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
 	queue->dev = dev;
 	queue->ops = ops;
 
-	INIT_LIST_HEAD(&queue->incoming);
-	INIT_LIST_HEAD(&queue->outgoing);
-
 	mutex_init(&queue->lock);
 	spin_lock_init(&queue->list_lock);
 
@@ -645,11 +648,8 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
 			continue;
 		queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
 	}
-	INIT_LIST_HEAD(&queue->outgoing);
 	spin_unlock_irq(&queue->list_lock);
 
-	INIT_LIST_HEAD(&queue->incoming);
-
 	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
 		if (!queue->fileio.blocks[i])
 			continue;
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index ff15c61bf319..d4ed5ff39d44 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -78,12 +78,15 @@ struct iio_dma_buffer_block {
  * @active_block: Block being used in read()
  * @pos: Read offset in the active block
  * @block_size: Size of each block
+ * @next_dequeue: index of next block that will be dequeued
  */
 struct iio_dma_buffer_queue_fileio {
 	struct iio_dma_buffer_block *blocks[2];
 	struct iio_dma_buffer_block *active_block;
 	size_t pos;
 	size_t block_size;
+
+	unsigned int next_dequeue;
 };
 
 /**
@@ -97,8 +100,6 @@ struct iio_dma_buffer_queue_fileio {
  *   atomic context as well as blocks on those lists. This is the outgoing queue
  *   list and typically also a list of active blocks in the part that handles
  *   the DMA controller
- * @incoming: List of buffers on the incoming queue
- * @outgoing: List of buffers on the outgoing queue
  * @active: Whether the buffer is currently active
  * @fileio: FileIO state
  */
@@ -109,8 +110,6 @@ struct iio_dma_buffer_queue {
 
 	struct mutex lock;
 	spinlock_t list_lock;
-	struct list_head incoming;
-	struct list_head outgoing;
 
 	bool active;
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
  2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-16  8:22   ` Alexandru Ardelean
  2021-11-15 14:19 ` [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown() Paul Cercueil
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

This structure was never used anywhere, so it can safely be dropped.

It will later be re-introduced as a different structure in a
different header.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 include/linux/iio/buffer-dma.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index d4ed5ff39d44..a65a005c4a19 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -17,11 +17,6 @@ struct iio_dma_buffer_queue;
 struct iio_dma_buffer_ops;
 struct device;
 
-struct iio_buffer_block {
-	u32 size;
-	u32 bytes_used;
-};
-
 /**
  * enum iio_block_state - State of a struct iio_dma_buffer_block
  * @IIO_BLOCK_STATE_DEQUEUED: Block is not queued
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
  2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
  2021-11-15 14:19 ` [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-16  8:26   ` Alexandru Ardelean
  2021-11-21 14:08   ` Jonathan Cameron
  2021-11-15 14:19 ` [PATCH 04/15] iio: buffer-dma: Enable buffer write support Paul Cercueil
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

We know that the buffer's alignment will always be a power of two;
therefore, we can use the faster round_down() macro.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 1ac94c4e9792..f8ce26a24c57 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -67,7 +67,7 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
 	dma_cookie_t cookie;
 
 	block->bytes_used = min(block->size, dmaengine_buffer->max_size);
-	block->bytes_used = rounddown(block->bytes_used,
+	block->bytes_used = round_down(block->bytes_used,
 			dmaengine_buffer->align);
 
 	desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 04/15] iio: buffer-dma: Enable buffer write support
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (2 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown() Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-16  8:52   ` Alexandru Ardelean
  2021-11-21 14:20   ` Jonathan Cameron
  2021-11-15 14:19 ` [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction Paul Cercueil
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Adding write support to the buffer-dma code is easy - the write()
function basically needs to do the exact same thing as the read()
function: dequeue a block, read or write the data, enqueue the block
when entirely processed.

Therefore, the iio_buffer_dma_read() and the new iio_buffer_dma_write()
now both call a function iio_buffer_dma_io(), which will perform this
task.

The .space_available() callback can return the exact same value as the
.data_available() callback for input buffers, since in both cases we
count the exact same thing (the number of bytes in each available
block).

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 75 +++++++++++++++-----
 include/linux/iio/buffer-dma.h               |  7 ++
 2 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index abac88f20104..eeeed6b2e0cf 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -179,7 +179,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	}
 
 	block->size = size;
-	block->state = IIO_BLOCK_STATE_DEQUEUED;
+	block->bytes_used = size;
+	block->state = IIO_BLOCK_STATE_DONE;
 	block->queue = queue;
 	INIT_LIST_HEAD(&block->head);
 	kref_init(&block->kref);
@@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
 		block->state = IIO_BLOCK_STATE_DONE;
 }
 
+static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
+{
+	__poll_t flags;
+
+	if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+		flags = EPOLLIN | EPOLLRDNORM;
+	else
+		flags = EPOLLOUT | EPOLLWRNORM;
+
+	wake_up_interruptible_poll(&queue->buffer.pollq, flags);
+}
+
 /**
  * iio_dma_buffer_block_done() - Indicate that a block has been completed
  * @block: The completed block
@@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
 	spin_unlock_irqrestore(&queue->list_lock, flags);
 
 	iio_buffer_block_put_atomic(block);
-	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
+	iio_dma_buffer_queue_wake(queue);
 }
 EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
 
@@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
 	}
 	spin_unlock_irqrestore(&queue->list_lock, flags);
 
-	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
+	iio_dma_buffer_queue_wake(queue);
 }
 EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
 
@@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
 			queue->fileio.blocks[i] = block;
 		}
 
-		block->state = IIO_BLOCK_STATE_QUEUED;
+		if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+			block->state = IIO_BLOCK_STATE_QUEUED;
 	}
 
 out_unlock:
@@ -467,20 +481,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
 	return block;
 }
 
-/**
- * iio_dma_buffer_read() - DMA buffer read callback
- * @buffer: Buffer to read form
- * @n: Number of bytes to read
- * @user_buffer: Userspace buffer to copy the data to
- *
- * Should be used as the read callback for iio_buffer_access_ops
- * struct for DMA buffers.
- */
-int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
-	char __user *user_buffer)
+static int iio_dma_buffer_io(struct iio_buffer *buffer,
+			     size_t n, char __user *user_buffer, bool is_write)
 {
 	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
 	struct iio_dma_buffer_block *block;
+	void *addr;
 	int ret;
 
 	if (n < buffer->bytes_per_datum)
@@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
 	n = rounddown(n, buffer->bytes_per_datum);
 	if (n > block->bytes_used - queue->fileio.pos)
 		n = block->bytes_used - queue->fileio.pos;
+	addr = block->vaddr + queue->fileio.pos;
 
-	if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, n)) {
+	if (is_write)
+		ret = !!copy_from_user(addr, user_buffer, n);
+	else
+		ret = !!copy_to_user(user_buffer, addr, n);
+	if (ret) {
 		ret = -EFAULT;
 		goto out_unlock;
 	}
@@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
 
 	if (queue->fileio.pos == block->bytes_used) {
 		queue->fileio.active_block = NULL;
+		block->bytes_used = block->size;
 		iio_dma_buffer_enqueue(queue, block);
 	}
 
@@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
 
 	return ret;
 }
+
+/**
+ * iio_dma_buffer_read() - DMA buffer read callback
+ * @buffer: Buffer to read form
+ * @n: Number of bytes to read
+ * @user_buffer: Userspace buffer to copy the data to
+ *
+ * Should be used as the read callback for iio_buffer_access_ops
+ * struct for DMA buffers.
+ */
+int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
+	char __user *user_buffer)
+{
+	return iio_dma_buffer_io(buffer, n, user_buffer, false);
+}
 EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
 
+/**
+ * iio_dma_buffer_write() - DMA buffer write callback
+ * @buffer: Buffer to read form
+ * @n: Number of bytes to read
+ * @user_buffer: Userspace buffer to copy the data from
+ *
+ * Should be used as the write callback for iio_buffer_access_ops
+ * struct for DMA buffers.
+ */
+int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
+			 const char __user *user_buffer)
+{
+	return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, true);
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
+
 /**
  * iio_dma_buffer_data_available() - DMA buffer data_available callback
  * @buf: Buffer to check for data availability
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index a65a005c4a19..09c07d5563c0 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
 	struct iio_dev *indio_dev);
 int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
 	char __user *user_buffer);
+int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
+			 const char __user *user_buffer);
 size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
 int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, size_t bpd);
 int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned int length);
@@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
 void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
 void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
 
+static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
+{
+	return iio_dma_buffer_data_available(buffer);
+}
+
 #endif
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (3 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 04/15] iio: buffer-dma: Enable buffer write support Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-16  8:53   ` Alexandru Ardelean
  2021-11-15 14:19 ` [PATCH 06/15] iio: buffer-dmaengine: Enable write support Paul Cercueil
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Update the devm_iio_dmaengine_buffer_setup() function to support
specifying the buffer direction.

Update the iio_dmaengine_buffer_submit() function to handle input
buffers as well as output buffers.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/adc/adi-axi-adc.c                 |  3 ++-
 .../buffer/industrialio-buffer-dmaengine.c    | 24 +++++++++++++++----
 include/linux/iio/buffer-dmaengine.h          |  5 +++-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/iio/adc/adi-axi-adc.c b/drivers/iio/adc/adi-axi-adc.c
index a73e3c2d212f..0a6f2c32b1b9 100644
--- a/drivers/iio/adc/adi-axi-adc.c
+++ b/drivers/iio/adc/adi-axi-adc.c
@@ -113,7 +113,8 @@ static int adi_axi_adc_config_dma_buffer(struct device *dev,
 		dma_name = "rx";
 
 	return devm_iio_dmaengine_buffer_setup(indio_dev->dev.parent,
-					       indio_dev, dma_name);
+					       indio_dev, dma_name,
+					       IIO_BUFFER_DIRECTION_IN);
 }
 
 static int adi_axi_adc_read_raw(struct iio_dev *indio_dev,
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index f8ce26a24c57..ac26b04aa4a9 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -64,14 +64,25 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
 	struct dmaengine_buffer *dmaengine_buffer =
 		iio_buffer_to_dmaengine_buffer(&queue->buffer);
 	struct dma_async_tx_descriptor *desc;
+	enum dma_transfer_direction dma_dir;
+	size_t max_size;
 	dma_cookie_t cookie;
 
-	block->bytes_used = min(block->size, dmaengine_buffer->max_size);
-	block->bytes_used = round_down(block->bytes_used,
-			dmaengine_buffer->align);
+	max_size = min(block->size, dmaengine_buffer->max_size);
+	max_size = round_down(max_size, dmaengine_buffer->align);
+
+	if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
+		block->bytes_used = max_size;
+		dma_dir = DMA_DEV_TO_MEM;
+	} else {
+		dma_dir = DMA_MEM_TO_DEV;
+	}
+
+	if (!block->bytes_used || block->bytes_used > max_size)
+		return -EINVAL;
 
 	desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
-		block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
+		block->phys_addr, block->bytes_used, dma_dir,
 		DMA_PREP_INTERRUPT);
 	if (!desc)
 		return -ENOMEM;
@@ -275,7 +286,8 @@ static struct iio_buffer *devm_iio_dmaengine_buffer_alloc(struct device *dev,
  */
 int devm_iio_dmaengine_buffer_setup(struct device *dev,
 				    struct iio_dev *indio_dev,
-				    const char *channel)
+				    const char *channel,
+				    enum iio_buffer_direction dir)
 {
 	struct iio_buffer *buffer;
 
@@ -286,6 +298,8 @@ int devm_iio_dmaengine_buffer_setup(struct device *dev,
 
 	indio_dev->modes |= INDIO_BUFFER_HARDWARE;
 
+	buffer->direction = dir;
+
 	return iio_device_attach_buffer(indio_dev, buffer);
 }
 EXPORT_SYMBOL_GPL(devm_iio_dmaengine_buffer_setup);
diff --git a/include/linux/iio/buffer-dmaengine.h b/include/linux/iio/buffer-dmaengine.h
index 5c355be89814..538d0479cdd6 100644
--- a/include/linux/iio/buffer-dmaengine.h
+++ b/include/linux/iio/buffer-dmaengine.h
@@ -7,11 +7,14 @@
 #ifndef __IIO_DMAENGINE_H__
 #define __IIO_DMAENGINE_H__
 
+#include <linux/iio/buffer.h>
+
 struct iio_dev;
 struct device;
 
 int devm_iio_dmaengine_buffer_setup(struct device *dev,
 				    struct iio_dev *indio_dev,
-				    const char *channel);
+				    const char *channel,
+				    enum iio_buffer_direction dir);
 
 #endif
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/15] iio: buffer-dmaengine: Enable write support
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (4 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-16  8:55   ` Alexandru Ardelean
  2021-11-15 14:19 ` [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure Paul Cercueil
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Use the iio_dma_buffer_write() and iio_dma_buffer_space_available()
functions provided by the buffer-dma core, to enable write support in
the buffer-dmaengine code.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index ac26b04aa4a9..5cde8fd81c7f 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -123,12 +123,14 @@ static void iio_dmaengine_buffer_release(struct iio_buffer *buf)
 
 static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
 	.read = iio_dma_buffer_read,
+	.write = iio_dma_buffer_write,
 	.set_bytes_per_datum = iio_dma_buffer_set_bytes_per_datum,
 	.set_length = iio_dma_buffer_set_length,
 	.request_update = iio_dma_buffer_request_update,
 	.enable = iio_dma_buffer_enable,
 	.disable = iio_dma_buffer_disable,
 	.data_available = iio_dma_buffer_data_available,
+	.space_available = iio_dma_buffer_space_available,
 	.release = iio_dmaengine_buffer_release,
 
 	.modes = INDIO_BUFFER_HARDWARE,
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (5 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 06/15] iio: buffer-dmaengine: Enable write support Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-21 14:31   ` Jonathan Cameron
  2021-11-15 14:19 ` [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function Paul Cercueil
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Add the necessary infrastructure to the IIO core to support a new DMABUF
based interface.

The advantage of this new DMABUF based interface vs. the read()
interface, is that it avoids an extra copy of the data between the
kernel and userspace. This is particularly userful for high-speed
devices which produce several megabytes or even gigabytes of data per
second.

The data in this new DMABUF interface is managed at the granularity of
DMABUF objects. Reducing the granularity from byte level to block level
is done to reduce the userspace-kernelspace synchronization overhead
since performing syscalls for each byte at a few Mbps is just not
feasible.

This of course leads to a slightly increased latency. For this reason an
application can choose the size of the DMABUFs as well as how many it
allocates. E.g. two DMABUFs would be a traditional double buffering
scheme. But using a higher number might be necessary to avoid
underflow/overflow situations in the presence of scheduling latencies.

As part of the interface, 2 new IOCTLs have been added:

IIO_BUFFER_DMABUF_ALLOC_IOCTL(struct iio_dmabuf_alloc_req *):
 Each call will allocate a new DMABUF object. The return value (if not
 a negative errno value as error) will be the file descriptor of the new
 DMABUF.

IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
 Place the DMABUF object into the queue pending for hardware process.

These two IOCTLs have to be performed on the IIO buffer's file
descriptor (either opened from the corresponding /dev/iio:deviceX, or
obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl).

To access the data stored in a block by userspace the block must be
mapped to the process's memory. This is done by calling mmap() on the
DMABUF's file descriptor.

Before accessing the data through the map, you must use the
DMA_BUF_IOCTL_SYNC(struct dma_buf_sync *) ioctl, with the
DMA_BUF_SYNC_START flag, to make sure that the data is available.
This call may block until the hardware is done with this block. Once
you are done reading or writing the data, you must use this ioctl again
with the DMA_BUF_SYNC_END flag, before enqueueing the DMABUF to the
kernel's queue.

If you need to know when the hardware is done with a DMABUF, you can
poll its file descriptor for the EPOLLOUT event.

Finally, to destroy a DMABUF object, simply call close() on its file
descriptor.

A typical workflow for the new interface is:

  for block in blocks:
    DMABUF_ALLOC block
    mmap block

  enable buffer

  while !done
    for block in blocks:
      DMABUF_ENQUEUE block

      DMABUF_SYNC_START block
      process data
      DMABUF_SYNC_END block

  disable buffer

  for block in blocks:
    close block

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/industrialio-buffer.c | 44 +++++++++++++++++++++++++++++++
 include/linux/iio/buffer_impl.h   |  8 ++++++
 include/uapi/linux/iio/buffer.h   | 29 ++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index e180728914c0..30910e6c2346 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -17,6 +17,7 @@
 #include <linux/fs.h>
 #include <linux/cdev.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 #include <linux/poll.h>
 #include <linux/sched/signal.h>
 
@@ -1585,12 +1586,55 @@ static long iio_device_buffer_getfd(struct iio_dev *indio_dev, unsigned long arg
 	return ret;
 }
 
+static int iio_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+				     struct iio_dmabuf __user *user_buf)
+{
+	struct iio_dmabuf dmabuf;
+
+	if (!buffer->access->enqueue_dmabuf)
+		return -EPERM;
+
+	if (copy_from_user(&dmabuf, user_buf, sizeof(dmabuf)))
+		return -EFAULT;
+
+	if (dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
+		return -EINVAL;
+
+	return buffer->access->enqueue_dmabuf(buffer, &dmabuf);
+}
+
+static int iio_buffer_alloc_dmabuf(struct iio_buffer *buffer,
+				   struct iio_dmabuf_alloc_req __user *user_req)
+{
+	struct iio_dmabuf_alloc_req req;
+
+	if (!buffer->access->alloc_dmabuf)
+		return -EPERM;
+
+	if (copy_from_user(&req, user_req, sizeof(req)))
+		return -EFAULT;
+
+	if (req.resv)
+		return -EINVAL;
+
+	return buffer->access->alloc_dmabuf(buffer, &req);
+}
+
 static long iio_device_buffer_ioctl(struct iio_dev *indio_dev, struct file *filp,
 				    unsigned int cmd, unsigned long arg)
 {
+	struct iio_dev_buffer_pair *ib = filp->private_data;
+	struct iio_buffer *buffer = ib->buffer;
+	void __user *_arg = (void __user *)arg;
+
 	switch (cmd) {
 	case IIO_BUFFER_GET_FD_IOCTL:
 		return iio_device_buffer_getfd(indio_dev, arg);
+	case IIO_BUFFER_DMABUF_ALLOC_IOCTL:
+		return iio_buffer_alloc_dmabuf(buffer, _arg);
+	case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
+		/* TODO: support non-blocking enqueue operation */
+		return iio_buffer_enqueue_dmabuf(buffer, _arg);
 	default:
 		return IIO_IOCTL_UNHANDLED;
 	}
diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h
index e2ca8ea23e19..728541bc2c63 100644
--- a/include/linux/iio/buffer_impl.h
+++ b/include/linux/iio/buffer_impl.h
@@ -39,6 +39,9 @@ struct iio_buffer;
  *                      device stops sampling. Calles are balanced with @enable.
  * @release:		called when the last reference to the buffer is dropped,
  *			should free all resources allocated by the buffer.
+ * @alloc_dmabuf:	called from userspace via ioctl to allocate one DMABUF.
+ * @enqueue_dmabuf:	called from userspace via ioctl to queue this DMABUF
+ *			object to this buffer. Requires a valid DMABUF fd.
  * @modes:		Supported operating modes by this buffer type
  * @flags:		A bitmask combination of INDIO_BUFFER_FLAG_*
  *
@@ -68,6 +71,11 @@ struct iio_buffer_access_funcs {
 
 	void (*release)(struct iio_buffer *buffer);
 
+	int (*alloc_dmabuf)(struct iio_buffer *buffer,
+			    struct iio_dmabuf_alloc_req *req);
+	int (*enqueue_dmabuf)(struct iio_buffer *buffer,
+			      struct iio_dmabuf *block);
+
 	unsigned int modes;
 	unsigned int flags;
 };
diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
index 13939032b3f6..e4621b926262 100644
--- a/include/uapi/linux/iio/buffer.h
+++ b/include/uapi/linux/iio/buffer.h
@@ -5,6 +5,35 @@
 #ifndef _UAPI_IIO_BUFFER_H_
 #define _UAPI_IIO_BUFFER_H_
 
+#include <linux/types.h>
+
+#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS	0x00000000
+
+/**
+ * struct iio_dmabuf_alloc_req - Descriptor for allocating IIO DMABUFs
+ * @size:	the size of a single DMABUF
+ * @resv:	reserved
+ */
+struct iio_dmabuf_alloc_req {
+	__u64 size;
+	__u64 resv;
+};
+
+/**
+ * struct iio_dmabuf - Descriptor for a single IIO DMABUF object
+ * @fd:		file descriptor of the DMABUF object
+ * @flags:	one or more IIO_BUFFER_DMABUF_* flags
+ * @bytes_used:	number of bytes used in this DMABUF for the data transfer.
+ *		If zero, the full buffer is used.
+ */
+struct iio_dmabuf {
+	__u32 fd;
+	__u32 flags;
+	__u64 bytes_used;
+};
+
 #define IIO_BUFFER_GET_FD_IOCTL			_IOWR('i', 0x91, int)
+#define IIO_BUFFER_DMABUF_ALLOC_IOCTL		_IOW('i', 0x92, struct iio_dmabuf_alloc_req)
+#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL		_IOW('i', 0x93, struct iio_dmabuf)
 
 #endif /* _UAPI_IIO_BUFFER_H_ */
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (6 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-16 10:59   ` Alexandru Ardelean
  2021-11-15 14:19 ` [PATCH 09/15] iio: buffer-dma: Use DMABUFs instead of custom solution Paul Cercueil
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil,
	Alexandru Ardelean

From: Alexandru Ardelean <alexandru.ardelean@analog.com>

A part of the logic in the iio_dma_buffer_exit() is required for the change
to add mmap support to IIO buffers.
This change splits the logic into a separate function, which will be
re-used later.

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 39 +++++++++++---------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index eeeed6b2e0cf..eb8cfd3af030 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -358,6 +358,27 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
 }
 EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);
 
+static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
+{
+	unsigned int i;
+
+	spin_lock_irq(&queue->list_lock);
+	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+		if (!queue->fileio.blocks[i])
+			continue;
+		queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
+	}
+	spin_unlock_irq(&queue->list_lock);
+
+	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+		if (!queue->fileio.blocks[i])
+			continue;
+		iio_buffer_block_put(queue->fileio.blocks[i]);
+		queue->fileio.blocks[i] = NULL;
+	}
+	queue->fileio.active_block = NULL;
+}
+
 static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
 	struct iio_dma_buffer_block *block)
 {
@@ -681,25 +702,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init);
  */
 void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
 {
-	unsigned int i;
-
 	mutex_lock(&queue->lock);
 
-	spin_lock_irq(&queue->list_lock);
-	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
-		if (!queue->fileio.blocks[i])
-			continue;
-		queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
-	}
-	spin_unlock_irq(&queue->list_lock);
-
-	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
-		if (!queue->fileio.blocks[i])
-			continue;
-		iio_buffer_block_put(queue->fileio.blocks[i]);
-		queue->fileio.blocks[i] = NULL;
-	}
-	queue->fileio.active_block = NULL;
+	iio_dma_buffer_fileio_free(queue);
 	queue->ops = NULL;
 
 	mutex_unlock(&queue->lock);
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 09/15] iio: buffer-dma: Use DMABUFs instead of custom solution
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (7 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-15 14:19 ` [PATCH 10/15] iio: buffer-dma: Implement new DMABUF based userspace API Paul Cercueil
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Enhance the current fileio code by using DMABUF objects instead of
custom buffers.

This adds more code than it removes, but:
- a lot of the complexity can be dropped, e.g. custom kref and
  iio_buffer_block_put_atomic() are not needed anymore;
- it will be much easier to introduce an API to export these DMABUF
  objects to userspace in a following patch.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 196 ++++++++++++-------
 include/linux/iio/buffer-dma.h               |   8 +-
 2 files changed, 127 insertions(+), 77 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index eb8cfd3af030..adb20434f2d2 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -14,6 +14,7 @@
 #include <linux/poll.h>
 #include <linux/iio/buffer_impl.h>
 #include <linux/iio/buffer-dma.h>
+#include <linux/dma-buf.h>
 #include <linux/dma-mapping.h>
 #include <linux/sizes.h>
 
@@ -90,104 +91,150 @@
  * callback is called from within the custom callback.
  */
 
-static void iio_buffer_block_release(struct kref *kref)
-{
-	struct iio_dma_buffer_block *block = container_of(kref,
-		struct iio_dma_buffer_block, kref);
-
-	WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
-
-	dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
-					block->vaddr, block->phys_addr);
-
-	iio_buffer_put(&block->queue->buffer);
-	kfree(block);
-}
-
-static void iio_buffer_block_get(struct iio_dma_buffer_block *block)
-{
-	kref_get(&block->kref);
-}
-
-static void iio_buffer_block_put(struct iio_dma_buffer_block *block)
-{
-	kref_put(&block->kref, iio_buffer_block_release);
-}
-
-/*
- * dma_free_coherent can sleep, hence we need to take some special care to be
- * able to drop a reference from an atomic context.
- */
-static LIST_HEAD(iio_dma_buffer_dead_blocks);
-static DEFINE_SPINLOCK(iio_dma_buffer_dead_blocks_lock);
-
-static void iio_dma_buffer_cleanup_worker(struct work_struct *work)
-{
-	struct iio_dma_buffer_block *block, *_block;
-	LIST_HEAD(block_list);
-
-	spin_lock_irq(&iio_dma_buffer_dead_blocks_lock);
-	list_splice_tail_init(&iio_dma_buffer_dead_blocks, &block_list);
-	spin_unlock_irq(&iio_dma_buffer_dead_blocks_lock);
-
-	list_for_each_entry_safe(block, _block, &block_list, head)
-		iio_buffer_block_release(&block->kref);
-}
-static DECLARE_WORK(iio_dma_buffer_cleanup_work, iio_dma_buffer_cleanup_worker);
-
-static void iio_buffer_block_release_atomic(struct kref *kref)
-{
+struct iio_buffer_dma_buf_attachment {
+	struct scatterlist sgl;
+	struct sg_table sg_table;
 	struct iio_dma_buffer_block *block;
-	unsigned long flags;
-
-	block = container_of(kref, struct iio_dma_buffer_block, kref);
-
-	spin_lock_irqsave(&iio_dma_buffer_dead_blocks_lock, flags);
-	list_add_tail(&block->head, &iio_dma_buffer_dead_blocks);
-	spin_unlock_irqrestore(&iio_dma_buffer_dead_blocks_lock, flags);
-
-	schedule_work(&iio_dma_buffer_cleanup_work);
-}
-
-/*
- * Version of iio_buffer_block_put() that can be called from atomic context
- */
-static void iio_buffer_block_put_atomic(struct iio_dma_buffer_block *block)
-{
-	kref_put(&block->kref, iio_buffer_block_release_atomic);
-}
+};
 
 static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf)
 {
 	return container_of(buf, struct iio_dma_buffer_queue, buffer);
 }
 
+static struct iio_buffer_dma_buf_attachment *
+to_iio_buffer_dma_buf_attachment(struct sg_table *table)
+{
+	return container_of(table, struct iio_buffer_dma_buf_attachment, sg_table);
+}
+
+static void iio_buffer_block_get(struct iio_dma_buffer_block *block)
+{
+	get_dma_buf(block->dmabuf);
+}
+
+static void iio_buffer_block_put(struct iio_dma_buffer_block *block)
+{
+	dma_buf_put(block->dmabuf);
+}
+
+static int iio_buffer_dma_buf_attach(struct dma_buf *dbuf,
+				     struct dma_buf_attachment *at)
+{
+	at->priv = dbuf->priv;
+
+	return 0;
+}
+
+static struct sg_table *iio_buffer_dma_buf_map(struct dma_buf_attachment *at,
+					       enum dma_data_direction dma_dir)
+{
+	struct iio_dma_buffer_block *block = at->priv;
+	struct iio_buffer_dma_buf_attachment *dba;
+	int ret;
+
+	dba = kzalloc(sizeof(*dba), GFP_KERNEL);
+	if (!dba)
+		return ERR_PTR(-ENOMEM);
+
+	sg_init_one(&dba->sgl, block->vaddr, PAGE_ALIGN(block->size));
+	dba->sg_table.sgl = &dba->sgl;
+	dba->sg_table.nents = 1;
+	dba->block = block;
+
+	ret = dma_map_sgtable(at->dev, &dba->sg_table, dma_dir, 0);
+	if (ret) {
+		kfree(dba);
+		return ERR_PTR(ret);
+	}
+
+	return &dba->sg_table;
+}
+
+static void iio_buffer_dma_buf_unmap(struct dma_buf_attachment *at,
+				     struct sg_table *sg_table,
+				     enum dma_data_direction dma_dir)
+{
+	struct iio_buffer_dma_buf_attachment *dba =
+		to_iio_buffer_dma_buf_attachment(sg_table);
+
+	dma_unmap_sgtable(at->dev, &dba->sg_table, dma_dir, 0);
+	kfree(dba);
+}
+
+static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
+{
+	struct iio_dma_buffer_block *block = dbuf->priv;
+	struct iio_dma_buffer_queue *queue = block->queue;
+
+	WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
+
+	mutex_lock(&queue->lock);
+
+	dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
+			  block->vaddr, block->phys_addr);
+
+	kfree(block);
+
+	mutex_unlock(&queue->lock);
+	iio_buffer_put(&queue->buffer);
+}
+
+
+static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
+	.attach			= iio_buffer_dma_buf_attach,
+	.map_dma_buf		= iio_buffer_dma_buf_map,
+	.unmap_dma_buf		= iio_buffer_dma_buf_unmap,
+	.release		= iio_buffer_dma_buf_release,
+};
+
 static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	struct iio_dma_buffer_queue *queue, size_t size)
 {
 	struct iio_dma_buffer_block *block;
+	DEFINE_DMA_BUF_EXPORT_INFO(einfo);
+	struct dma_buf *dmabuf;
+	int err;
 
 	block = kzalloc(sizeof(*block), GFP_KERNEL);
 	if (!block)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
 		&block->phys_addr, GFP_KERNEL);
 	if (!block->vaddr) {
-		kfree(block);
-		return NULL;
+		err = -ENOMEM;
+		goto err_free_block;
 	}
 
+	einfo.ops = &iio_dma_buffer_dmabuf_ops;
+	einfo.size = PAGE_ALIGN(size);
+	einfo.priv = block;
+	einfo.flags = O_RDWR;
+
+	dmabuf = dma_buf_export(&einfo);
+	if (IS_ERR(dmabuf)) {
+		err = PTR_ERR(dmabuf);
+		goto err_free_dma;
+	}
+
+	block->dmabuf = dmabuf;
 	block->size = size;
 	block->bytes_used = size;
 	block->state = IIO_BLOCK_STATE_DONE;
 	block->queue = queue;
 	INIT_LIST_HEAD(&block->head);
-	kref_init(&block->kref);
 
 	iio_buffer_get(&queue->buffer);
 
 	return block;
+
+err_free_dma:
+	dma_free_coherent(queue->dev, PAGE_ALIGN(size),
+			  block->vaddr, block->phys_addr);
+err_free_block:
+	kfree(block);
+	return ERR_PTR(err);
 }
 
 static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
@@ -224,7 +271,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
 	_iio_dma_buffer_block_done(block);
 	spin_unlock_irqrestore(&queue->list_lock, flags);
 
-	iio_buffer_block_put_atomic(block);
+	iio_buffer_block_put(block);
 	iio_dma_buffer_queue_wake(queue);
 }
 EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
@@ -250,7 +297,8 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
 		list_del(&block->head);
 		block->bytes_used = 0;
 		_iio_dma_buffer_block_done(block);
-		iio_buffer_block_put_atomic(block);
+
+		iio_buffer_block_put(block);
 	}
 	spin_unlock_irqrestore(&queue->list_lock, flags);
 
@@ -340,11 +388,13 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
 
 		if (!block) {
 			block = iio_dma_buffer_alloc_block(queue, size);
-			if (!block) {
-				ret = -ENOMEM;
+			if (IS_ERR(block)) {
+				ret = PTR_ERR(block);
 				goto out_unlock;
 			}
 			queue->fileio.blocks[i] = block;
+
+			iio_buffer_block_get(block);
 		}
 
 		if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index 09c07d5563c0..22effd6cfbb6 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -8,7 +8,6 @@
 #define __INDUSTRIALIO_DMA_BUFFER_H__
 
 #include <linux/list.h>
-#include <linux/kref.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
 #include <linux/iio/buffer_impl.h>
@@ -16,6 +15,7 @@
 struct iio_dma_buffer_queue;
 struct iio_dma_buffer_ops;
 struct device;
+struct dma_buf;
 
 /**
  * enum iio_block_state - State of a struct iio_dma_buffer_block
@@ -41,8 +41,8 @@ enum iio_block_state {
  * @vaddr: Virutal address of the blocks memory
  * @phys_addr: Physical address of the blocks memory
  * @queue: Parent DMA buffer queue
- * @kref: kref used to manage the lifetime of block
  * @state: Current state of the block
+ * @dmabuf: Underlying DMABUF object
  */
 struct iio_dma_buffer_block {
 	/* May only be accessed by the owner of the block */
@@ -58,13 +58,13 @@ struct iio_dma_buffer_block {
 	size_t size;
 	struct iio_dma_buffer_queue *queue;
 
-	/* Must not be accessed outside the core. */
-	struct kref kref;
 	/*
 	 * Must not be accessed outside the core. Access needs to hold
 	 * queue->list_lock if the block is not owned by the core.
 	 */
 	enum iio_block_state state;
+
+	struct dma_buf *dmabuf;
 };
 
 /**
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 10/15] iio: buffer-dma: Implement new DMABUF based userspace API
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (8 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 09/15] iio: buffer-dma: Use DMABUFs instead of custom solution Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-15 14:19 ` [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting Paul Cercueil
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Implement the two functions iio_dma_buffer_alloc_dmabuf() and
iio_dma_buffer_enqueue_dmabuf(), as well as all the necesary bits to
enable userspace access to the DMABUF objects.

These two functions are exported as GPL symbols so that IIO buffer
implementations can support the new DMABUF based userspace API.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 273 ++++++++++++++++++-
 include/linux/iio/buffer-dma.h               |  13 +
 2 files changed, 279 insertions(+), 7 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index adb20434f2d2..92356ee02f30 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -15,7 +15,9 @@
 #include <linux/iio/buffer_impl.h>
 #include <linux/iio/buffer-dma.h>
 #include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-resv.h>
 #include <linux/sizes.h>
 
 /*
@@ -97,6 +99,18 @@ struct iio_buffer_dma_buf_attachment {
 	struct iio_dma_buffer_block *block;
 };
 
+struct iio_buffer_dma_fence {
+	struct dma_fence base;
+	struct iio_dma_buffer_block *block;
+	spinlock_t lock;
+};
+
+static struct iio_buffer_dma_fence *
+to_iio_buffer_dma_fence(struct dma_fence *fence)
+{
+	return container_of(fence, struct iio_buffer_dma_fence, base);
+}
+
 static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf)
 {
 	return container_of(buf, struct iio_dma_buffer_queue, buffer);
@@ -118,6 +132,48 @@ static void iio_buffer_block_put(struct iio_dma_buffer_block *block)
 	dma_buf_put(block->dmabuf);
 }
 
+static const char *
+iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence)
+{
+	struct iio_buffer_dma_fence *iio_fence = to_iio_buffer_dma_fence(fence);
+
+	return dev_name(iio_fence->block->queue->dev);
+}
+
+static void iio_buffer_dma_fence_release(struct dma_fence *fence)
+{
+	struct iio_buffer_dma_fence *iio_fence = to_iio_buffer_dma_fence(fence);
+
+	kfree(iio_fence);
+}
+
+static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
+	.get_driver_name	= iio_buffer_dma_fence_get_driver_name,
+	.get_timeline_name	= iio_buffer_dma_fence_get_driver_name,
+	.release		= iio_buffer_dma_fence_release,
+};
+
+static struct dma_fence *
+iio_dma_buffer_create_dma_fence(struct iio_dma_buffer_block *block)
+{
+	struct iio_buffer_dma_fence *fence;
+	u64 ctx;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	fence->block = block;
+	spin_lock_init(&fence->lock);
+
+	ctx = dma_fence_context_alloc(1);
+
+	dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
+		       &fence->lock, ctx, 0);
+
+	return &fence->base;
+}
+
 static int iio_buffer_dma_buf_attach(struct dma_buf *dbuf,
 				     struct dma_buf_attachment *at)
 {
@@ -162,10 +218,26 @@ static void iio_buffer_dma_buf_unmap(struct dma_buf_attachment *at,
 	kfree(dba);
 }
 
+static int iio_buffer_dma_buf_mmap(struct dma_buf *dbuf,
+				   struct vm_area_struct *vma)
+{
+	struct iio_dma_buffer_block *block = dbuf->priv;
+	struct device *dev = block->queue->dev;
+
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+
+	if (vma->vm_ops->open)
+		vma->vm_ops->open(vma);
+
+	return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
+			      virt_to_page(block->vaddr));
+}
+
 static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
 {
 	struct iio_dma_buffer_block *block = dbuf->priv;
 	struct iio_dma_buffer_queue *queue = block->queue;
+	bool is_fileio = block->fileio;
 
 	WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
 
@@ -176,20 +248,51 @@ static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
 
 	kfree(block);
 
+	queue->num_blocks--;
+	if (is_fileio)
+		queue->num_fileio_blocks--;
 	mutex_unlock(&queue->lock);
 	iio_buffer_put(&queue->buffer);
 }
 
+static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
+					       enum dma_data_direction dma_dir)
+{
+	struct iio_dma_buffer_block *block = dbuf->priv;
+	struct device *dev = block->queue->dev;
+
+	/* We only need to invalidate the cache for input buffers */
+	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+		dma_sync_single_for_cpu(dev, block->phys_addr, block->size, dma_dir);
+
+	return 0;
+}
+
+static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
+					     enum dma_data_direction dma_dir)
+{
+	struct iio_dma_buffer_block *block = dbuf->priv;
+	struct device *dev = block->queue->dev;
+
+	/* We only need to sync the cache for output buffers */
+	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
+		dma_sync_single_for_device(dev, block->phys_addr, block->size, dma_dir);
+
+	return 0;
+}
 
 static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
 	.attach			= iio_buffer_dma_buf_attach,
 	.map_dma_buf		= iio_buffer_dma_buf_map,
 	.unmap_dma_buf		= iio_buffer_dma_buf_unmap,
+	.mmap			= iio_buffer_dma_buf_mmap,
 	.release		= iio_buffer_dma_buf_release,
+	.begin_cpu_access	= iio_buffer_dma_buf_begin_cpu_access,
+	.end_cpu_access		= iio_buffer_dma_buf_end_cpu_access,
 };
 
 static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
-	struct iio_dma_buffer_queue *queue, size_t size)
+	struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
 {
 	struct iio_dma_buffer_block *block;
 	DEFINE_DMA_BUF_EXPORT_INFO(einfo);
@@ -223,10 +326,15 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	block->bytes_used = size;
 	block->state = IIO_BLOCK_STATE_DONE;
 	block->queue = queue;
+	block->fileio = fileio;
 	INIT_LIST_HEAD(&block->head);
 
 	iio_buffer_get(&queue->buffer);
 
+	queue->num_blocks++;
+	if (fileio)
+		queue->num_fileio_blocks++;
+
 	return block;
 
 err_free_dma:
@@ -265,14 +373,22 @@ static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
 void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
 {
 	struct iio_dma_buffer_queue *queue = block->queue;
+	struct dma_resv *resv = block->dmabuf->resv;
+	struct dma_fence *fence;
 	unsigned long flags;
 
 	spin_lock_irqsave(&queue->list_lock, flags);
 	_iio_dma_buffer_block_done(block);
 	spin_unlock_irqrestore(&queue->list_lock, flags);
 
+	fence = dma_resv_excl_fence(resv);
+	if (fence)
+		dma_fence_signal(fence);
+	dma_resv_unlock(resv);
+
 	iio_buffer_block_put(block);
-	iio_dma_buffer_queue_wake(queue);
+	if (queue->fileio.enabled)
+		iio_dma_buffer_queue_wake(queue);
 }
 EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
 
@@ -298,6 +414,8 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
 		block->bytes_used = 0;
 		_iio_dma_buffer_block_done(block);
 
+		if (dma_resv_is_locked(block->dmabuf->resv))
+			dma_resv_unlock(block->dmabuf->resv);
 		iio_buffer_block_put(block);
 	}
 	spin_unlock_irqrestore(&queue->list_lock, flags);
@@ -323,6 +441,12 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block)
 	}
 }
 
+static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue)
+{
+	return queue->fileio.enabled ||
+		queue->num_blocks == queue->num_fileio_blocks;
+}
+
 /**
  * iio_dma_buffer_request_update() - DMA buffer request_update callback
  * @buffer: The buffer which to request an update
@@ -349,6 +473,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
 
 	mutex_lock(&queue->lock);
 
+	queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
+
+	/* If DMABUFs were created, disable fileio interface */
+	if (!queue->fileio.enabled)
+		goto out_unlock;
+
 	/* Allocations are page aligned */
 	if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size))
 		try_reuse = true;
@@ -387,7 +517,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
 		}
 
 		if (!block) {
-			block = iio_dma_buffer_alloc_block(queue, size);
+			block = iio_dma_buffer_alloc_block(queue, size, true);
 			if (IS_ERR(block)) {
 				ret = PTR_ERR(block);
 				goto out_unlock;
@@ -444,6 +574,8 @@ static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
 
 	block->state = IIO_BLOCK_STATE_ACTIVE;
 	iio_buffer_block_get(block);
+	dma_resv_lock(block->dmabuf->resv, NULL);
+
 	ret = queue->ops->submit(queue, block);
 	if (ret) {
 		/*
@@ -480,12 +612,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
 	mutex_lock(&queue->lock);
 	queue->active = true;
 	queue->fileio.next_dequeue = 0;
+	queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
 
-	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
-		block = queue->fileio.blocks[i];
+	dev_dbg(queue->dev, "Buffer enabled in %s mode\n",
+		queue->fileio.enabled ? "fileio" : "dmabuf");
 
-		if (block->state == IIO_BLOCK_STATE_QUEUED)
-			iio_dma_buffer_submit_block(queue, block);
+	if (queue->fileio.enabled) {
+		for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
+			block = queue->fileio.blocks[i];
+
+			if (block->state == IIO_BLOCK_STATE_QUEUED)
+				iio_dma_buffer_submit_block(queue, block);
+		}
 	}
 	mutex_unlock(&queue->lock);
 
@@ -507,6 +645,7 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
 	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
 
 	mutex_lock(&queue->lock);
+	queue->fileio.enabled = false;
 	queue->active = false;
 
 	if (queue->ops && queue->ops->abort)
@@ -565,6 +704,11 @@ static int iio_dma_buffer_io(struct iio_buffer *buffer,
 
 	mutex_lock(&queue->lock);
 
+	if (!queue->fileio.enabled) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	if (!queue->fileio.active_block) {
 		block = iio_dma_buffer_dequeue(queue);
 		if (block == NULL) {
@@ -681,6 +825,121 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
 }
 EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available);
 
+int iio_dma_buffer_alloc_dmabuf(struct iio_buffer *buffer,
+				struct iio_dmabuf_alloc_req *req)
+{
+	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
+	struct iio_dma_buffer_block *block;
+	int ret = 0;
+
+	mutex_lock(&queue->lock);
+
+	/*
+	 * If the buffer is enabled and in fileio mode new blocks can't be
+	 * allocated.
+	 */
+	if (queue->fileio.enabled) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	if (!req->size || req->size > SIZE_MAX) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* Free memory that might be in use for fileio mode */
+	iio_dma_buffer_fileio_free(queue);
+
+	block = iio_dma_buffer_alloc_block(queue, req->size, false);
+	if (IS_ERR(block)) {
+		ret = PTR_ERR(block);
+		goto out_unlock;
+	}
+
+	ret = dma_buf_fd(block->dmabuf, O_CLOEXEC);
+	if (ret < 0) {
+		dma_buf_put(block->dmabuf);
+		goto out_unlock;
+	}
+
+out_unlock:
+	mutex_unlock(&queue->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_alloc_dmabuf);
+
+int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+				  struct iio_dmabuf *iio_dmabuf)
+{
+	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
+	struct iio_dma_buffer_block *dma_block;
+	struct dma_fence *fence;
+	struct dma_buf *dmabuf;
+	int ret = 0;
+
+	mutex_lock(&queue->lock);
+
+	/* If in fileio mode buffers can't be enqueued. */
+	if (queue->fileio.enabled) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	dmabuf = dma_buf_get(iio_dmabuf->fd);
+	if (IS_ERR(dmabuf)) {
+		ret = PTR_ERR(dmabuf);
+		goto out_unlock;
+	}
+
+	if (dmabuf->ops != &iio_dma_buffer_dmabuf_ops) {
+		dev_err(queue->dev, "importing DMABUFs from other drivers is not yet supported.\n");
+		ret = -EINVAL;
+		goto out_dma_buf_put;
+	}
+
+	dma_block = dmabuf->priv;
+
+	if (iio_dmabuf->bytes_used > dma_block->size) {
+		ret = -EINVAL;
+		goto out_dma_buf_put;
+	}
+
+	dma_block->bytes_used = iio_dmabuf->bytes_used ?: dma_block->size;
+
+	switch (dma_block->state) {
+	case IIO_BLOCK_STATE_QUEUED:
+		/* Nothing to do */
+		goto out_unlock;
+	case IIO_BLOCK_STATE_DONE:
+		break;
+	default:
+		ret = -EBUSY;
+		goto out_dma_buf_put;
+	}
+
+	fence = iio_dma_buffer_create_dma_fence(dma_block);
+	if (IS_ERR(fence)) {
+		ret = PTR_ERR(fence);
+		goto out_dma_buf_put;
+	}
+
+	dma_resv_lock(dmabuf->resv, NULL);
+	dma_resv_add_excl_fence(dmabuf->resv, fence);
+	dma_resv_unlock(dmabuf->resv);
+
+	iio_dma_buffer_enqueue(queue, dma_block);
+
+out_dma_buf_put:
+	dma_buf_put(dmabuf);
+out_unlock:
+	mutex_unlock(&queue->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iio_dma_buffer_enqueue_dmabuf);
+
 /**
  * iio_dma_buffer_set_bytes_per_datum() - DMA buffer set_bytes_per_datum callback
  * @buffer: Buffer to set the bytes-per-datum for
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index 22effd6cfbb6..85e55fe35282 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -42,6 +42,7 @@ enum iio_block_state {
  * @phys_addr: Physical address of the blocks memory
  * @queue: Parent DMA buffer queue
  * @state: Current state of the block
+ * @fileio: True if this buffer is used for fileio mode
  * @dmabuf: Underlying DMABUF object
  */
 struct iio_dma_buffer_block {
@@ -64,6 +65,7 @@ struct iio_dma_buffer_block {
 	 */
 	enum iio_block_state state;
 
+	bool fileio;
 	struct dma_buf *dmabuf;
 };
 
@@ -74,6 +76,7 @@ struct iio_dma_buffer_block {
  * @pos: Read offset in the active block
  * @block_size: Size of each block
  * @next_dequeue: index of next block that will be dequeued
+ * @enabled: Whether the buffer is operating in fileio mode
  */
 struct iio_dma_buffer_queue_fileio {
 	struct iio_dma_buffer_block *blocks[2];
@@ -82,6 +85,7 @@ struct iio_dma_buffer_queue_fileio {
 	size_t block_size;
 
 	unsigned int next_dequeue;
+	bool enabled;
 };
 
 /**
@@ -96,6 +100,8 @@ struct iio_dma_buffer_queue_fileio {
  *   list and typically also a list of active blocks in the part that handles
  *   the DMA controller
  * @active: Whether the buffer is currently active
+ * @num_blocks: Total number of blocks in the queue
+ * @num_fileio_blocks: Number of blocks used for fileio interface
  * @fileio: FileIO state
  */
 struct iio_dma_buffer_queue {
@@ -107,6 +113,8 @@ struct iio_dma_buffer_queue {
 	spinlock_t list_lock;
 
 	bool active;
+	unsigned int num_blocks;
+	unsigned int num_fileio_blocks;
 
 	struct iio_dma_buffer_queue_fileio fileio;
 };
@@ -149,4 +157,9 @@ static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
 	return iio_dma_buffer_data_available(buffer);
 }
 
+int iio_dma_buffer_alloc_dmabuf(struct iio_buffer *buffer,
+				struct iio_dmabuf_alloc_req *req);
+int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
+				  struct iio_dmabuf *dmabuf);
+
 #endif
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (9 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 10/15] iio: buffer-dma: Implement new DMABUF based userspace API Paul Cercueil
@ 2021-11-15 14:19 ` Paul Cercueil
  2021-11-18 11:45   ` Paul Cercueil
  2021-11-21 15:00   ` Jonathan Cameron
  2021-11-15 14:22 ` [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API Paul Cercueil
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

We can be certain that the input buffers will only be accessed by
userspace for reading, and output buffers will mostly be accessed by
userspace for writing.

Therefore, it makes more sense to use only fully cached input buffers,
and to use the write-combine cache coherency setting for output buffers.

This boosts performance, as the data written to the output buffers does
not have to be sync'd for coherency. It will halve performance if the
userspace application tries to read from the output buffer, but this
should never happen.

Since we don't need to sync the cache when disabling CPU access either
for input buffers or output buffers, the .end_cpu_access() callback can
be dropped completely.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 82 +++++++++++++-------
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index 92356ee02f30..fb39054d8c15 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct dma_buf *dbuf,
 	if (vma->vm_ops->open)
 		vma->vm_ops->open(vma);
 
-	return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
-			      virt_to_page(block->vaddr));
+	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
+		/*
+		 * With an input buffer, userspace will only read the data and
+		 * never write. We can mmap the buffer fully cached.
+		 */
+		return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
+				      virt_to_page(block->vaddr));
+	} else {
+		/*
+		 * With an output buffer, userspace will only write the data
+		 * and should rarely (if never) read from it. It is better to
+		 * use write-combine in this case.
+		 */
+		return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
+				   vma->vm_end - vma->vm_start);
+	}
+}
+
+static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block *block)
+{
+	struct device *dev = block->queue->dev;
+	size_t size = PAGE_ALIGN(block->size);
+
+	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
+		dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
+	else
+		dma_free_wc(dev, size, block->vaddr, block->phys_addr);
 }
 
 static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
@@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
 
 	mutex_lock(&queue->lock);
 
-	dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
-			  block->vaddr, block->phys_addr);
-
+	iio_dma_buffer_free_dmamem(block);
 	kfree(block);
 
 	queue->num_blocks--;
@@ -268,19 +291,6 @@ static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
 	return 0;
 }
 
-static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
-					     enum dma_data_direction dma_dir)
-{
-	struct iio_dma_buffer_block *block = dbuf->priv;
-	struct device *dev = block->queue->dev;
-
-	/* We only need to sync the cache for output buffers */
-	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
-		dma_sync_single_for_device(dev, block->phys_addr, block->size, dma_dir);
-
-	return 0;
-}
-
 static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
 	.attach			= iio_buffer_dma_buf_attach,
 	.map_dma_buf		= iio_buffer_dma_buf_map,
@@ -288,9 +298,28 @@ static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
 	.mmap			= iio_buffer_dma_buf_mmap,
 	.release		= iio_buffer_dma_buf_release,
 	.begin_cpu_access	= iio_buffer_dma_buf_begin_cpu_access,
-	.end_cpu_access		= iio_buffer_dma_buf_end_cpu_access,
 };
 
+static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block *block)
+{
+	struct device *dev = block->queue->dev;
+	size_t size = PAGE_ALIGN(block->size);
+
+	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
+		block->vaddr = dma_alloc_coherent(dev, size,
+						  &block->phys_addr,
+						  GFP_KERNEL);
+	} else {
+		block->vaddr = dma_alloc_wc(dev, size,
+					    &block->phys_addr,
+					    GFP_KERNEL);
+	}
+	if (!block->vaddr)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
 {
@@ -303,12 +332,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	if (!block)
 		return ERR_PTR(-ENOMEM);
 
-	block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
-		&block->phys_addr, GFP_KERNEL);
-	if (!block->vaddr) {
-		err = -ENOMEM;
+	block->size = size;
+	block->queue = queue;
+
+	err = iio_dma_buffer_alloc_dmamem(block);
+	if (err)
 		goto err_free_block;
-	}
 
 	einfo.ops = &iio_dma_buffer_dmabuf_ops;
 	einfo.size = PAGE_ALIGN(size);
@@ -322,10 +351,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	}
 
 	block->dmabuf = dmabuf;
-	block->size = size;
 	block->bytes_used = size;
 	block->state = IIO_BLOCK_STATE_DONE;
-	block->queue = queue;
 	block->fileio = fileio;
 	INIT_LIST_HEAD(&block->head);
 
@@ -338,8 +365,7 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	return block;
 
 err_free_dma:
-	dma_free_coherent(queue->dev, PAGE_ALIGN(size),
-			  block->vaddr, block->phys_addr);
+	iio_dma_buffer_free_dmamem(block);
 err_free_block:
 	kfree(block);
 	return ERR_PTR(err);
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (10 preceding siblings ...)
  2021-11-15 14:19 ` [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting Paul Cercueil
@ 2021-11-15 14:22 ` Paul Cercueil
  2021-11-15 14:22   ` [PATCH 13/15] iio: core: Add support for cyclic buffers Paul Cercueil
                     ` (2 more replies)
  2021-11-15 14:37 ` [PATCH 00/15] iio: buffer-dma: write() and new " Daniel Vetter
  2021-11-21 13:57 ` Jonathan Cameron
  13 siblings, 3 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:22 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Use the functions provided by the buffer-dma core to implement the
DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dmaengine.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 5cde8fd81c7f..57a8b2e4ba3c 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -133,6 +133,9 @@ static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
 	.space_available = iio_dma_buffer_space_available,
 	.release = iio_dmaengine_buffer_release,
 
+	.alloc_dmabuf = iio_dma_buffer_alloc_dmabuf,
+	.enqueue_dmabuf = iio_dma_buffer_enqueue_dmabuf,
+
 	.modes = INDIO_BUFFER_HARDWARE,
 	.flags = INDIO_BUFFER_FLAG_FIXED_WATERMARK,
 };
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 13/15] iio: core: Add support for cyclic buffers
  2021-11-15 14:22 ` [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API Paul Cercueil
@ 2021-11-15 14:22   ` Paul Cercueil
  2021-11-16  9:50     ` Alexandru Ardelean
  2021-11-15 14:22   ` [PATCH 14/15] iio: buffer-dmaengine: " Paul Cercueil
  2021-11-15 14:22   ` [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API Paul Cercueil
  2 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:22 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Introduce a new flag IIO_BUFFER_DMABUF_CYCLIC in the "flags" field of
the iio_dmabuf uapi structure.

When set, the DMABUF enqueued with the enqueue ioctl will be endlessly
repeated on the TX output, until the buffer is disabled.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/industrialio-buffer.c | 5 +++++
 include/uapi/linux/iio/buffer.h   | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index 30910e6c2346..41bc51c88002 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -1600,6 +1600,11 @@ static int iio_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
 	if (dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
 		return -EINVAL;
 
+	/* Cyclic flag is only supported on output buffers */
+	if ((dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC) &&
+	    buffer->direction != IIO_BUFFER_DIRECTION_OUT)
+		return -EINVAL;
+
 	return buffer->access->enqueue_dmabuf(buffer, &dmabuf);
 }
 
diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
index e4621b926262..2d541d038c02 100644
--- a/include/uapi/linux/iio/buffer.h
+++ b/include/uapi/linux/iio/buffer.h
@@ -7,7 +7,8 @@
 
 #include <linux/types.h>
 
-#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS	0x00000000
+#define IIO_BUFFER_DMABUF_CYCLIC		(1 << 0)
+#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS	0x00000001
 
 /**
  * struct iio_dmabuf_alloc_req - Descriptor for allocating IIO DMABUFs
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 14/15] iio: buffer-dmaengine: Add support for cyclic buffers
  2021-11-15 14:22 ` [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API Paul Cercueil
  2021-11-15 14:22   ` [PATCH 13/15] iio: core: Add support for cyclic buffers Paul Cercueil
@ 2021-11-15 14:22   ` Paul Cercueil
  2021-11-16  9:50     ` Alexandru Ardelean
  2021-11-15 14:22   ` [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API Paul Cercueil
  2 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:22 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Handle the IIO_BUFFER_DMABUF_CYCLIC flag to support cyclic buffers.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/iio/buffer/industrialio-buffer-dma.c      |  1 +
 .../iio/buffer/industrialio-buffer-dmaengine.c    | 15 ++++++++++++---
 include/linux/iio/buffer-dma.h                    |  3 +++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index fb39054d8c15..6658f103ee17 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -933,6 +933,7 @@ int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
 	}
 
 	dma_block->bytes_used = iio_dmabuf->bytes_used ?: dma_block->size;
+	dma_block->cyclic = iio_dmabuf->flags & IIO_BUFFER_DMABUF_CYCLIC;
 
 	switch (dma_block->state) {
 	case IIO_BLOCK_STATE_QUEUED:
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 57a8b2e4ba3c..952e2160a11e 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -81,9 +81,18 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
 	if (!block->bytes_used || block->bytes_used > max_size)
 		return -EINVAL;
 
-	desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
-		block->phys_addr, block->bytes_used, dma_dir,
-		DMA_PREP_INTERRUPT);
+	if (block->cyclic) {
+		desc = dmaengine_prep_dma_cyclic(dmaengine_buffer->chan,
+						 block->phys_addr,
+						 block->size,
+						 block->bytes_used,
+						 dma_dir, 0);
+	} else {
+		desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
+						   block->phys_addr,
+						   block->bytes_used, dma_dir,
+						   DMA_PREP_INTERRUPT);
+	}
 	if (!desc)
 		return -ENOMEM;
 
diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
index 85e55fe35282..27639fdf7b54 100644
--- a/include/linux/iio/buffer-dma.h
+++ b/include/linux/iio/buffer-dma.h
@@ -42,6 +42,7 @@ enum iio_block_state {
  * @phys_addr: Physical address of the blocks memory
  * @queue: Parent DMA buffer queue
  * @state: Current state of the block
+ * @cyclic: True if this is a cyclic buffer
  * @fileio: True if this buffer is used for fileio mode
  * @dmabuf: Underlying DMABUF object
  */
@@ -65,6 +66,8 @@ struct iio_dma_buffer_block {
 	 */
 	enum iio_block_state state;
 
+	bool cyclic;
+
 	bool fileio;
 	struct dma_buf *dmabuf;
 };
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API
  2021-11-15 14:22 ` [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API Paul Cercueil
  2021-11-15 14:22   ` [PATCH 13/15] iio: core: Add support for cyclic buffers Paul Cercueil
  2021-11-15 14:22   ` [PATCH 14/15] iio: buffer-dmaengine: " Paul Cercueil
@ 2021-11-15 14:22   ` Paul Cercueil
  2021-11-21 15:10     ` Jonathan Cameron
  2 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:22 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, Paul Cercueil

Document the new DMABUF based API.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 Documentation/driver-api/dma-buf.rst |  2 +
 Documentation/iio/dmabuf_api.rst     | 94 ++++++++++++++++++++++++++++
 Documentation/iio/index.rst          |  2 +
 3 files changed, 98 insertions(+)
 create mode 100644 Documentation/iio/dmabuf_api.rst

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 2cd7db82d9fe..d3c9b58d2706 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -1,3 +1,5 @@
+.. _dma-buf:
+
 Buffer Sharing and Synchronization
 ==================================
 
diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst
new file mode 100644
index 000000000000..b4e120a4ef0c
--- /dev/null
+++ b/Documentation/iio/dmabuf_api.rst
@@ -0,0 +1,94 @@
+===================================
+High-speed DMABUF interface for IIO
+===================================
+
+1. Overview
+===========
+
+The Industrial I/O subsystem supports access to buffers through a file-based
+interface, with read() and write() access calls through the IIO device's dev
+node.
+
+It additionally supports a DMABUF based interface, where the userspace
+application can allocate and append DMABUF objects to the buffer's queue.
+
+The advantage of this DMABUF based interface vs. the fileio
+interface, is that it avoids an extra copy of the data between the
+kernel and userspace. This is particularly userful for high-speed
+devices which produce several megabytes or even gigabytes of data per
+second.
+
+The data in this DMABUF interface is managed at the granularity of
+DMABUF objects. Reducing the granularity from byte level to block level
+is done to reduce the userspace-kernelspace synchronization overhead
+since performing syscalls for each byte at a few Mbps is just not
+feasible.
+
+This of course leads to a slightly increased latency. For this reason an
+application can choose the size of the DMABUFs as well as how many it
+allocates. E.g. two DMABUFs would be a traditional double buffering
+scheme. But using a higher number might be necessary to avoid
+underflow/overflow situations in the presence of scheduling latencies.
+
+2. User API
+===========
+
+``IIO_BUFFER_DMABUF_ALLOC_IOCTL(struct iio_dmabuf_alloc_req *)``
+----------------------------------------------------------------
+
+Each call will allocate a new DMABUF object. The return value (if not
+a negative errno value as error) will be the file descriptor of the new
+DMABUF.
+
+``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *)``
+--------------------------------------------------------
+
+Place the DMABUF object into the queue pending for hardware process.
+
+These two IOCTLs have to be performed on the IIO buffer's file
+descriptor (either opened from the corresponding /dev/iio:deviceX, or
+obtained using the `IIO_BUFFER_GET_FD_IOCTL` ioctl).
+
+3. Usage
+========
+
+To access the data stored in a block by userspace the block must be
+mapped to the process's memory. This is done by calling mmap() on the
+DMABUF's file descriptor.
+
+Before accessing the data through the map, you must use the
+DMA_BUF_IOCTL_SYNC(struct dma_buf_sync *) ioctl, with the
+DMA_BUF_SYNC_START flag, to make sure that the data is available.
+This call may block until the hardware is done with this block. Once
+you are done reading or writing the data, you must use this ioctl again
+with the DMA_BUF_SYNC_END flag, before enqueueing the DMABUF to the
+kernel's queue.
+
+If you need to know when the hardware is done with a DMABUF, you can
+poll its file descriptor for the EPOLLOUT event.
+
+Finally, to destroy a DMABUF object, simply call close() on its file
+descriptor.
+
+For more information about manipulating DMABUF objects, see: :ref:`dma-buf`.
+
+A typical workflow for the new interface is:
+
+    for block in blocks:
+      DMABUF_ALLOC block
+      mmap block
+
+    enable buffer
+
+    while !done
+      for block in blocks:
+        DMABUF_ENQUEUE block
+
+        DMABUF_SYNC_START block
+        process data
+        DMABUF_SYNC_END block
+
+    disable buffer
+
+    for block in blocks:
+      close block
diff --git a/Documentation/iio/index.rst b/Documentation/iio/index.rst
index 58b7a4ebac51..9ce799fbf262 100644
--- a/Documentation/iio/index.rst
+++ b/Documentation/iio/index.rst
@@ -10,3 +10,5 @@ Industrial I/O
    iio_configfs
 
    ep93xx_adc
+
+   dmabuf_api
-- 
2.33.0


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (11 preceding siblings ...)
  2021-11-15 14:22 ` [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API Paul Cercueil
@ 2021-11-15 14:37 ` Daniel Vetter
  2021-11-15 14:57   ` Paul Cercueil
  2021-11-21 13:57 ` Jonathan Cameron
  13 siblings, 1 reply; 59+ messages in thread
From: Daniel Vetter @ 2021-11-15 14:37 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Michael Hennerich, linux-iio, linux-kernel,
	dri-devel, Christian König, linaro-mm-sig,
	Alexandru Ardelean, linux-media

On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> Hi Jonathan,
> 
> This patchset introduces a new userspace interface based on DMABUF
> objects, to complement the existing fileio based API.
> 
> The advantage of this DMABUF based interface vs. the fileio
> interface, is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly userful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.
> 
> The first few patches [01/15] to [03/15] are not really related, but
> allow to reduce the size of the patches that introduce the new API.
> 
> Patch [04/15] to [06/15] enables write() support to the buffer-dma
> implementation of the buffer API, to continue the work done by
> Mihail Chindris.
> 
> Patches [07/15] to [12/15] introduce the new DMABUF based API.
> 
> Patches [13/15] and [14/15] add support for cyclic buffers, only through
> the new API. A cyclic buffer will be repeated on the output until the
> buffer is disabled.
> 
> Patch [15/15] adds documentation about the new API.
> 
> For now, the API allows you to alloc DMABUF objects and mmap() them to
> read or write the samples. It does not yet allow to import DMABUFs
> parented to other subsystems, but that should eventually be possible
> once it's wired.
> 
> This patchset is inspired by the "mmap interface" that was previously
> submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
> great if I could get a review from you guys. Alexandru's commit was
> signed with his @analog.com address but he doesn't work at ADI anymore,
> so I believe I'll need him to sign with a new email.

Why dma-buf? dma-buf looks like something super generic and useful, until
you realize that there's a metric ton of gpu/accelerator bagage piled in.
So unless buffer sharing with a gpu/video/accel/whatever device is the
goal here, and it's just for a convenient way to get at buffer handles,
this doesn't sound like a good idea.

Also if the idea is to this with gpus/accelerators then I'd really like to
see the full thing, since most likely at that point you also want
dma_fence. And once we talk dma_fence things get truly horrible from a
locking pov :-( Or well, just highly constrained and I get to review what
iio is doing with these buffers to make sure it all fits.

Cheers, Daniel

> 
> Cheers,
> -Paul
> 
> Alexandru Ardelean (1):
>   iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> 
> Paul Cercueil (14):
>   iio: buffer-dma: Get rid of incoming/outgoing queues
>   iio: buffer-dma: Remove unused iio_buffer_block struct
>   iio: buffer-dma: Use round_down() instead of rounddown()
>   iio: buffer-dma: Enable buffer write support
>   iio: buffer-dmaengine: Support specifying buffer direction
>   iio: buffer-dmaengine: Enable write support
>   iio: core: Add new DMABUF interface infrastructure
>   iio: buffer-dma: Use DMABUFs instead of custom solution
>   iio: buffer-dma: Implement new DMABUF based userspace API
>   iio: buffer-dma: Boost performance using write-combine cache setting
>   iio: buffer-dmaengine: Support new DMABUF based userspace API
>   iio: core: Add support for cyclic buffers
>   iio: buffer-dmaengine: Add support for cyclic buffers
>   Documentation: iio: Document high-speed DMABUF based API
> 
>  Documentation/driver-api/dma-buf.rst          |   2 +
>  Documentation/iio/dmabuf_api.rst              |  94 +++
>  Documentation/iio/index.rst                   |   2 +
>  drivers/iio/adc/adi-axi-adc.c                 |   3 +-
>  drivers/iio/buffer/industrialio-buffer-dma.c  | 670 ++++++++++++++----
>  .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
>  drivers/iio/industrialio-buffer.c             |  49 ++
>  include/linux/iio/buffer-dma.h                |  43 +-
>  include/linux/iio/buffer-dmaengine.h          |   5 +-
>  include/linux/iio/buffer_impl.h               |   8 +
>  include/uapi/linux/iio/buffer.h               |  30 +
>  11 files changed, 783 insertions(+), 165 deletions(-)
>  create mode 100644 Documentation/iio/dmabuf_api.rst
> 
> -- 
> 2.33.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-15 14:37 ` [PATCH 00/15] iio: buffer-dma: write() and new " Daniel Vetter
@ 2021-11-15 14:57   ` Paul Cercueil
  2021-11-16 16:02     ` Daniel Vetter
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-15 14:57 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jonathan Cameron, Michael Hennerich, linux-iio, linux-kernel,
	dri-devel, Christian König, linaro-mm-sig,
	Alexandru Ardelean, linux-media

Hi Daniel,

Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter 
<daniel@ffwll.ch> a écrit :
> On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
>>  Hi Jonathan,
>> 
>>  This patchset introduces a new userspace interface based on DMABUF
>>  objects, to complement the existing fileio based API.
>> 
>>  The advantage of this DMABUF based interface vs. the fileio
>>  interface, is that it avoids an extra copy of the data between the
>>  kernel and userspace. This is particularly userful for high-speed
>>  devices which produce several megabytes or even gigabytes of data 
>> per
>>  second.
>> 
>>  The first few patches [01/15] to [03/15] are not really related, but
>>  allow to reduce the size of the patches that introduce the new API.
>> 
>>  Patch [04/15] to [06/15] enables write() support to the buffer-dma
>>  implementation of the buffer API, to continue the work done by
>>  Mihail Chindris.
>> 
>>  Patches [07/15] to [12/15] introduce the new DMABUF based API.
>> 
>>  Patches [13/15] and [14/15] add support for cyclic buffers, only 
>> through
>>  the new API. A cyclic buffer will be repeated on the output until 
>> the
>>  buffer is disabled.
>> 
>>  Patch [15/15] adds documentation about the new API.
>> 
>>  For now, the API allows you to alloc DMABUF objects and mmap() them 
>> to
>>  read or write the samples. It does not yet allow to import DMABUFs
>>  parented to other subsystems, but that should eventually be possible
>>  once it's wired.
>> 
>>  This patchset is inspired by the "mmap interface" that was 
>> previously
>>  submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would 
>> be
>>  great if I could get a review from you guys. Alexandru's commit was
>>  signed with his @analog.com address but he doesn't work at ADI 
>> anymore,
>>  so I believe I'll need him to sign with a new email.
> 
> Why dma-buf? dma-buf looks like something super generic and useful, 
> until
> you realize that there's a metric ton of gpu/accelerator bagage piled 
> in.
> So unless buffer sharing with a gpu/video/accel/whatever device is the
> goal here, and it's just for a convenient way to get at buffer 
> handles,
> this doesn't sound like a good idea.

Good question. The first reason is that a somewhat similar API was 
intented before[1], but refused upstream as it was kind of re-inventing 
the wheel.

The second reason, is that we want to be able to share buffers too, not 
with gpu/video but with the network (zctap) and in the future with USB 
(functionFS) too.

[1]: 
https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/

> Also if the idea is to this with gpus/accelerators then I'd really 
> like to
> see the full thing, since most likely at that point you also want
> dma_fence. And once we talk dma_fence things get truly horrible from a
> locking pov :-( Or well, just highly constrained and I get to review 
> what
> iio is doing with these buffers to make sure it all fits.

There is some dma_fence action in patch #10, which is enough for the 
userspace apps to use the API.

What "horribleness" are we talking about here? It doesn't look that 
scary to me, but I certainly don't have the complete picture.

Cheers,
-Paul

> Cheers, Daniel
> 
>> 
>>  Cheers,
>>  -Paul
>> 
>>  Alexandru Ardelean (1):
>>    iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>> 
>>  Paul Cercueil (14):
>>    iio: buffer-dma: Get rid of incoming/outgoing queues
>>    iio: buffer-dma: Remove unused iio_buffer_block struct
>>    iio: buffer-dma: Use round_down() instead of rounddown()
>>    iio: buffer-dma: Enable buffer write support
>>    iio: buffer-dmaengine: Support specifying buffer direction
>>    iio: buffer-dmaengine: Enable write support
>>    iio: core: Add new DMABUF interface infrastructure
>>    iio: buffer-dma: Use DMABUFs instead of custom solution
>>    iio: buffer-dma: Implement new DMABUF based userspace API
>>    iio: buffer-dma: Boost performance using write-combine cache 
>> setting
>>    iio: buffer-dmaengine: Support new DMABUF based userspace API
>>    iio: core: Add support for cyclic buffers
>>    iio: buffer-dmaengine: Add support for cyclic buffers
>>    Documentation: iio: Document high-speed DMABUF based API
>> 
>>   Documentation/driver-api/dma-buf.rst          |   2 +
>>   Documentation/iio/dmabuf_api.rst              |  94 +++
>>   Documentation/iio/index.rst                   |   2 +
>>   drivers/iio/adc/adi-axi-adc.c                 |   3 +-
>>   drivers/iio/buffer/industrialio-buffer-dma.c  | 670 
>> ++++++++++++++----
>>   .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
>>   drivers/iio/industrialio-buffer.c             |  49 ++
>>   include/linux/iio/buffer-dma.h                |  43 +-
>>   include/linux/iio/buffer-dmaengine.h          |   5 +-
>>   include/linux/iio/buffer_impl.h               |   8 +
>>   include/uapi/linux/iio/buffer.h               |  30 +
>>   11 files changed, 783 insertions(+), 165 deletions(-)
>>   create mode 100644 Documentation/iio/dmabuf_api.rst
>> 
>>  --
>>  2.33.0
>> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct
  2021-11-15 14:19 ` [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct Paul Cercueil
@ 2021-11-16  8:22   ` Alexandru Ardelean
  0 siblings, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  8:22 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> This structure was never used anywhere, so it can safely be dropped.
>
> It will later be re-introduced as a different structure in a
> different header.

Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>

>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  include/linux/iio/buffer-dma.h | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index d4ed5ff39d44..a65a005c4a19 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -17,11 +17,6 @@ struct iio_dma_buffer_queue;
>  struct iio_dma_buffer_ops;
>  struct device;
>
> -struct iio_buffer_block {
> -       u32 size;
> -       u32 bytes_used;
> -};
> -
>  /**
>   * enum iio_block_state - State of a struct iio_dma_buffer_block
>   * @IIO_BLOCK_STATE_DEQUEUED: Block is not queued
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
@ 2021-11-16  8:23   ` Alexandru Ardelean
  2021-11-21 14:05   ` Jonathan Cameron
  2021-11-21 16:23   ` Lars-Peter Clausen
  2 siblings, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  8:23 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> The buffer-dma code was using two queues, incoming and outgoing, to
> manage the state of the blocks in use.
>
> While this totally works, it adds some complexity to the code,
> especially since the code only manages 2 blocks. It is much easier to
> just check each block's state manually, and keep a counter for the next
> block to dequeue.
>
> Since the new DMABUF based API wouldn't use these incoming and outgoing
> queues anyway, getting rid of them now makes the upcoming changes
> simpler.
>

Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>

> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c | 68 ++++++++++----------
>  include/linux/iio/buffer-dma.h               |  7 +-
>  2 files changed, 37 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index d348af8b9705..abac88f20104 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>
>  static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
>  {
> -       struct iio_dma_buffer_queue *queue = block->queue;
> -
> -       /*
> -        * The buffer has already been freed by the application, just drop the
> -        * reference.
> -        */
> -       if (block->state != IIO_BLOCK_STATE_DEAD) {
> +       if (block->state != IIO_BLOCK_STATE_DEAD)
>                 block->state = IIO_BLOCK_STATE_DONE;
> -               list_add_tail(&block->head, &queue->outgoing);
> -       }
>  }
>
>  /**
> @@ -317,11 +309,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>          * dead. This means we can reset the lists without having to fear
>          * corrution.
>          */
> -       INIT_LIST_HEAD(&queue->outgoing);
>         spin_unlock_irq(&queue->list_lock);
>
> -       INIT_LIST_HEAD(&queue->incoming);
> -
>         for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
>                 if (queue->fileio.blocks[i]) {
>                         block = queue->fileio.blocks[i];
> @@ -346,7 +335,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>                 }
>
>                 block->state = IIO_BLOCK_STATE_QUEUED;
> -               list_add_tail(&block->head, &queue->incoming);
>         }
>
>  out_unlock:
> @@ -401,13 +389,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
>         struct iio_dev *indio_dev)
>  {
>         struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> -       struct iio_dma_buffer_block *block, *_block;
> +       struct iio_dma_buffer_block *block;
> +       unsigned int i;
>
>         mutex_lock(&queue->lock);
>         queue->active = true;
> -       list_for_each_entry_safe(block, _block, &queue->incoming, head) {
> -               list_del(&block->head);
> -               iio_dma_buffer_submit_block(queue, block);
> +       queue->fileio.next_dequeue = 0;
> +
> +       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> +               block = queue->fileio.blocks[i];
> +
> +               if (block->state == IIO_BLOCK_STATE_QUEUED)
> +                       iio_dma_buffer_submit_block(queue, block);
>         }
>         mutex_unlock(&queue->lock);
>
> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>  static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
>         struct iio_dma_buffer_block *block)
>  {
> -       if (block->state == IIO_BLOCK_STATE_DEAD) {
> +       if (block->state == IIO_BLOCK_STATE_DEAD)
>                 iio_buffer_block_put(block);
> -       } else if (queue->active) {
> +       else if (queue->active)
>                 iio_dma_buffer_submit_block(queue, block);
> -       } else {
> +       else
>                 block->state = IIO_BLOCK_STATE_QUEUED;
> -               list_add_tail(&block->head, &queue->incoming);
> -       }
>  }
>
>  static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
>         struct iio_dma_buffer_queue *queue)
>  {
>         struct iio_dma_buffer_block *block;
> +       unsigned int idx;
>
>         spin_lock_irq(&queue->list_lock);
> -       block = list_first_entry_or_null(&queue->outgoing, struct
> -               iio_dma_buffer_block, head);
> -       if (block != NULL) {
> -               list_del(&block->head);
> +
> +       idx = queue->fileio.next_dequeue;
> +       block = queue->fileio.blocks[idx];
> +
> +       if (block->state == IIO_BLOCK_STATE_DONE) {
>                 block->state = IIO_BLOCK_STATE_DEQUEUED;
> +               idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
> +               queue->fileio.next_dequeue = idx;
> +       } else {
> +               block = NULL;
>         }
> +
>         spin_unlock_irq(&queue->list_lock);
>
>         return block;
> @@ -539,6 +537,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
>         struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf);
>         struct iio_dma_buffer_block *block;
>         size_t data_available = 0;
> +       unsigned int i;
>
>         /*
>          * For counting the available bytes we'll use the size of the block not
> @@ -552,8 +551,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
>                 data_available += queue->fileio.active_block->size;
>
>         spin_lock_irq(&queue->list_lock);
> -       list_for_each_entry(block, &queue->outgoing, head)
> -               data_available += block->size;
> +
> +       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> +               block = queue->fileio.blocks[i];
> +
> +               if (block != queue->fileio.active_block
> +                   && block->state == IIO_BLOCK_STATE_DONE)
> +                       data_available += block->size;
> +       }
> +
>         spin_unlock_irq(&queue->list_lock);
>         mutex_unlock(&queue->lock);
>
> @@ -616,9 +622,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
>         queue->dev = dev;
>         queue->ops = ops;
>
> -       INIT_LIST_HEAD(&queue->incoming);
> -       INIT_LIST_HEAD(&queue->outgoing);
> -
>         mutex_init(&queue->lock);
>         spin_lock_init(&queue->list_lock);
>
> @@ -645,11 +648,8 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
>                         continue;
>                 queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
>         }
> -       INIT_LIST_HEAD(&queue->outgoing);
>         spin_unlock_irq(&queue->list_lock);
>
> -       INIT_LIST_HEAD(&queue->incoming);
> -
>         for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
>                 if (!queue->fileio.blocks[i])
>                         continue;
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index ff15c61bf319..d4ed5ff39d44 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -78,12 +78,15 @@ struct iio_dma_buffer_block {
>   * @active_block: Block being used in read()
>   * @pos: Read offset in the active block
>   * @block_size: Size of each block
> + * @next_dequeue: index of next block that will be dequeued
>   */
>  struct iio_dma_buffer_queue_fileio {
>         struct iio_dma_buffer_block *blocks[2];
>         struct iio_dma_buffer_block *active_block;
>         size_t pos;
>         size_t block_size;
> +
> +       unsigned int next_dequeue;
>  };
>
>  /**
> @@ -97,8 +100,6 @@ struct iio_dma_buffer_queue_fileio {
>   *   atomic context as well as blocks on those lists. This is the outgoing queue
>   *   list and typically also a list of active blocks in the part that handles
>   *   the DMA controller
> - * @incoming: List of buffers on the incoming queue
> - * @outgoing: List of buffers on the outgoing queue
>   * @active: Whether the buffer is currently active
>   * @fileio: FileIO state
>   */
> @@ -109,8 +110,6 @@ struct iio_dma_buffer_queue {
>
>         struct mutex lock;
>         spinlock_t list_lock;
> -       struct list_head incoming;
> -       struct list_head outgoing;
>
>         bool active;
>
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()
  2021-11-15 14:19 ` [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown() Paul Cercueil
@ 2021-11-16  8:26   ` Alexandru Ardelean
  2021-11-21 14:08   ` Jonathan Cameron
  1 sibling, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  8:26 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> We know that the buffer's alignment will always be a power of two;
> therefore, we can use the faster round_down() macro.
>

Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>

> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index 1ac94c4e9792..f8ce26a24c57 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -67,7 +67,7 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
>         dma_cookie_t cookie;
>
>         block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> -       block->bytes_used = rounddown(block->bytes_used,
> +       block->bytes_used = round_down(block->bytes_used,
>                         dmaengine_buffer->align);
>
>         desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support
  2021-11-15 14:19 ` [PATCH 04/15] iio: buffer-dma: Enable buffer write support Paul Cercueil
@ 2021-11-16  8:52   ` Alexandru Ardelean
  2021-11-21 14:20   ` Jonathan Cameron
  1 sibling, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  8:52 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:19 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> Adding write support to the buffer-dma code is easy - the write()
> function basically needs to do the exact same thing as the read()
> function: dequeue a block, read or write the data, enqueue the block
> when entirely processed.
>
> Therefore, the iio_buffer_dma_read() and the new iio_buffer_dma_write()
> now both call a function iio_buffer_dma_io(), which will perform this
> task.
>
> The .space_available() callback can return the exact same value as the
> .data_available() callback for input buffers, since in both cases we
> count the exact same thing (the number of bytes in each available
> block).
>

A few comments inline from me.
Otherwise fine from me.


Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>



> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c | 75 +++++++++++++++-----
>  include/linux/iio/buffer-dma.h               |  7 ++
>  2 files changed, 66 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index abac88f20104..eeeed6b2e0cf 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>         }
>
>         block->size = size;
> -       block->state = IIO_BLOCK_STATE_DEQUEUED;
> +       block->bytes_used = size;
> +       block->state = IIO_BLOCK_STATE_DONE;

This initial state change looks like it could use a comment in the git log.
Maybe it's fine, but I can't remember 100% about it.

>         block->queue = queue;
>         INIT_LIST_HEAD(&block->head);
>         kref_init(&block->kref);
> @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
>                 block->state = IIO_BLOCK_STATE_DONE;
>  }
>
> +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
> +{
> +       __poll_t flags;
> +
> +       if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> +               flags = EPOLLIN | EPOLLRDNORM;
> +       else
> +               flags = EPOLLOUT | EPOLLWRNORM;
> +
> +       wake_up_interruptible_poll(&queue->buffer.pollq, flags);
> +}
> +
>  /**
>   * iio_dma_buffer_block_done() - Indicate that a block has been completed
>   * @block: The completed block
> @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
>         spin_unlock_irqrestore(&queue->list_lock, flags);
>
>         iio_buffer_block_put_atomic(block);
> -       wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> +       iio_dma_buffer_queue_wake(queue);
>  }
>  EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
>
> @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
>         }
>         spin_unlock_irqrestore(&queue->list_lock, flags);
>
> -       wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> +       iio_dma_buffer_queue_wake(queue);
>  }
>  EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
>
> @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>                         queue->fileio.blocks[i] = block;
>                 }
>
> -               block->state = IIO_BLOCK_STATE_QUEUED;
> +               if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> +                       block->state = IIO_BLOCK_STATE_QUEUED;
>         }
>
>  out_unlock:
> @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
>         return block;
>  }
>
> -/**
> - * iio_dma_buffer_read() - DMA buffer read callback
> - * @buffer: Buffer to read form
> - * @n: Number of bytes to read
> - * @user_buffer: Userspace buffer to copy the data to
> - *
> - * Should be used as the read callback for iio_buffer_access_ops
> - * struct for DMA buffers.
> - */
> -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> -       char __user *user_buffer)
> +static int iio_dma_buffer_io(struct iio_buffer *buffer,
> +                            size_t n, char __user *user_buffer, bool is_write)
>  {
>         struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
>         struct iio_dma_buffer_block *block;
> +       void *addr;
>         int ret;
>
>         if (n < buffer->bytes_per_datum)
> @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>         n = rounddown(n, buffer->bytes_per_datum);
>         if (n > block->bytes_used - queue->fileio.pos)
>                 n = block->bytes_used - queue->fileio.pos;
> +       addr = block->vaddr + queue->fileio.pos;
>
> -       if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, n)) {
> +       if (is_write)
> +               ret = !!copy_from_user(addr, user_buffer, n);
> +       else
> +               ret = !!copy_to_user(user_buffer, addr, n);
> +       if (ret) {
>                 ret = -EFAULT;
>                 goto out_unlock;
>         }
> @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>
>         if (queue->fileio.pos == block->bytes_used) {
>                 queue->fileio.active_block = NULL;
> +               block->bytes_used = block->size;
>                 iio_dma_buffer_enqueue(queue, block);
>         }
>
> @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>
>         return ret;
>  }
> +
> +/**
> + * iio_dma_buffer_read() - DMA buffer read callback
> + * @buffer: Buffer to read form
> + * @n: Number of bytes to read
> + * @user_buffer: Userspace buffer to copy the data to
> + *
> + * Should be used as the read callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> +       char __user *user_buffer)
> +{
> +       return iio_dma_buffer_io(buffer, n, user_buffer, false);
> +}
>  EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
>
> +/**
> + * iio_dma_buffer_write() - DMA buffer write callback
> + * @buffer: Buffer to read form
> + * @n: Number of bytes to read
> + * @user_buffer: Userspace buffer to copy the data from
> + *
> + * Should be used as the write callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> +                        const char __user *user_buffer)
> +{
> +       return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, true);
> +}
> +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
> +
>  /**
>   * iio_dma_buffer_data_available() - DMA buffer data_available callback
>   * @buf: Buffer to check for data availability
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index a65a005c4a19..09c07d5563c0 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
>         struct iio_dev *indio_dev);
>  int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>         char __user *user_buffer);
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> +                        const char __user *user_buffer);
>  size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
>  int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, size_t bpd);
>  int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned int length);
> @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
>  void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
>  void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
>
> +static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
> +{
> +       return iio_dma_buffer_data_available(buffer);
> +}

Maybe re-using iio_dma_buffer_data_available() (and updating the
return type) would have been a little neater than wrapping this.
Also fine like this though.

> +
>  #endif
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction
  2021-11-15 14:19 ` [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction Paul Cercueil
@ 2021-11-16  8:53   ` Alexandru Ardelean
  0 siblings, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  8:53 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> Update the devm_iio_dmaengine_buffer_setup() function to support
> specifying the buffer direction.
>
> Update the iio_dmaengine_buffer_submit() function to handle input
> buffers as well as output buffers.
>


Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>


> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/adc/adi-axi-adc.c                 |  3 ++-
>  .../buffer/industrialio-buffer-dmaengine.c    | 24 +++++++++++++++----
>  include/linux/iio/buffer-dmaengine.h          |  5 +++-
>  3 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iio/adc/adi-axi-adc.c b/drivers/iio/adc/adi-axi-adc.c
> index a73e3c2d212f..0a6f2c32b1b9 100644
> --- a/drivers/iio/adc/adi-axi-adc.c
> +++ b/drivers/iio/adc/adi-axi-adc.c
> @@ -113,7 +113,8 @@ static int adi_axi_adc_config_dma_buffer(struct device *dev,
>                 dma_name = "rx";
>
>         return devm_iio_dmaengine_buffer_setup(indio_dev->dev.parent,
> -                                              indio_dev, dma_name);
> +                                              indio_dev, dma_name,
> +                                              IIO_BUFFER_DIRECTION_IN);
>  }
>
>  static int adi_axi_adc_read_raw(struct iio_dev *indio_dev,
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index f8ce26a24c57..ac26b04aa4a9 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -64,14 +64,25 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
>         struct dmaengine_buffer *dmaengine_buffer =
>                 iio_buffer_to_dmaengine_buffer(&queue->buffer);
>         struct dma_async_tx_descriptor *desc;
> +       enum dma_transfer_direction dma_dir;
> +       size_t max_size;
>         dma_cookie_t cookie;
>
> -       block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> -       block->bytes_used = round_down(block->bytes_used,
> -                       dmaengine_buffer->align);
> +       max_size = min(block->size, dmaengine_buffer->max_size);
> +       max_size = round_down(max_size, dmaengine_buffer->align);
> +
> +       if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> +               block->bytes_used = max_size;
> +               dma_dir = DMA_DEV_TO_MEM;
> +       } else {
> +               dma_dir = DMA_MEM_TO_DEV;
> +       }
> +
> +       if (!block->bytes_used || block->bytes_used > max_size)
> +               return -EINVAL;
>
>         desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> -               block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
> +               block->phys_addr, block->bytes_used, dma_dir,
>                 DMA_PREP_INTERRUPT);
>         if (!desc)
>                 return -ENOMEM;
> @@ -275,7 +286,8 @@ static struct iio_buffer *devm_iio_dmaengine_buffer_alloc(struct device *dev,
>   */
>  int devm_iio_dmaengine_buffer_setup(struct device *dev,
>                                     struct iio_dev *indio_dev,
> -                                   const char *channel)
> +                                   const char *channel,
> +                                   enum iio_buffer_direction dir)
>  {
>         struct iio_buffer *buffer;
>
> @@ -286,6 +298,8 @@ int devm_iio_dmaengine_buffer_setup(struct device *dev,
>
>         indio_dev->modes |= INDIO_BUFFER_HARDWARE;
>
> +       buffer->direction = dir;
> +
>         return iio_device_attach_buffer(indio_dev, buffer);
>  }
>  EXPORT_SYMBOL_GPL(devm_iio_dmaengine_buffer_setup);
> diff --git a/include/linux/iio/buffer-dmaengine.h b/include/linux/iio/buffer-dmaengine.h
> index 5c355be89814..538d0479cdd6 100644
> --- a/include/linux/iio/buffer-dmaengine.h
> +++ b/include/linux/iio/buffer-dmaengine.h
> @@ -7,11 +7,14 @@
>  #ifndef __IIO_DMAENGINE_H__
>  #define __IIO_DMAENGINE_H__
>
> +#include <linux/iio/buffer.h>
> +
>  struct iio_dev;
>  struct device;
>
>  int devm_iio_dmaengine_buffer_setup(struct device *dev,
>                                     struct iio_dev *indio_dev,
> -                                   const char *channel);
> +                                   const char *channel,
> +                                   enum iio_buffer_direction dir);
>
>  #endif
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/15] iio: buffer-dmaengine: Enable write support
  2021-11-15 14:19 ` [PATCH 06/15] iio: buffer-dmaengine: Enable write support Paul Cercueil
@ 2021-11-16  8:55   ` Alexandru Ardelean
  0 siblings, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  8:55 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> Use the iio_dma_buffer_write() and iio_dma_buffer_space_available()
> functions provided by the buffer-dma core, to enable write support in
> the buffer-dmaengine code.
>

This is a bit related to the comment about
iio_dma_buffer_space_available() in patch 4

But otherwise:

Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>


> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index ac26b04aa4a9..5cde8fd81c7f 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -123,12 +123,14 @@ static void iio_dmaengine_buffer_release(struct iio_buffer *buf)
>
>  static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
>         .read = iio_dma_buffer_read,
> +       .write = iio_dma_buffer_write,
>         .set_bytes_per_datum = iio_dma_buffer_set_bytes_per_datum,
>         .set_length = iio_dma_buffer_set_length,
>         .request_update = iio_dma_buffer_request_update,
>         .enable = iio_dma_buffer_enable,
>         .disable = iio_dma_buffer_disable,
>         .data_available = iio_dma_buffer_data_available,
> +       .space_available = iio_dma_buffer_space_available,
>         .release = iio_dmaengine_buffer_release,
>
>         .modes = INDIO_BUFFER_HARDWARE,
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 13/15] iio: core: Add support for cyclic buffers
  2021-11-15 14:22   ` [PATCH 13/15] iio: core: Add support for cyclic buffers Paul Cercueil
@ 2021-11-16  9:50     ` Alexandru Ardelean
  0 siblings, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  9:50 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:22 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> Introduce a new flag IIO_BUFFER_DMABUF_CYCLIC in the "flags" field of
> the iio_dmabuf uapi structure.
>
> When set, the DMABUF enqueued with the enqueue ioctl will be endlessly
> repeated on the TX output, until the buffer is disabled.
>

Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>

> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/industrialio-buffer.c | 5 +++++
>  include/uapi/linux/iio/buffer.h   | 3 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
> index 30910e6c2346..41bc51c88002 100644
> --- a/drivers/iio/industrialio-buffer.c
> +++ b/drivers/iio/industrialio-buffer.c
> @@ -1600,6 +1600,11 @@ static int iio_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
>         if (dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
>                 return -EINVAL;
>
> +       /* Cyclic flag is only supported on output buffers */
> +       if ((dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC) &&
> +           buffer->direction != IIO_BUFFER_DIRECTION_OUT)
> +               return -EINVAL;
> +
>         return buffer->access->enqueue_dmabuf(buffer, &dmabuf);
>  }
>
> diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
> index e4621b926262..2d541d038c02 100644
> --- a/include/uapi/linux/iio/buffer.h
> +++ b/include/uapi/linux/iio/buffer.h
> @@ -7,7 +7,8 @@
>
>  #include <linux/types.h>
>
> -#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS      0x00000000
> +#define IIO_BUFFER_DMABUF_CYCLIC               (1 << 0)
> +#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS      0x00000001
>
>  /**
>   * struct iio_dmabuf_alloc_req - Descriptor for allocating IIO DMABUFs
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/15] iio: buffer-dmaengine: Add support for cyclic buffers
  2021-11-15 14:22   ` [PATCH 14/15] iio: buffer-dmaengine: " Paul Cercueil
@ 2021-11-16  9:50     ` Alexandru Ardelean
  0 siblings, 0 replies; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16  9:50 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig

On Mon, Nov 15, 2021 at 4:23 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> Handle the IIO_BUFFER_DMABUF_CYCLIC flag to support cyclic buffers.
>

Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>

> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c      |  1 +
>  .../iio/buffer/industrialio-buffer-dmaengine.c    | 15 ++++++++++++---
>  include/linux/iio/buffer-dma.h                    |  3 +++
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index fb39054d8c15..6658f103ee17 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -933,6 +933,7 @@ int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
>         }
>
>         dma_block->bytes_used = iio_dmabuf->bytes_used ?: dma_block->size;
> +       dma_block->cyclic = iio_dmabuf->flags & IIO_BUFFER_DMABUF_CYCLIC;
>
>         switch (dma_block->state) {
>         case IIO_BLOCK_STATE_QUEUED:
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index 57a8b2e4ba3c..952e2160a11e 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -81,9 +81,18 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
>         if (!block->bytes_used || block->bytes_used > max_size)
>                 return -EINVAL;
>
> -       desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> -               block->phys_addr, block->bytes_used, dma_dir,
> -               DMA_PREP_INTERRUPT);
> +       if (block->cyclic) {
> +               desc = dmaengine_prep_dma_cyclic(dmaengine_buffer->chan,
> +                                                block->phys_addr,
> +                                                block->size,
> +                                                block->bytes_used,
> +                                                dma_dir, 0);
> +       } else {
> +               desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
> +                                                  block->phys_addr,
> +                                                  block->bytes_used, dma_dir,
> +                                                  DMA_PREP_INTERRUPT);
> +       }
>         if (!desc)
>                 return -ENOMEM;
>
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index 85e55fe35282..27639fdf7b54 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -42,6 +42,7 @@ enum iio_block_state {
>   * @phys_addr: Physical address of the blocks memory
>   * @queue: Parent DMA buffer queue
>   * @state: Current state of the block
> + * @cyclic: True if this is a cyclic buffer
>   * @fileio: True if this buffer is used for fileio mode
>   * @dmabuf: Underlying DMABUF object
>   */
> @@ -65,6 +66,8 @@ struct iio_dma_buffer_block {
>          */
>         enum iio_block_state state;
>
> +       bool cyclic;
> +
>         bool fileio;
>         struct dma_buf *dmabuf;
>  };
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function
  2021-11-15 14:19 ` [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function Paul Cercueil
@ 2021-11-16 10:59   ` Alexandru Ardelean
  2021-11-21 13:49     ` Jonathan Cameron
  0 siblings, 1 reply; 59+ messages in thread
From: Alexandru Ardelean @ 2021-11-16 10:59 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig, Alexandru Ardelean

On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <paul@crapouillou.net> wrote:
>
> From: Alexandru Ardelean <alexandru.ardelean@analog.com>
>
> A part of the logic in the iio_dma_buffer_exit() is required for the change
> to add mmap support to IIO buffers.
> This change splits the logic into a separate function, which will be
> re-used later.
>

Not sure how the protocol is here, since my old @analog.com email
isn't working anymore.

But:

Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>

Thanks :)
Alex

> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c | 39 +++++++++++---------
>  1 file changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index eeeed6b2e0cf..eb8cfd3af030 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -358,6 +358,27 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>  }
>  EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);
>
> +static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
> +{
> +       unsigned int i;
> +
> +       spin_lock_irq(&queue->list_lock);
> +       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> +               if (!queue->fileio.blocks[i])
> +                       continue;
> +               queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> +       }
> +       spin_unlock_irq(&queue->list_lock);
> +
> +       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> +               if (!queue->fileio.blocks[i])
> +                       continue;
> +               iio_buffer_block_put(queue->fileio.blocks[i]);
> +               queue->fileio.blocks[i] = NULL;
> +       }
> +       queue->fileio.active_block = NULL;
> +}
> +
>  static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
>         struct iio_dma_buffer_block *block)
>  {
> @@ -681,25 +702,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init);
>   */
>  void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
>  {
> -       unsigned int i;
> -
>         mutex_lock(&queue->lock);
>
> -       spin_lock_irq(&queue->list_lock);
> -       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> -               if (!queue->fileio.blocks[i])
> -                       continue;
> -               queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> -       }
> -       spin_unlock_irq(&queue->list_lock);
> -
> -       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> -               if (!queue->fileio.blocks[i])
> -                       continue;
> -               iio_buffer_block_put(queue->fileio.blocks[i]);
> -               queue->fileio.blocks[i] = NULL;
> -       }
> -       queue->fileio.active_block = NULL;
> +       iio_dma_buffer_fileio_free(queue);
>         queue->ops = NULL;
>
>         mutex_unlock(&queue->lock);
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-15 14:57   ` Paul Cercueil
@ 2021-11-16 16:02     ` Daniel Vetter
  2021-11-16 16:31       ` Laurent Pinchart
  2021-11-17 12:50       ` Paul Cercueil
  0 siblings, 2 replies; 59+ messages in thread
From: Daniel Vetter @ 2021-11-16 16:02 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Daniel Vetter, Jonathan Cameron, Michael Hennerich, linux-iio,
	linux-kernel, dri-devel, Christian König, linaro-mm-sig,
	Alexandru Ardelean, linux-media

On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
> Hi Daniel,
> 
> Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter <daniel@ffwll.ch> a
> écrit :
> > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> > >  Hi Jonathan,
> > > 
> > >  This patchset introduces a new userspace interface based on DMABUF
> > >  objects, to complement the existing fileio based API.
> > > 
> > >  The advantage of this DMABUF based interface vs. the fileio
> > >  interface, is that it avoids an extra copy of the data between the
> > >  kernel and userspace. This is particularly userful for high-speed
> > >  devices which produce several megabytes or even gigabytes of data
> > > per
> > >  second.
> > > 
> > >  The first few patches [01/15] to [03/15] are not really related, but
> > >  allow to reduce the size of the patches that introduce the new API.
> > > 
> > >  Patch [04/15] to [06/15] enables write() support to the buffer-dma
> > >  implementation of the buffer API, to continue the work done by
> > >  Mihail Chindris.
> > > 
> > >  Patches [07/15] to [12/15] introduce the new DMABUF based API.
> > > 
> > >  Patches [13/15] and [14/15] add support for cyclic buffers, only
> > > through
> > >  the new API. A cyclic buffer will be repeated on the output until
> > > the
> > >  buffer is disabled.
> > > 
> > >  Patch [15/15] adds documentation about the new API.
> > > 
> > >  For now, the API allows you to alloc DMABUF objects and mmap() them
> > > to
> > >  read or write the samples. It does not yet allow to import DMABUFs
> > >  parented to other subsystems, but that should eventually be possible
> > >  once it's wired.
> > > 
> > >  This patchset is inspired by the "mmap interface" that was
> > > previously
> > >  submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would
> > > be
> > >  great if I could get a review from you guys. Alexandru's commit was
> > >  signed with his @analog.com address but he doesn't work at ADI
> > > anymore,
> > >  so I believe I'll need him to sign with a new email.
> > 
> > Why dma-buf? dma-buf looks like something super generic and useful,
> > until
> > you realize that there's a metric ton of gpu/accelerator bagage piled
> > in.
> > So unless buffer sharing with a gpu/video/accel/whatever device is the
> > goal here, and it's just for a convenient way to get at buffer handles,
> > this doesn't sound like a good idea.
> 
> Good question. The first reason is that a somewhat similar API was intented
> before[1], but refused upstream as it was kind of re-inventing the wheel.
> 
> The second reason, is that we want to be able to share buffers too, not with
> gpu/video but with the network (zctap) and in the future with USB
> (functionFS) too.
> 
> [1]: https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/

Hm is that code merged already in upstream already?

I know that dma-buf looks really generic, but like I said if there's no
need ever to interface with any of the gpu buffer sharing it might be
better to use something else (like get_user_pages maybe, would that work?).

> > Also if the idea is to this with gpus/accelerators then I'd really like
> > to
> > see the full thing, since most likely at that point you also want
> > dma_fence. And once we talk dma_fence things get truly horrible from a
> > locking pov :-( Or well, just highly constrained and I get to review
> > what
> > iio is doing with these buffers to make sure it all fits.
> 
> There is some dma_fence action in patch #10, which is enough for the
> userspace apps to use the API.
> 
> What "horribleness" are we talking about here? It doesn't look that scary to
> me, but I certainly don't have the complete picture.

You need to annotate all the code involved in signalling that dma_fence
using dma_fence_begin/end_signalling, and then enable full lockdep and
everything.

You can safely assume you'll find bugs, because we even have bugs about
this in gpu drivers (where that annotation isn't fully rolled out yet).

The tldr is that you can allocate memory in there. And a pile of other
restrictions, but not being able to allocate memory (well GFP_ATOMIC is
ok, but that can fail) is a very serious restriction.
-Daniel


> 
> Cheers,
> -Paul
> 
> > Cheers, Daniel
> > 
> > > 
> > >  Cheers,
> > >  -Paul
> > > 
> > >  Alexandru Ardelean (1):
> > >    iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> > > 
> > >  Paul Cercueil (14):
> > >    iio: buffer-dma: Get rid of incoming/outgoing queues
> > >    iio: buffer-dma: Remove unused iio_buffer_block struct
> > >    iio: buffer-dma: Use round_down() instead of rounddown()
> > >    iio: buffer-dma: Enable buffer write support
> > >    iio: buffer-dmaengine: Support specifying buffer direction
> > >    iio: buffer-dmaengine: Enable write support
> > >    iio: core: Add new DMABUF interface infrastructure
> > >    iio: buffer-dma: Use DMABUFs instead of custom solution
> > >    iio: buffer-dma: Implement new DMABUF based userspace API
> > >    iio: buffer-dma: Boost performance using write-combine cache
> > > setting
> > >    iio: buffer-dmaengine: Support new DMABUF based userspace API
> > >    iio: core: Add support for cyclic buffers
> > >    iio: buffer-dmaengine: Add support for cyclic buffers
> > >    Documentation: iio: Document high-speed DMABUF based API
> > > 
> > >   Documentation/driver-api/dma-buf.rst          |   2 +
> > >   Documentation/iio/dmabuf_api.rst              |  94 +++
> > >   Documentation/iio/index.rst                   |   2 +
> > >   drivers/iio/adc/adi-axi-adc.c                 |   3 +-
> > >   drivers/iio/buffer/industrialio-buffer-dma.c  | 670
> > > ++++++++++++++----
> > >   .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
> > >   drivers/iio/industrialio-buffer.c             |  49 ++
> > >   include/linux/iio/buffer-dma.h                |  43 +-
> > >   include/linux/iio/buffer-dmaengine.h          |   5 +-
> > >   include/linux/iio/buffer_impl.h               |   8 +
> > >   include/uapi/linux/iio/buffer.h               |  30 +
> > >   11 files changed, 783 insertions(+), 165 deletions(-)
> > >   create mode 100644 Documentation/iio/dmabuf_api.rst
> > > 
> > >  --
> > >  2.33.0
> > > 
> > 
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-16 16:02     ` Daniel Vetter
@ 2021-11-16 16:31       ` Laurent Pinchart
  2021-11-17  8:48         ` Christian König
  2021-11-17 12:50       ` Paul Cercueil
  1 sibling, 1 reply; 59+ messages in thread
From: Laurent Pinchart @ 2021-11-16 16:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Paul Cercueil, Michael Hennerich, linux-iio, linux-kernel,
	dri-devel, linaro-mm-sig, Alexandru Ardelean,
	Christian König, Jonathan Cameron, linux-media

On Tue, Nov 16, 2021 at 05:02:25PM +0100, Daniel Vetter wrote:
> On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
> > Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter a écrit :
> > > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> > > >  Hi Jonathan,
> > > > 
> > > >  This patchset introduces a new userspace interface based on DMABUF
> > > >  objects, to complement the existing fileio based API.
> > > > 
> > > >  The advantage of this DMABUF based interface vs. the fileio
> > > >  interface, is that it avoids an extra copy of the data between the
> > > >  kernel and userspace. This is particularly userful for high-speed
> > > >  devices which produce several megabytes or even gigabytes of data per
> > > >  second.
> > > > 
> > > >  The first few patches [01/15] to [03/15] are not really related, but
> > > >  allow to reduce the size of the patches that introduce the new API.
> > > > 
> > > >  Patch [04/15] to [06/15] enables write() support to the buffer-dma
> > > >  implementation of the buffer API, to continue the work done by
> > > >  Mihail Chindris.
> > > > 
> > > >  Patches [07/15] to [12/15] introduce the new DMABUF based API.
> > > > 
> > > >  Patches [13/15] and [14/15] add support for cyclic buffers, only through
> > > >  the new API. A cyclic buffer will be repeated on the output until the
> > > >  buffer is disabled.
> > > > 
> > > >  Patch [15/15] adds documentation about the new API.
> > > > 
> > > >  For now, the API allows you to alloc DMABUF objects and mmap() them
> > > > to
> > > >  read or write the samples. It does not yet allow to import DMABUFs
> > > >  parented to other subsystems, but that should eventually be possible
> > > >  once it's wired.
> > > > 
> > > >  This patchset is inspired by the "mmap interface" that was previously
> > > >  submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
> > > >  great if I could get a review from you guys. Alexandru's commit was
> > > >  signed with his @analog.com address but he doesn't work at ADI anymore,
> > > >  so I believe I'll need him to sign with a new email.
> > > 
> > > Why dma-buf? dma-buf looks like something super generic and useful, until
> > > you realize that there's a metric ton of gpu/accelerator bagage piled in.
> > > So unless buffer sharing with a gpu/video/accel/whatever device is the

And cameras (maybe they're included in "whatever" ?).

> > > goal here, and it's just for a convenient way to get at buffer handles,
> > > this doesn't sound like a good idea.
> > 
> > Good question. The first reason is that a somewhat similar API was intented
> > before[1], but refused upstream as it was kind of re-inventing the wheel.
> > 
> > The second reason, is that we want to be able to share buffers too, not with
> > gpu/video but with the network (zctap) and in the future with USB
> > (functionFS) too.
> > 
> > [1]: https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/
> 
> Hm is that code merged already in upstream already?
> 
> I know that dma-buf looks really generic, but like I said if there's no
> need ever to interface with any of the gpu buffer sharing it might be
> better to use something else (like get_user_pages maybe, would that work?).

Not GUP please. That brings another set of issues, especially when
dealing with DMA, we've suffered enough from the USERPTR support in V4L2
to avoid repeating this. Let's make dma-buf better instead.

> > > Also if the idea is to this with gpus/accelerators then I'd really like to
> > > see the full thing, since most likely at that point you also want
> > > dma_fence. And once we talk dma_fence things get truly horrible from a
> > > locking pov :-( Or well, just highly constrained and I get to review what
> > > iio is doing with these buffers to make sure it all fits.
> > 
> > There is some dma_fence action in patch #10, which is enough for the
> > userspace apps to use the API.
> > 
> > What "horribleness" are we talking about here? It doesn't look that scary to
> > me, but I certainly don't have the complete picture.
> 
> You need to annotate all the code involved in signalling that dma_fence
> using dma_fence_begin/end_signalling, and then enable full lockdep and
> everything.
> 
> You can safely assume you'll find bugs, because we even have bugs about
> this in gpu drivers (where that annotation isn't fully rolled out yet).
> 
> The tldr is that you can allocate memory in there. And a pile of other
> restrictions, but not being able to allocate memory (well GFP_ATOMIC is
> ok, but that can fail) is a very serious restriction.
> -Daniel
> 
> > > >  Alexandru Ardelean (1):
> > > >    iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> > > > 
> > > >  Paul Cercueil (14):
> > > >    iio: buffer-dma: Get rid of incoming/outgoing queues
> > > >    iio: buffer-dma: Remove unused iio_buffer_block struct
> > > >    iio: buffer-dma: Use round_down() instead of rounddown()
> > > >    iio: buffer-dma: Enable buffer write support
> > > >    iio: buffer-dmaengine: Support specifying buffer direction
> > > >    iio: buffer-dmaengine: Enable write support
> > > >    iio: core: Add new DMABUF interface infrastructure
> > > >    iio: buffer-dma: Use DMABUFs instead of custom solution
> > > >    iio: buffer-dma: Implement new DMABUF based userspace API
> > > >    iio: buffer-dma: Boost performance using write-combine cache setting
> > > >    iio: buffer-dmaengine: Support new DMABUF based userspace API
> > > >    iio: core: Add support for cyclic buffers
> > > >    iio: buffer-dmaengine: Add support for cyclic buffers
> > > >    Documentation: iio: Document high-speed DMABUF based API
> > > > 
> > > >   Documentation/driver-api/dma-buf.rst          |   2 +
> > > >   Documentation/iio/dmabuf_api.rst              |  94 +++
> > > >   Documentation/iio/index.rst                   |   2 +
> > > >   drivers/iio/adc/adi-axi-adc.c                 |   3 +-
> > > >   drivers/iio/buffer/industrialio-buffer-dma.c  | 670 ++++++++++++++----
> > > >   .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
> > > >   drivers/iio/industrialio-buffer.c             |  49 ++
> > > >   include/linux/iio/buffer-dma.h                |  43 +-
> > > >   include/linux/iio/buffer-dmaengine.h          |   5 +-
> > > >   include/linux/iio/buffer_impl.h               |   8 +
> > > >   include/uapi/linux/iio/buffer.h               |  30 +
> > > >   11 files changed, 783 insertions(+), 165 deletions(-)
> > > >   create mode 100644 Documentation/iio/dmabuf_api.rst

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-16 16:31       ` Laurent Pinchart
@ 2021-11-17  8:48         ` Christian König
  0 siblings, 0 replies; 59+ messages in thread
From: Christian König @ 2021-11-17  8:48 UTC (permalink / raw)
  To: Laurent Pinchart, Daniel Vetter
  Cc: Paul Cercueil, Michael Hennerich, linux-iio, linux-kernel,
	dri-devel, linaro-mm-sig, Alexandru Ardelean, Jonathan Cameron,
	linux-media

Am 16.11.21 um 17:31 schrieb Laurent Pinchart:
> On Tue, Nov 16, 2021 at 05:02:25PM +0100, Daniel Vetter wrote:
>> On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
>>> Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter a écrit :
>>>> On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
>>>>>   Hi Jonathan,
>>>>>
>>>>>   This patchset introduces a new userspace interface based on DMABUF
>>>>>   objects, to complement the existing fileio based API.
>>>>>
>>>>>   The advantage of this DMABUF based interface vs. the fileio
>>>>>   interface, is that it avoids an extra copy of the data between the
>>>>>   kernel and userspace. This is particularly userful for high-speed
>>>>>   devices which produce several megabytes or even gigabytes of data per
>>>>>   second.
>>>>>
>>>>>   The first few patches [01/15] to [03/15] are not really related, but
>>>>>   allow to reduce the size of the patches that introduce the new API.
>>>>>
>>>>>   Patch [04/15] to [06/15] enables write() support to the buffer-dma
>>>>>   implementation of the buffer API, to continue the work done by
>>>>>   Mihail Chindris.
>>>>>
>>>>>   Patches [07/15] to [12/15] introduce the new DMABUF based API.
>>>>>
>>>>>   Patches [13/15] and [14/15] add support for cyclic buffers, only through
>>>>>   the new API. A cyclic buffer will be repeated on the output until the
>>>>>   buffer is disabled.
>>>>>
>>>>>   Patch [15/15] adds documentation about the new API.
>>>>>
>>>>>   For now, the API allows you to alloc DMABUF objects and mmap() them
>>>>> to
>>>>>   read or write the samples. It does not yet allow to import DMABUFs
>>>>>   parented to other subsystems, but that should eventually be possible
>>>>>   once it's wired.
>>>>>
>>>>>   This patchset is inspired by the "mmap interface" that was previously
>>>>>   submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
>>>>>   great if I could get a review from you guys. Alexandru's commit was
>>>>>   signed with his @analog.com address but he doesn't work at ADI anymore,
>>>>>   so I believe I'll need him to sign with a new email.
>>>> Why dma-buf? dma-buf looks like something super generic and useful, until
>>>> you realize that there's a metric ton of gpu/accelerator bagage piled in.
>>>> So unless buffer sharing with a gpu/video/accel/whatever device is the
> And cameras (maybe they're included in "whatever" ?).
>
>>>> goal here, and it's just for a convenient way to get at buffer handles,
>>>> this doesn't sound like a good idea.
>>> Good question. The first reason is that a somewhat similar API was intented
>>> before[1], but refused upstream as it was kind of re-inventing the wheel.
>>>
>>> The second reason, is that we want to be able to share buffers too, not with
>>> gpu/video but with the network (zctap) and in the future with USB
>>> (functionFS) too.
>>>
>>> [1]: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-iio%2F20210217073638.21681-1-alexandru.ardelean%40analog.com%2FT%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C7fffe09e82514577747808d9a91e9dd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637726772007336951%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=w1vD8IKz5G7ut%2FlsOyuYXKKQRBV1s8dN7qJBUwo8x9g%3D&amp;reserved=0
>> Hm is that code merged already in upstream already?
>>
>> I know that dma-buf looks really generic, but like I said if there's no
>> need ever to interface with any of the gpu buffer sharing it might be
>> better to use something else (like get_user_pages maybe, would that work?).
> Not GUP please. That brings another set of issues, especially when
> dealing with DMA, we've suffered enough from the USERPTR support in V4L2
> to avoid repeating this. Let's make dma-buf better instead.

Yeah, when comparing GUP and DMA-buf the later is clearly the lesser evil.

DMA-buf indeed has some design issues we need to work on, especially 
around the async operations and synchronization. But I still think those 
are solvable.

GUP on the other hand has some hard fundamental problems which you can 
only solved completely if the hardware is capable of fast and reliable 
recoverable page faults.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-16 16:02     ` Daniel Vetter
  2021-11-16 16:31       ` Laurent Pinchart
@ 2021-11-17 12:50       ` Paul Cercueil
  2021-11-17 13:42         ` Hennerich, Michael
  1 sibling, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-17 12:50 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jonathan Cameron, Michael Hennerich, linux-iio, linux-kernel,
	dri-devel, Christian König, linaro-mm-sig,
	Alexandru Ardelean, linux-media

Hi Daniel,

Le mar., nov. 16 2021 at 17:02:25 +0100, Daniel Vetter 
<daniel@ffwll.ch> a écrit :
> On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
>>  Hi Daniel,
>> 
>>  Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter 
>> <daniel@ffwll.ch> a
>>  écrit :
>>  > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
>>  > >  Hi Jonathan,
>>  > >
>>  > >  This patchset introduces a new userspace interface based on 
>> DMABUF
>>  > >  objects, to complement the existing fileio based API.
>>  > >
>>  > >  The advantage of this DMABUF based interface vs. the fileio
>>  > >  interface, is that it avoids an extra copy of the data between 
>> the
>>  > >  kernel and userspace. This is particularly userful for 
>> high-speed
>>  > >  devices which produce several megabytes or even gigabytes of 
>> data
>>  > > per
>>  > >  second.
>>  > >
>>  > >  The first few patches [01/15] to [03/15] are not really 
>> related, but
>>  > >  allow to reduce the size of the patches that introduce the new 
>> API.
>>  > >
>>  > >  Patch [04/15] to [06/15] enables write() support to the 
>> buffer-dma
>>  > >  implementation of the buffer API, to continue the work done by
>>  > >  Mihail Chindris.
>>  > >
>>  > >  Patches [07/15] to [12/15] introduce the new DMABUF based API.
>>  > >
>>  > >  Patches [13/15] and [14/15] add support for cyclic buffers, 
>> only
>>  > > through
>>  > >  the new API. A cyclic buffer will be repeated on the output 
>> until
>>  > > the
>>  > >  buffer is disabled.
>>  > >
>>  > >  Patch [15/15] adds documentation about the new API.
>>  > >
>>  > >  For now, the API allows you to alloc DMABUF objects and mmap() 
>> them
>>  > > to
>>  > >  read or write the samples. It does not yet allow to import 
>> DMABUFs
>>  > >  parented to other subsystems, but that should eventually be 
>> possible
>>  > >  once it's wired.
>>  > >
>>  > >  This patchset is inspired by the "mmap interface" that was
>>  > > previously
>>  > >  submitted by Alexandru Ardelean and Lars-Peter Clausen, so it 
>> would
>>  > > be
>>  > >  great if I could get a review from you guys. Alexandru's 
>> commit was
>>  > >  signed with his @analog.com address but he doesn't work at ADI
>>  > > anymore,
>>  > >  so I believe I'll need him to sign with a new email.
>>  >
>>  > Why dma-buf? dma-buf looks like something super generic and 
>> useful,
>>  > until
>>  > you realize that there's a metric ton of gpu/accelerator bagage 
>> piled
>>  > in.
>>  > So unless buffer sharing with a gpu/video/accel/whatever device 
>> is the
>>  > goal here, and it's just for a convenient way to get at buffer 
>> handles,
>>  > this doesn't sound like a good idea.
>> 
>>  Good question. The first reason is that a somewhat similar API was 
>> intented
>>  before[1], but refused upstream as it was kind of re-inventing the 
>> wheel.
>> 
>>  The second reason, is that we want to be able to share buffers too, 
>> not with
>>  gpu/video but with the network (zctap) and in the future with USB
>>  (functionFS) too.
>> 
>>  [1]: 
>> https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/
> 
> Hm is that code merged already in upstream already?

No, it was never merged.

> I know that dma-buf looks really generic, but like I said if there's 
> no
> need ever to interface with any of the gpu buffer sharing it might be
> better to use something else (like get_user_pages maybe, would that 
> work?).

If it was such a bad idea, why didn't you say it in the first place 
when you replied to my request for feedback? [1]

I don't think we have any other solution. We can design a custom API to 
pass buffers between IIO and user space, but that won't allow us to 
share these buffers with other subsystems. If dma-buf is not a generic 
solution, then we need a generic solution.

[1]: 
https://x-lore.kernel.org/io-uring/b0a336c0-ae2f-e77f-3c5f-51fdb3fc51fe@amd.com/T/

>>  > Also if the idea is to this with gpus/accelerators then I'd 
>> really like
>>  > to
>>  > see the full thing, since most likely at that point you also want
>>  > dma_fence. And once we talk dma_fence things get truly horrible 
>> from a
>>  > locking pov :-( Or well, just highly constrained and I get to 
>> review
>>  > what
>>  > iio is doing with these buffers to make sure it all fits.
>> 
>>  There is some dma_fence action in patch #10, which is enough for the
>>  userspace apps to use the API.
>> 
>>  What "horribleness" are we talking about here? It doesn't look that 
>> scary to
>>  me, but I certainly don't have the complete picture.
> 
> You need to annotate all the code involved in signalling that 
> dma_fence
> using dma_fence_begin/end_signalling, and then enable full lockdep and
> everything.

Doesn't dma_fence_signal() do it for me? Looking at the code, it does 
call dma_fence_begin/end_signalling.

Cheers,
-Paul

> You can safely assume you'll find bugs, because we even have bugs 
> about
> this in gpu drivers (where that annotation isn't fully rolled out 
> yet).
> 
> The tldr is that you can allocate memory in there. And a pile of other
> restrictions, but not being able to allocate memory (well GFP_ATOMIC 
> is
> ok, but that can fail) is a very serious restriction.
> -Daniel
> 
> 
>> 
>>  Cheers,
>>  -Paul
>> 
>>  > Cheers, Daniel
>>  >
>>  > >
>>  > >  Cheers,
>>  > >  -Paul
>>  > >
>>  > >  Alexandru Ardelean (1):
>>  > >    iio: buffer-dma: split iio_dma_buffer_fileio_free() function
>>  > >
>>  > >  Paul Cercueil (14):
>>  > >    iio: buffer-dma: Get rid of incoming/outgoing queues
>>  > >    iio: buffer-dma: Remove unused iio_buffer_block struct
>>  > >    iio: buffer-dma: Use round_down() instead of rounddown()
>>  > >    iio: buffer-dma: Enable buffer write support
>>  > >    iio: buffer-dmaengine: Support specifying buffer direction
>>  > >    iio: buffer-dmaengine: Enable write support
>>  > >    iio: core: Add new DMABUF interface infrastructure
>>  > >    iio: buffer-dma: Use DMABUFs instead of custom solution
>>  > >    iio: buffer-dma: Implement new DMABUF based userspace API
>>  > >    iio: buffer-dma: Boost performance using write-combine cache
>>  > > setting
>>  > >    iio: buffer-dmaengine: Support new DMABUF based userspace API
>>  > >    iio: core: Add support for cyclic buffers
>>  > >    iio: buffer-dmaengine: Add support for cyclic buffers
>>  > >    Documentation: iio: Document high-speed DMABUF based API
>>  > >
>>  > >   Documentation/driver-api/dma-buf.rst          |   2 +
>>  > >   Documentation/iio/dmabuf_api.rst              |  94 +++
>>  > >   Documentation/iio/index.rst                   |   2 +
>>  > >   drivers/iio/adc/adi-axi-adc.c                 |   3 +-
>>  > >   drivers/iio/buffer/industrialio-buffer-dma.c  | 670
>>  > > ++++++++++++++----
>>  > >   .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
>>  > >   drivers/iio/industrialio-buffer.c             |  49 ++
>>  > >   include/linux/iio/buffer-dma.h                |  43 +-
>>  > >   include/linux/iio/buffer-dmaengine.h          |   5 +-
>>  > >   include/linux/iio/buffer_impl.h               |   8 +
>>  > >   include/uapi/linux/iio/buffer.h               |  30 +
>>  > >   11 files changed, 783 insertions(+), 165 deletions(-)
>>  > >   create mode 100644 Documentation/iio/dmabuf_api.rst
>>  > >
>>  > >  --
>>  > >  2.33.0
>>  > >
>>  >
>>  > --
>>  > Daniel Vetter
>>  > Software Engineer, Intel Corporation
>>  > http://blog.ffwll.ch
>> 
>> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-17 12:50       ` Paul Cercueil
@ 2021-11-17 13:42         ` Hennerich, Michael
  0 siblings, 0 replies; 59+ messages in thread
From: Hennerich, Michael @ 2021-11-17 13:42 UTC (permalink / raw)
  To: Paul Cercueil, Daniel Vetter
  Cc: Jonathan Cameron, linux-iio, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, Alexandru Ardelean,
	linux-media



> -----Original Message-----
> From: Paul Cercueil <paul@crapouillou.net>
> Sent: Mittwoch, 17. November 2021 13:50
> To: Daniel Vetter <daniel@ffwll.ch>
> Cc: Jonathan Cameron <jic23@kernel.org>; Hennerich, Michael
> <Michael.Hennerich@analog.com>; linux-iio@vger.kernel.org; linux-
> kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Christian König
> <christian.koenig@amd.com>; linaro-mm-sig@lists.linaro.org; Alexandru
> Ardelean <ardeleanalex@gmail.com>; linux-media@vger.kernel.org
> Subject: Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based
> API
> 
> [External]
> 
> Hi Daniel,
> 
> Le mar., nov. 16 2021 at 17:02:25 +0100, Daniel Vetter <daniel@ffwll.ch> a
> écrit :
> > On Mon, Nov 15, 2021 at 02:57:37PM +0000, Paul Cercueil wrote:
> >>  Hi Daniel,
> >>
> >>  Le lun., nov. 15 2021 at 15:37:16 +0100, Daniel Vetter
> >> <daniel@ffwll.ch> a  écrit :
> >>  > On Mon, Nov 15, 2021 at 02:19:10PM +0000, Paul Cercueil wrote:
> >>  > >  Hi Jonathan,
> >>  > >
> >>  > >  This patchset introduces a new userspace interface based on
> >> DMABUF  > >  objects, to complement the existing fileio based API.
> >>  > >
> >>  > >  The advantage of this DMABUF based interface vs. the fileio  >
> >> >  interface, is that it avoids an extra copy of the data between the
> >> > >  kernel and userspace. This is particularly userful for
> >> high-speed  > >  devices which produce several megabytes or even
> >> gigabytes of data  > > per  > >  second.
> >>  > >
> >>  > >  The first few patches [01/15] to [03/15] are not really
> >> related, but  > >  allow to reduce the size of the patches that
> >> introduce the new API.
> >>  > >
> >>  > >  Patch [04/15] to [06/15] enables write() support to the
> >> buffer-dma  > >  implementation of the buffer API, to continue the
> >> work done by  > >  Mihail Chindris.
> >>  > >
> >>  > >  Patches [07/15] to [12/15] introduce the new DMABUF based API.
> >>  > >
> >>  > >  Patches [13/15] and [14/15] add support for cyclic buffers,
> >> only  > > through  > >  the new API. A cyclic buffer will be repeated
> >> on the output until  > > the  > >  buffer is disabled.
> >>  > >
> >>  > >  Patch [15/15] adds documentation about the new API.
> >>  > >
> >>  > >  For now, the API allows you to alloc DMABUF objects and mmap()
> >> them  > > to  > >  read or write the samples. It does not yet allow
> >> to import DMABUFs  > >  parented to other subsystems, but that should
> >> eventually be possible  > >  once it's wired.
> >>  > >
> >>  > >  This patchset is inspired by the "mmap interface" that was  > >
> >> previously  > >  submitted by Alexandru Ardelean and Lars-Peter
> >> Clausen, so it would  > > be  > >  great if I could get a review from
> >> you guys. Alexandru's commit was  > >  signed with his @analog.com
> >> address but he doesn't work at ADI  > > anymore,  > >  so I believe
> >> I'll need him to sign with a new email.
> >>  >
> >>  > Why dma-buf? dma-buf looks like something super generic and
> >> useful,  > until  > you realize that there's a metric ton of
> >> gpu/accelerator bagage piled  > in.
> >>  > So unless buffer sharing with a gpu/video/accel/whatever device is
> >> the  > goal here, and it's just for a convenient way to get at buffer
> >> handles,  > this doesn't sound like a good idea.
> >>
> >>  Good question. The first reason is that a somewhat similar API was
> >> intented  before[1], but refused upstream as it was kind of
> >> re-inventing the wheel.
> >>
> >>  The second reason, is that we want to be able to share buffers too,
> >> not with  gpu/video but with the network (zctap) and in the future
> >> with USB
> >>  (functionFS) too.
> >>
> >>  [1]:
> >> https://urldefense.com/v3/__https://lore.kernel.org/linux-iio/2021021
> >> 7073638.21681-1-
> alexandru.ardelean@analog.com/T/__;!!A3Ni8CS0y2Y!p67g
> >>
> KdXW2zXUxASbwbCKzXgcwxEo1R3h4AumAE6QHiSPyI3KYaz9TmGpnSIF3wsQ
> c3gAgQ$
> >
> > Hm is that code merged already in upstream already?
> 
> No, it was never merged.
> 
> > I know that dma-buf looks really generic, but like I said if there's
> > no need ever to interface with any of the gpu buffer sharing it might
> > be better to use something else (like get_user_pages maybe, would that
> > work?).
> 
> If it was such a bad idea, why didn't you say it in the first place when you
> replied to my request for feedback? [1]
> 
> I don't think we have any other solution. We can design a custom API to pass
> buffers between IIO and user space, but that won't allow us to share these
> buffers with other subsystems. If dma-buf is not a generic solution, then we
> need a generic solution.

I don't think we need another generic solution - dma-buf is the solution:

From https://www.kernel.org/doc/html/v5.15/driver-api/dma-buf.html:
" The dma-buf subsystem provides the framework for sharing buffers for hardware (DMA)
 access across multiple device drivers and subsystems, and for synchronizing asynchronous hardware access."

That's sounds pretty much like what we need.

It lives under linux/drivers/dma-buf 
if it would be a video only shouldn't this be documented somewhere, and perhaps the code should be somewhere else?

Just my 2cents.

Greetings,
Michael 

> 
> [1]:
> https://urldefense.com/v3/__https://x-lore.kernel.org/io-uring/b0a336c0-
> ae2f-e77f-3c5f-
> 51fdb3fc51fe@amd.com/T/__;!!A3Ni8CS0y2Y!p67gKdXW2zXUxASbwbCKzXg
> cwxEo1R3h4AumAE6QHiSPyI3KYaz9TmGpnSIF3wu9jAKL1Q$
> 
> >>  > Also if the idea is to this with gpus/accelerators then I'd really
> >> like  > to  > see the full thing, since most likely at that point you
> >> also want  > dma_fence. And once we talk dma_fence things get truly
> >> horrible from a  > locking pov :-( Or well, just highly constrained
> >> and I get to review  > what  > iio is doing with these buffers to
> >> make sure it all fits.
> >>
> >>  There is some dma_fence action in patch #10, which is enough for the
> >> userspace apps to use the API.
> >>
> >>  What "horribleness" are we talking about here? It doesn't look that
> >> scary to  me, but I certainly don't have the complete picture.
> >
> > You need to annotate all the code involved in signalling that
> > dma_fence using dma_fence_begin/end_signalling, and then enable full
> > lockdep and everything.
> 
> Doesn't dma_fence_signal() do it for me? Looking at the code, it does call
> dma_fence_begin/end_signalling.
> 
> Cheers,
> -Paul
> 
> > You can safely assume you'll find bugs, because we even have bugs
> > about this in gpu drivers (where that annotation isn't fully rolled
> > out yet).
> >
> > The tldr is that you can allocate memory in there. And a pile of other
> > restrictions, but not being able to allocate memory (well GFP_ATOMIC
> > is ok, but that can fail) is a very serious restriction.
> > -Daniel
> >
> >
> >>
> >>  Cheers,
> >>  -Paul
> >>
> >>  > Cheers, Daniel
> >>  >
> >>  > >
> >>  > >  Cheers,
> >>  > >  -Paul
> >>  > >
> >>  > >  Alexandru Ardelean (1):
> >>  > >    iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> >>  > >
> >>  > >  Paul Cercueil (14):
> >>  > >    iio: buffer-dma: Get rid of incoming/outgoing queues
> >>  > >    iio: buffer-dma: Remove unused iio_buffer_block struct
> >>  > >    iio: buffer-dma: Use round_down() instead of rounddown()
> >>  > >    iio: buffer-dma: Enable buffer write support
> >>  > >    iio: buffer-dmaengine: Support specifying buffer direction
> >>  > >    iio: buffer-dmaengine: Enable write support
> >>  > >    iio: core: Add new DMABUF interface infrastructure
> >>  > >    iio: buffer-dma: Use DMABUFs instead of custom solution
> >>  > >    iio: buffer-dma: Implement new DMABUF based userspace API
> >>  > >    iio: buffer-dma: Boost performance using write-combine cache
> >>  > > setting
> >>  > >    iio: buffer-dmaengine: Support new DMABUF based userspace API
> >>  > >    iio: core: Add support for cyclic buffers
> >>  > >    iio: buffer-dmaengine: Add support for cyclic buffers
> >>  > >    Documentation: iio: Document high-speed DMABUF based API
> >>  > >
> >>  > >   Documentation/driver-api/dma-buf.rst          |   2 +
> >>  > >   Documentation/iio/dmabuf_api.rst              |  94 +++
> >>  > >   Documentation/iio/index.rst                   |   2 +
> >>  > >   drivers/iio/adc/adi-axi-adc.c                 |   3 +-
> >>  > >   drivers/iio/buffer/industrialio-buffer-dma.c  | 670
> >>  > > ++++++++++++++----
> >>  > >   .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
> >>  > >   drivers/iio/industrialio-buffer.c             |  49 ++
> >>  > >   include/linux/iio/buffer-dma.h                |  43 +-
> >>  > >   include/linux/iio/buffer-dmaengine.h          |   5 +-
> >>  > >   include/linux/iio/buffer_impl.h               |   8 +
> >>  > >   include/uapi/linux/iio/buffer.h               |  30 +
> >>  > >   11 files changed, 783 insertions(+), 165 deletions(-)
> >>  > >   create mode 100644 Documentation/iio/dmabuf_api.rst
> >>  > >
> >>  > >  --
> >>  > >  2.33.0
> >>  > >
> >>  >
> >>  > --
> >>  > Daniel Vetter
> >>  > Software Engineer, Intel Corporation  >
> >> https://urldefense.com/v3/__http://blog.ffwll.ch__;!!A3Ni8CS0y2Y!p67g
> >>
> KdXW2zXUxASbwbCKzXgcwxEo1R3h4AumAE6QHiSPyI3KYaz9TmGpnSIF3ws_
> BZIjsg$
> >>
> >>
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > https://urldefense.com/v3/__http://blog.ffwll.ch__;!!A3Ni8CS0y2Y!p67gK
> >
> dXW2zXUxASbwbCKzXgcwxEo1R3h4AumAE6QHiSPyI3KYaz9TmGpnSIF3ws_B
> ZIjsg$
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-15 14:19 ` [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting Paul Cercueil
@ 2021-11-18 11:45   ` Paul Cercueil
  2021-11-21 15:00   ` Jonathan Cameron
  1 sibling, 0 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-18 11:45 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi,

Le lun., nov. 15 2021 at 14:19:21 +0000, Paul Cercueil 
<paul@crapouillou.net> a écrit :
> We can be certain that the input buffers will only be accessed by
> userspace for reading, and output buffers will mostly be accessed by
> userspace for writing.
> 
> Therefore, it makes more sense to use only fully cached input buffers,
> and to use the write-combine cache coherency setting for output 
> buffers.
> 
> This boosts performance, as the data written to the output buffers 
> does
> not have to be sync'd for coherency. It will halve performance if the
> userspace application tries to read from the output buffer, but this
> should never happen.
> 
> Since we don't need to sync the cache when disabling CPU access either
> for input buffers or output buffers, the .end_cpu_access() callback 
> can
> be dropped completely.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c | 82 
> +++++++++++++-------
>  1 file changed, 54 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c 
> b/drivers/iio/buffer/industrialio-buffer-dma.c
> index 92356ee02f30..fb39054d8c15 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct 
> dma_buf *dbuf,
>  	if (vma->vm_ops->open)
>  		vma->vm_ops->open(vma);
> 
> -	return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> -			      virt_to_page(block->vaddr));
> +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> +		/*
> +		 * With an input buffer, userspace will only read the data and
> +		 * never write. We can mmap the buffer fully cached.
> +		 */
> +		return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> +				      virt_to_page(block->vaddr));
> +	} else {
> +		/*
> +		 * With an output buffer, userspace will only write the data
> +		 * and should rarely (if never) read from it. It is better to
> +		 * use write-combine in this case.
> +		 */
> +		return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
> +				   vma->vm_end - vma->vm_start);
> +	}
> +}
> +
> +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block 
> *block)
> +{
> +	struct device *dev = block->queue->dev;
> +	size_t size = PAGE_ALIGN(block->size);
> +
> +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> +		dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
> +	else
> +		dma_free_wc(dev, size, block->vaddr, block->phys_addr);
>  }
> 
>  static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
> @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct 
> dma_buf *dbuf)
> 
>  	mutex_lock(&queue->lock);
> 
> -	dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> -			  block->vaddr, block->phys_addr);
> -
> +	iio_dma_buffer_free_dmamem(block);
>  	kfree(block);
> 
>  	queue->num_blocks--;
> @@ -268,19 +291,6 @@ static int 
> iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
>  	return 0;
>  }
> 
> -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
> -					     enum dma_data_direction dma_dir)
> -{
> -	struct iio_dma_buffer_block *block = dbuf->priv;
> -	struct device *dev = block->queue->dev;
> -
> -	/* We only need to sync the cache for output buffers */
> -	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
> -		dma_sync_single_for_device(dev, block->phys_addr, block->size, 
> dma_dir);
> -
> -	return 0;
> -}
> -
>  static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
>  	.attach			= iio_buffer_dma_buf_attach,
>  	.map_dma_buf		= iio_buffer_dma_buf_map,
> @@ -288,9 +298,28 @@ static const struct dma_buf_ops 
> iio_dma_buffer_dmabuf_ops = {
>  	.mmap			= iio_buffer_dma_buf_mmap,
>  	.release		= iio_buffer_dma_buf_release,
>  	.begin_cpu_access	= iio_buffer_dma_buf_begin_cpu_access,
> -	.end_cpu_access		= iio_buffer_dma_buf_end_cpu_access,
>  };
> 
> +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block 
> *block)
> +{
> +	struct device *dev = block->queue->dev;
> +	size_t size = PAGE_ALIGN(block->size);
> +
> +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> +		block->vaddr = dma_alloc_coherent(dev, size,
> +						  &block->phys_addr,
> +						  GFP_KERNEL);

I'm so used to dma_alloc_noncoherent() that I didn't even notice that 
it was dma_alloc_coherent() here. The code I added meant to work with 
non-coherent memory - hence the dma_sync_* operations and the use of 
dma_mmap_pages().

I'll fix that in V2.

Cheers,
-Paul

> +	} else {
> +		block->vaddr = dma_alloc_wc(dev, size,
> +					    &block->phys_addr,
> +					    GFP_KERNEL);
> +	}
> +	if (!block->vaddr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
>  static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>  	struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
>  {
> @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block 
> *iio_dma_buffer_alloc_block(
>  	if (!block)
>  		return ERR_PTR(-ENOMEM);
> 
> -	block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> -		&block->phys_addr, GFP_KERNEL);
> -	if (!block->vaddr) {
> -		err = -ENOMEM;
> +	block->size = size;
> +	block->queue = queue;
> +
> +	err = iio_dma_buffer_alloc_dmamem(block);
> +	if (err)
>  		goto err_free_block;
> -	}
> 
>  	einfo.ops = &iio_dma_buffer_dmabuf_ops;
>  	einfo.size = PAGE_ALIGN(size);
> @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block 
> *iio_dma_buffer_alloc_block(
>  	}
> 
>  	block->dmabuf = dmabuf;
> -	block->size = size;
>  	block->bytes_used = size;
>  	block->state = IIO_BLOCK_STATE_DONE;
> -	block->queue = queue;
>  	block->fileio = fileio;
>  	INIT_LIST_HEAD(&block->head);
> 
> @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block 
> *iio_dma_buffer_alloc_block(
>  	return block;
> 
>  err_free_dma:
> -	dma_free_coherent(queue->dev, PAGE_ALIGN(size),
> -			  block->vaddr, block->phys_addr);
> +	iio_dma_buffer_free_dmamem(block);
>  err_free_block:
>  	kfree(block);
>  	return ERR_PTR(err);
> --
> 2.33.0
> 



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function
  2021-11-16 10:59   ` Alexandru Ardelean
@ 2021-11-21 13:49     ` Jonathan Cameron
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 13:49 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: Paul Cercueil, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, LKML, linux-media,
	dri-devel, linaro-mm-sig, Alexandru Ardelean

On Tue, 16 Nov 2021 12:59:30 +0200
Alexandru Ardelean <ardeleanalex@gmail.com> wrote:

> On Mon, Nov 15, 2021 at 4:20 PM Paul Cercueil <paul@crapouillou.net> wrote:
> >
> > From: Alexandru Ardelean <alexandru.ardelean@analog.com>
> >
> > A part of the logic in the iio_dma_buffer_exit() is required for the change
> > to add mmap support to IIO buffers.
> > This change splits the logic into a separate function, which will be
> > re-used later.
> >  
> 
> Not sure how the protocol is here, since my old @analog.com email
> isn't working anymore.
> 
> But:
> 
> Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>

If you want to both document your email address change and make various scripts
find you more easily, send a patch for .mailmap

In meantime we can probably leave your original sign off in place but add
a Cc: with an up to date email address. That will reflect that the sign off
for this was (I assume) given back when you were at Analog.

> 
> Thanks :)
> Alex
> 
> > Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> > Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> > ---
> >  drivers/iio/buffer/industrialio-buffer-dma.c | 39 +++++++++++---------
> >  1 file changed, 22 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> > index eeeed6b2e0cf..eb8cfd3af030 100644
> > --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> > +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> > @@ -358,6 +358,27 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
> >  }
> >  EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);
> >
> > +static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue)
> > +{
> > +       unsigned int i;
> > +
> > +       spin_lock_irq(&queue->list_lock);
> > +       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > +               if (!queue->fileio.blocks[i])
> > +                       continue;
> > +               queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> > +       }
> > +       spin_unlock_irq(&queue->list_lock);
> > +
> > +       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > +               if (!queue->fileio.blocks[i])
> > +                       continue;
> > +               iio_buffer_block_put(queue->fileio.blocks[i]);
> > +               queue->fileio.blocks[i] = NULL;
> > +       }
> > +       queue->fileio.active_block = NULL;
> > +}
> > +
> >  static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
> >         struct iio_dma_buffer_block *block)
> >  {
> > @@ -681,25 +702,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init);
> >   */
> >  void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
> >  {
> > -       unsigned int i;
> > -
> >         mutex_lock(&queue->lock);
> >
> > -       spin_lock_irq(&queue->list_lock);
> > -       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > -               if (!queue->fileio.blocks[i])
> > -                       continue;
> > -               queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
> > -       }
> > -       spin_unlock_irq(&queue->list_lock);
> > -
> > -       for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> > -               if (!queue->fileio.blocks[i])
> > -                       continue;
> > -               iio_buffer_block_put(queue->fileio.blocks[i]);
> > -               queue->fileio.blocks[i] = NULL;
> > -       }
> > -       queue->fileio.active_block = NULL;
> > +       iio_dma_buffer_fileio_free(queue);
> >         queue->ops = NULL;
> >
> >         mutex_unlock(&queue->lock);
> > --
> > 2.33.0
> >  


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API
  2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
                   ` (12 preceding siblings ...)
  2021-11-15 14:37 ` [PATCH 00/15] iio: buffer-dma: write() and new " Daniel Vetter
@ 2021-11-21 13:57 ` Jonathan Cameron
  13 siblings, 0 replies; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 13:57 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 15 Nov 2021 14:19:10 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Hi Jonathan,
> 
> This patchset introduces a new userspace interface based on DMABUF
> objects, to complement the existing fileio based API.
> 
> The advantage of this DMABUF based interface vs. the fileio
> interface, is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly userful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.

I'm going to defer to others for the discussion of whether dmabuf is
the right way to do this. I look forwards to some conclusions.
Much like yourselves, I want to see 'a' way of doing this but I'm
not knowledgeable enough about the quirks and corner cases of the
different options to offer a strong opinion.

Either way, it's great to get that discussion going and it would never
have happened without having actual code so thanks for doing this!

> 
> The first few patches [01/15] to [03/15] are not really related, but
> allow to reduce the size of the patches that introduce the new API.
> 
> Patch [04/15] to [06/15] enables write() support to the buffer-dma
> implementation of the buffer API, to continue the work done by
> Mihail Chindris.
> 
> Patches [07/15] to [12/15] introduce the new DMABUF based API.
> 
> Patches [13/15] and [14/15] add support for cyclic buffers, only through
> the new API. A cyclic buffer will be repeated on the output until the
> buffer is disabled.
> 
> Patch [15/15] adds documentation about the new API.
> 
> For now, the API allows you to alloc DMABUF objects and mmap() them to
> read or write the samples. It does not yet allow to import DMABUFs
> parented to other subsystems, but that should eventually be possible
> once it's wired.
> 
> This patchset is inspired by the "mmap interface" that was previously
> submitted by Alexandru Ardelean and Lars-Peter Clausen, so it would be
> great if I could get a review from you guys. Alexandru's commit was
> signed with his @analog.com address but he doesn't work at ADI anymore,
> so I believe I'll need him to sign with a new email.

Given it reflects a sign off given back then I think it's fine to
leave them as analog.com but maybe he can give an Acked-by: on a different
email address to reflect that he is happy for these to go forwards.

It's far from unusual to have out of date email addresses in sign offs
given often code is coming out of trees many years after someone
originally wrote it for a vendor tree or similar.  Whilst not a lawyer
I don't think that's a problem.

Jonathan

> 
> Cheers,
> -Paul
> 
> Alexandru Ardelean (1):
>   iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> 
> Paul Cercueil (14):
>   iio: buffer-dma: Get rid of incoming/outgoing queues
>   iio: buffer-dma: Remove unused iio_buffer_block struct
>   iio: buffer-dma: Use round_down() instead of rounddown()
>   iio: buffer-dma: Enable buffer write support
>   iio: buffer-dmaengine: Support specifying buffer direction
>   iio: buffer-dmaengine: Enable write support
>   iio: core: Add new DMABUF interface infrastructure
>   iio: buffer-dma: Use DMABUFs instead of custom solution
>   iio: buffer-dma: Implement new DMABUF based userspace API
>   iio: buffer-dma: Boost performance using write-combine cache setting
>   iio: buffer-dmaengine: Support new DMABUF based userspace API
>   iio: core: Add support for cyclic buffers
>   iio: buffer-dmaengine: Add support for cyclic buffers
>   Documentation: iio: Document high-speed DMABUF based API
> 
>  Documentation/driver-api/dma-buf.rst          |   2 +
>  Documentation/iio/dmabuf_api.rst              |  94 +++
>  Documentation/iio/index.rst                   |   2 +
>  drivers/iio/adc/adi-axi-adc.c                 |   3 +-
>  drivers/iio/buffer/industrialio-buffer-dma.c  | 670 ++++++++++++++----
>  .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
>  drivers/iio/industrialio-buffer.c             |  49 ++
>  include/linux/iio/buffer-dma.h                |  43 +-
>  include/linux/iio/buffer-dmaengine.h          |   5 +-
>  include/linux/iio/buffer_impl.h               |   8 +
>  include/uapi/linux/iio/buffer.h               |  30 +
>  11 files changed, 783 insertions(+), 165 deletions(-)
>  create mode 100644 Documentation/iio/dmabuf_api.rst
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
  2021-11-16  8:23   ` Alexandru Ardelean
@ 2021-11-21 14:05   ` Jonathan Cameron
  2021-11-21 16:23   ` Lars-Peter Clausen
  2 siblings, 0 replies; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 14:05 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 15 Nov 2021 14:19:11 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> The buffer-dma code was using two queues, incoming and outgoing, to
> manage the state of the blocks in use.
> 
> While this totally works, it adds some complexity to the code,
> especially since the code only manages 2 blocks. It is much easier to
> just check each block's state manually, and keep a counter for the next
> block to dequeue.
> 
> Since the new DMABUF based API wouldn't use these incoming and outgoing
> queues anyway, getting rid of them now makes the upcoming changes
> simpler.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
This looks to be a nice little cleanup on it's own.

Let me know if you'd like me to pick this up before the main discussion
comes to any conclusion.

Jonathan

> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c | 68 ++++++++++----------
>  include/linux/iio/buffer-dma.h               |  7 +-
>  2 files changed, 37 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index d348af8b9705..abac88f20104 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>  
>  static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
>  {
> -	struct iio_dma_buffer_queue *queue = block->queue;
> -
> -	/*
> -	 * The buffer has already been freed by the application, just drop the
> -	 * reference.
> -	 */
> -	if (block->state != IIO_BLOCK_STATE_DEAD) {
> +	if (block->state != IIO_BLOCK_STATE_DEAD)
>  		block->state = IIO_BLOCK_STATE_DONE;
> -		list_add_tail(&block->head, &queue->outgoing);
> -	}
>  }
>  
>  /**
> @@ -317,11 +309,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>  	 * dead. This means we can reset the lists without having to fear
>  	 * corrution.
>  	 */
> -	INIT_LIST_HEAD(&queue->outgoing);
>  	spin_unlock_irq(&queue->list_lock);
>  
> -	INIT_LIST_HEAD(&queue->incoming);
> -
>  	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
>  		if (queue->fileio.blocks[i]) {
>  			block = queue->fileio.blocks[i];
> @@ -346,7 +335,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>  		}
>  
>  		block->state = IIO_BLOCK_STATE_QUEUED;
> -		list_add_tail(&block->head, &queue->incoming);
>  	}
>  
>  out_unlock:
> @@ -401,13 +389,18 @@ int iio_dma_buffer_enable(struct iio_buffer *buffer,
>  	struct iio_dev *indio_dev)
>  {
>  	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> -	struct iio_dma_buffer_block *block, *_block;
> +	struct iio_dma_buffer_block *block;
> +	unsigned int i;
>  
>  	mutex_lock(&queue->lock);
>  	queue->active = true;
> -	list_for_each_entry_safe(block, _block, &queue->incoming, head) {
> -		list_del(&block->head);
> -		iio_dma_buffer_submit_block(queue, block);
> +	queue->fileio.next_dequeue = 0;
> +
> +	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> +		block = queue->fileio.blocks[i];
> +
> +		if (block->state == IIO_BLOCK_STATE_QUEUED)
> +			iio_dma_buffer_submit_block(queue, block);
>  	}
>  	mutex_unlock(&queue->lock);
>  
> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>  static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
>  	struct iio_dma_buffer_block *block)
>  {
> -	if (block->state == IIO_BLOCK_STATE_DEAD) {
> +	if (block->state == IIO_BLOCK_STATE_DEAD)
>  		iio_buffer_block_put(block);
> -	} else if (queue->active) {
> +	else if (queue->active)
>  		iio_dma_buffer_submit_block(queue, block);
> -	} else {
> +	else
>  		block->state = IIO_BLOCK_STATE_QUEUED;
> -		list_add_tail(&block->head, &queue->incoming);
> -	}
>  }
>  
>  static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
>  	struct iio_dma_buffer_queue *queue)
>  {
>  	struct iio_dma_buffer_block *block;
> +	unsigned int idx;
>  
>  	spin_lock_irq(&queue->list_lock);
> -	block = list_first_entry_or_null(&queue->outgoing, struct
> -		iio_dma_buffer_block, head);
> -	if (block != NULL) {
> -		list_del(&block->head);
> +
> +	idx = queue->fileio.next_dequeue;
> +	block = queue->fileio.blocks[idx];
> +
> +	if (block->state == IIO_BLOCK_STATE_DONE) {
>  		block->state = IIO_BLOCK_STATE_DEQUEUED;
> +		idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
> +		queue->fileio.next_dequeue = idx;
> +	} else {
> +		block = NULL;
>  	}
> +
>  	spin_unlock_irq(&queue->list_lock);
>  
>  	return block;
> @@ -539,6 +537,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
>  	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf);
>  	struct iio_dma_buffer_block *block;
>  	size_t data_available = 0;
> +	unsigned int i;
>  
>  	/*
>  	 * For counting the available bytes we'll use the size of the block not
> @@ -552,8 +551,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf)
>  		data_available += queue->fileio.active_block->size;
>  
>  	spin_lock_irq(&queue->list_lock);
> -	list_for_each_entry(block, &queue->outgoing, head)
> -		data_available += block->size;
> +
> +	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
> +		block = queue->fileio.blocks[i];
> +
> +		if (block != queue->fileio.active_block
> +		    && block->state == IIO_BLOCK_STATE_DONE)
> +			data_available += block->size;
> +	}
> +
>  	spin_unlock_irq(&queue->list_lock);
>  	mutex_unlock(&queue->lock);
>  
> @@ -616,9 +622,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
>  	queue->dev = dev;
>  	queue->ops = ops;
>  
> -	INIT_LIST_HEAD(&queue->incoming);
> -	INIT_LIST_HEAD(&queue->outgoing);
> -
>  	mutex_init(&queue->lock);
>  	spin_lock_init(&queue->list_lock);
>  
> @@ -645,11 +648,8 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue)
>  			continue;
>  		queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD;
>  	}
> -	INIT_LIST_HEAD(&queue->outgoing);
>  	spin_unlock_irq(&queue->list_lock);
>  
> -	INIT_LIST_HEAD(&queue->incoming);
> -
>  	for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) {
>  		if (!queue->fileio.blocks[i])
>  			continue;
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index ff15c61bf319..d4ed5ff39d44 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -78,12 +78,15 @@ struct iio_dma_buffer_block {
>   * @active_block: Block being used in read()
>   * @pos: Read offset in the active block
>   * @block_size: Size of each block
> + * @next_dequeue: index of next block that will be dequeued
>   */
>  struct iio_dma_buffer_queue_fileio {
>  	struct iio_dma_buffer_block *blocks[2];
>  	struct iio_dma_buffer_block *active_block;
>  	size_t pos;
>  	size_t block_size;
> +
> +	unsigned int next_dequeue;
>  };
>  
>  /**
> @@ -97,8 +100,6 @@ struct iio_dma_buffer_queue_fileio {
>   *   atomic context as well as blocks on those lists. This is the outgoing queue
>   *   list and typically also a list of active blocks in the part that handles
>   *   the DMA controller
> - * @incoming: List of buffers on the incoming queue
> - * @outgoing: List of buffers on the outgoing queue
>   * @active: Whether the buffer is currently active
>   * @fileio: FileIO state
>   */
> @@ -109,8 +110,6 @@ struct iio_dma_buffer_queue {
>  
>  	struct mutex lock;
>  	spinlock_t list_lock;
> -	struct list_head incoming;
> -	struct list_head outgoing;
>  
>  	bool active;
>  


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()
  2021-11-15 14:19 ` [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown() Paul Cercueil
  2021-11-16  8:26   ` Alexandru Ardelean
@ 2021-11-21 14:08   ` Jonathan Cameron
  2021-11-22 10:00     ` Paul Cercueil
  1 sibling, 1 reply; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 14:08 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 15 Nov 2021 14:19:13 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> We know that the buffer's alignment will always be a power of two;
> therefore, we can use the faster round_down() macro.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
*groan*.  I don't want to know where the naming of these two came from but that
is spectacular... 

Anyhow, happy to pick up 1-3 now if you like as all are good cleanup of
existing code.

Jonathan

> ---
>  drivers/iio/buffer/industrialio-buffer-dmaengine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> index 1ac94c4e9792..f8ce26a24c57 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
> @@ -67,7 +67,7 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
>  	dma_cookie_t cookie;
>  
>  	block->bytes_used = min(block->size, dmaengine_buffer->max_size);
> -	block->bytes_used = rounddown(block->bytes_used,
> +	block->bytes_used = round_down(block->bytes_used,
>  			dmaengine_buffer->align);
>  
>  	desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support
  2021-11-15 14:19 ` [PATCH 04/15] iio: buffer-dma: Enable buffer write support Paul Cercueil
  2021-11-16  8:52   ` Alexandru Ardelean
@ 2021-11-21 14:20   ` Jonathan Cameron
  2021-11-21 17:19     ` Paul Cercueil
  1 sibling, 1 reply; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 14:20 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 15 Nov 2021 14:19:14 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Adding write support to the buffer-dma code is easy - the write()
> function basically needs to do the exact same thing as the read()
> function: dequeue a block, read or write the data, enqueue the block
> when entirely processed.
> 
> Therefore, the iio_buffer_dma_read() and the new iio_buffer_dma_write()
> now both call a function iio_buffer_dma_io(), which will perform this
> task.
> 
> The .space_available() callback can return the exact same value as the
> .data_available() callback for input buffers, since in both cases we
> count the exact same thing (the number of bytes in each available
> block).
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Hi Paul,

There are a few changes in here, such as the bytes_used value being set that
I'm not following the reasoning behind. More info on those?
Also good to provide something about those in this patch description.

Thanks,

Jonathan


> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c | 75 +++++++++++++++-----
>  include/linux/iio/buffer-dma.h               |  7 ++
>  2 files changed, 66 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index abac88f20104..eeeed6b2e0cf 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>  	}
>  
>  	block->size = size;
> -	block->state = IIO_BLOCK_STATE_DEQUEUED;
> +	block->bytes_used = size;
> +	block->state = IIO_BLOCK_STATE_DONE;

I don't know why these are here - some more info?

>  	block->queue = queue;
>  	INIT_LIST_HEAD(&block->head);
>  	kref_init(&block->kref);
> @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
>  		block->state = IIO_BLOCK_STATE_DONE;
>  }
>  
> +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue *queue)
> +{
> +	__poll_t flags;
> +
> +	if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> +		flags = EPOLLIN | EPOLLRDNORM;
> +	else
> +		flags = EPOLLOUT | EPOLLWRNORM;
> +
> +	wake_up_interruptible_poll(&queue->buffer.pollq, flags);
> +}
> +
>  /**
>   * iio_dma_buffer_block_done() - Indicate that a block has been completed
>   * @block: The completed block
> @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
>  	spin_unlock_irqrestore(&queue->list_lock, flags);
>  
>  	iio_buffer_block_put_atomic(block);
> -	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> +	iio_dma_buffer_queue_wake(queue);
>  }
>  EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
>  
> @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue,
>  	}
>  	spin_unlock_irqrestore(&queue->list_lock, flags);
>  
> -	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
> +	iio_dma_buffer_queue_wake(queue);
>  }
>  EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
>  
> @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
>  			queue->fileio.blocks[i] = block;
>  		}
>  
> -		block->state = IIO_BLOCK_STATE_QUEUED;
> +		if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> +			block->state = IIO_BLOCK_STATE_QUEUED;

Possibly worth a comment on the state being set here.  I figured it out, but might
save some brain cells in future if it's stated in the code.

>  	}
>  
>  out_unlock:
> @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
>  	return block;
>  }
>  
> -/**
> - * iio_dma_buffer_read() - DMA buffer read callback
> - * @buffer: Buffer to read form
> - * @n: Number of bytes to read
> - * @user_buffer: Userspace buffer to copy the data to
> - *
> - * Should be used as the read callback for iio_buffer_access_ops
> - * struct for DMA buffers.
> - */
> -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> -	char __user *user_buffer)
> +static int iio_dma_buffer_io(struct iio_buffer *buffer,
> +			     size_t n, char __user *user_buffer, bool is_write)
>  {
>  	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
>  	struct iio_dma_buffer_block *block;
> +	void *addr;
>  	int ret;
>  
>  	if (n < buffer->bytes_per_datum)
> @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>  	n = rounddown(n, buffer->bytes_per_datum);
>  	if (n > block->bytes_used - queue->fileio.pos)
>  		n = block->bytes_used - queue->fileio.pos;
> +	addr = block->vaddr + queue->fileio.pos;
>  
> -	if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, n)) {
> +	if (is_write)
> +		ret = !!copy_from_user(addr, user_buffer, n);
> +	else
> +		ret = !!copy_to_user(user_buffer, addr, n);

What is the !! gaining us here?  We only care about == 0 vs != 0 so
forcing it to be 0 or 1 isn't useful.

> +	if (ret) {
>  		ret = -EFAULT;
>  		goto out_unlock;
>  	}
> @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>  
>  	if (queue->fileio.pos == block->bytes_used) {
>  		queue->fileio.active_block = NULL;
> +		block->bytes_used = block->size;

This seems to be a functional change that isn't called out in the patch description.

>  		iio_dma_buffer_enqueue(queue, block);
>  	}
>  
> @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>  
>  	return ret;
>  }
> +
> +/**
> + * iio_dma_buffer_read() - DMA buffer read callback
> + * @buffer: Buffer to read form
> + * @n: Number of bytes to read
> + * @user_buffer: Userspace buffer to copy the data to
> + *
> + * Should be used as the read callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> +	char __user *user_buffer)
> +{
> +	return iio_dma_buffer_io(buffer, n, user_buffer, false);
> +}
>  EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
>  
> +/**
> + * iio_dma_buffer_write() - DMA buffer write callback
> + * @buffer: Buffer to read form
> + * @n: Number of bytes to read
> + * @user_buffer: Userspace buffer to copy the data from
> + *
> + * Should be used as the write callback for iio_buffer_access_ops
> + * struct for DMA buffers.
> + */
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> +			 const char __user *user_buffer)
> +{
> +	return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, true);
> +}
> +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
> +
>  /**
>   * iio_dma_buffer_data_available() - DMA buffer data_available callback
>   * @buf: Buffer to check for data availability
> diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h
> index a65a005c4a19..09c07d5563c0 100644
> --- a/include/linux/iio/buffer-dma.h
> +++ b/include/linux/iio/buffer-dma.h
> @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer *buffer,
>  	struct iio_dev *indio_dev);
>  int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>  	char __user *user_buffer);
> +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> +			 const char __user *user_buffer);
>  size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
>  int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, size_t bpd);
>  int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned int length);
> @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue,
>  void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
>  void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
>  
> +static inline size_t iio_dma_buffer_space_available(struct iio_buffer *buffer)
> +{
> +	return iio_dma_buffer_data_available(buffer);
> +}
> +
>  #endif


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure
  2021-11-15 14:19 ` [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure Paul Cercueil
@ 2021-11-21 14:31   ` Jonathan Cameron
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 14:31 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 15 Nov 2021 14:19:17 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Add the necessary infrastructure to the IIO core to support a new DMABUF
> based interface.
> 
> The advantage of this new DMABUF based interface vs. the read()
> interface, is that it avoids an extra copy of the data between the
> kernel and userspace. This is particularly userful for high-speed
> devices which produce several megabytes or even gigabytes of data per
> second.
> 
> The data in this new DMABUF interface is managed at the granularity of
> DMABUF objects. Reducing the granularity from byte level to block level
> is done to reduce the userspace-kernelspace synchronization overhead
> since performing syscalls for each byte at a few Mbps is just not
> feasible.
> 
> This of course leads to a slightly increased latency. For this reason an
> application can choose the size of the DMABUFs as well as how many it
> allocates. E.g. two DMABUFs would be a traditional double buffering
> scheme. But using a higher number might be necessary to avoid
> underflow/overflow situations in the presence of scheduling latencies.
> 
> As part of the interface, 2 new IOCTLs have been added:
> 
> IIO_BUFFER_DMABUF_ALLOC_IOCTL(struct iio_dmabuf_alloc_req *):
>  Each call will allocate a new DMABUF object. The return value (if not
>  a negative errno value as error) will be the file descriptor of the new
>  DMABUF.
> 
> IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
>  Place the DMABUF object into the queue pending for hardware process.
> 
> These two IOCTLs have to be performed on the IIO buffer's file
> descriptor (either opened from the corresponding /dev/iio:deviceX, or
> obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl).

Unrelated to this patch except tangentially.  Maybe we should enable
new buffer features only on the IIO_BUFFER_GET_FD_IOCTL() route as
we probably want to deprecate the old interfaces due to the it only
supporting a single buffer / datastream per device.

Possibly something for another day...

Nothing to add on actual code...

Jonathan

> 
> To access the data stored in a block by userspace the block must be
> mapped to the process's memory. This is done by calling mmap() on the
> DMABUF's file descriptor.
> 
> Before accessing the data through the map, you must use the
> DMA_BUF_IOCTL_SYNC(struct dma_buf_sync *) ioctl, with the
> DMA_BUF_SYNC_START flag, to make sure that the data is available.
> This call may block until the hardware is done with this block. Once
> you are done reading or writing the data, you must use this ioctl again
> with the DMA_BUF_SYNC_END flag, before enqueueing the DMABUF to the
> kernel's queue.
> 
> If you need to know when the hardware is done with a DMABUF, you can
> poll its file descriptor for the EPOLLOUT event.
> 
> Finally, to destroy a DMABUF object, simply call close() on its file
> descriptor.
> 
> A typical workflow for the new interface is:
> 
>   for block in blocks:
>     DMABUF_ALLOC block
>     mmap block
> 
>   enable buffer
> 
>   while !done
>     for block in blocks:
>       DMABUF_ENQUEUE block
> 
>       DMABUF_SYNC_START block
>       process data
>       DMABUF_SYNC_END block
> 
>   disable buffer
> 
>   for block in blocks:
>     close block
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>  drivers/iio/industrialio-buffer.c | 44 +++++++++++++++++++++++++++++++
>  include/linux/iio/buffer_impl.h   |  8 ++++++
>  include/uapi/linux/iio/buffer.h   | 29 ++++++++++++++++++++
>  3 files changed, 81 insertions(+)
> 
> diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
> index e180728914c0..30910e6c2346 100644
> --- a/drivers/iio/industrialio-buffer.c
> +++ b/drivers/iio/industrialio-buffer.c
> @@ -17,6 +17,7 @@
>  #include <linux/fs.h>
>  #include <linux/cdev.h>
>  #include <linux/slab.h>
> +#include <linux/mm.h>
>  #include <linux/poll.h>
>  #include <linux/sched/signal.h>
>  
> @@ -1585,12 +1586,55 @@ static long iio_device_buffer_getfd(struct iio_dev *indio_dev, unsigned long arg
>  	return ret;
>  }
>  
> +static int iio_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
> +				     struct iio_dmabuf __user *user_buf)
> +{
> +	struct iio_dmabuf dmabuf;
> +
> +	if (!buffer->access->enqueue_dmabuf)
> +		return -EPERM;
> +
> +	if (copy_from_user(&dmabuf, user_buf, sizeof(dmabuf)))
> +		return -EFAULT;
> +
> +	if (dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
> +		return -EINVAL;
> +
> +	return buffer->access->enqueue_dmabuf(buffer, &dmabuf);
> +}
> +
> +static int iio_buffer_alloc_dmabuf(struct iio_buffer *buffer,
> +				   struct iio_dmabuf_alloc_req __user *user_req)
> +{
> +	struct iio_dmabuf_alloc_req req;
> +
> +	if (!buffer->access->alloc_dmabuf)
> +		return -EPERM;
> +
> +	if (copy_from_user(&req, user_req, sizeof(req)))
> +		return -EFAULT;
> +
> +	if (req.resv)
> +		return -EINVAL;
> +
> +	return buffer->access->alloc_dmabuf(buffer, &req);
> +}
> +
>  static long iio_device_buffer_ioctl(struct iio_dev *indio_dev, struct file *filp,
>  				    unsigned int cmd, unsigned long arg)
>  {
> +	struct iio_dev_buffer_pair *ib = filp->private_data;
> +	struct iio_buffer *buffer = ib->buffer;
> +	void __user *_arg = (void __user *)arg;
> +
>  	switch (cmd) {
>  	case IIO_BUFFER_GET_FD_IOCTL:
>  		return iio_device_buffer_getfd(indio_dev, arg);
> +	case IIO_BUFFER_DMABUF_ALLOC_IOCTL:
> +		return iio_buffer_alloc_dmabuf(buffer, _arg);
> +	case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
> +		/* TODO: support non-blocking enqueue operation */
> +		return iio_buffer_enqueue_dmabuf(buffer, _arg);
>  	default:
>  		return IIO_IOCTL_UNHANDLED;
>  	}
> diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h
> index e2ca8ea23e19..728541bc2c63 100644
> --- a/include/linux/iio/buffer_impl.h
> +++ b/include/linux/iio/buffer_impl.h
> @@ -39,6 +39,9 @@ struct iio_buffer;
>   *                      device stops sampling. Calles are balanced with @enable.
>   * @release:		called when the last reference to the buffer is dropped,
>   *			should free all resources allocated by the buffer.
> + * @alloc_dmabuf:	called from userspace via ioctl to allocate one DMABUF.
> + * @enqueue_dmabuf:	called from userspace via ioctl to queue this DMABUF
> + *			object to this buffer. Requires a valid DMABUF fd.
>   * @modes:		Supported operating modes by this buffer type
>   * @flags:		A bitmask combination of INDIO_BUFFER_FLAG_*
>   *
> @@ -68,6 +71,11 @@ struct iio_buffer_access_funcs {
>  
>  	void (*release)(struct iio_buffer *buffer);
>  
> +	int (*alloc_dmabuf)(struct iio_buffer *buffer,
> +			    struct iio_dmabuf_alloc_req *req);
> +	int (*enqueue_dmabuf)(struct iio_buffer *buffer,
> +			      struct iio_dmabuf *block);
> +
>  	unsigned int modes;
>  	unsigned int flags;
>  };
> diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h
> index 13939032b3f6..e4621b926262 100644
> --- a/include/uapi/linux/iio/buffer.h
> +++ b/include/uapi/linux/iio/buffer.h
> @@ -5,6 +5,35 @@
>  #ifndef _UAPI_IIO_BUFFER_H_
>  #define _UAPI_IIO_BUFFER_H_
>  
> +#include <linux/types.h>
> +
> +#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS	0x00000000
> +
> +/**
> + * struct iio_dmabuf_alloc_req - Descriptor for allocating IIO DMABUFs
> + * @size:	the size of a single DMABUF
> + * @resv:	reserved
> + */
> +struct iio_dmabuf_alloc_req {
> +	__u64 size;
> +	__u64 resv;
> +};
> +
> +/**
> + * struct iio_dmabuf - Descriptor for a single IIO DMABUF object
> + * @fd:		file descriptor of the DMABUF object
> + * @flags:	one or more IIO_BUFFER_DMABUF_* flags
> + * @bytes_used:	number of bytes used in this DMABUF for the data transfer.
> + *		If zero, the full buffer is used.
> + */
> +struct iio_dmabuf {
> +	__u32 fd;
> +	__u32 flags;
> +	__u64 bytes_used;
> +};
> +
>  #define IIO_BUFFER_GET_FD_IOCTL			_IOWR('i', 0x91, int)
> +#define IIO_BUFFER_DMABUF_ALLOC_IOCTL		_IOW('i', 0x92, struct iio_dmabuf_alloc_req)
> +#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL		_IOW('i', 0x93, struct iio_dmabuf)
>  
>  #endif /* _UAPI_IIO_BUFFER_H_ */


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-15 14:19 ` [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting Paul Cercueil
  2021-11-18 11:45   ` Paul Cercueil
@ 2021-11-21 15:00   ` Jonathan Cameron
  2021-11-21 17:43     ` Paul Cercueil
  1 sibling, 1 reply; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 15:00 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 15 Nov 2021 14:19:21 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> We can be certain that the input buffers will only be accessed by
> userspace for reading, and output buffers will mostly be accessed by
> userspace for writing.

Mostly?  Perhaps a little more info on why that's not 'only'.

> 
> Therefore, it makes more sense to use only fully cached input buffers,
> and to use the write-combine cache coherency setting for output buffers.
> 
> This boosts performance, as the data written to the output buffers does
> not have to be sync'd for coherency. It will halve performance if the
> userspace application tries to read from the output buffer, but this
> should never happen.
> 
> Since we don't need to sync the cache when disabling CPU access either
> for input buffers or output buffers, the .end_cpu_access() callback can
> be dropped completely.

We have an odd mix of coherent and non coherent DMA in here as you noted,
but are you sure this is safe on all platforms?

> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>

Any numbers to support this patch?  The mapping types are performance
optimisations so nice to know how much of a difference they make.


> ---
>  drivers/iio/buffer/industrialio-buffer-dma.c | 82 +++++++++++++-------
>  1 file changed, 54 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
> index 92356ee02f30..fb39054d8c15 100644
> --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct dma_buf *dbuf,
>  	if (vma->vm_ops->open)
>  		vma->vm_ops->open(vma);
>  
> -	return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> -			      virt_to_page(block->vaddr));
> +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> +		/*
> +		 * With an input buffer, userspace will only read the data and
> +		 * never write. We can mmap the buffer fully cached.
> +		 */
> +		return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> +				      virt_to_page(block->vaddr));
> +	} else {
> +		/*
> +		 * With an output buffer, userspace will only write the data
> +		 * and should rarely (if never) read from it. It is better to
> +		 * use write-combine in this case.
> +		 */
> +		return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
> +				   vma->vm_end - vma->vm_start);
> +	}
> +}
> +
> +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block *block)
> +{
> +	struct device *dev = block->queue->dev;
> +	size_t size = PAGE_ALIGN(block->size);
> +
> +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> +		dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
> +	else
> +		dma_free_wc(dev, size, block->vaddr, block->phys_addr);
>  }
>  
>  static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
> @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
>  
>  	mutex_lock(&queue->lock);
>  
> -	dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> -			  block->vaddr, block->phys_addr);
> -
> +	iio_dma_buffer_free_dmamem(block);
>  	kfree(block);
>  
>  	queue->num_blocks--;
> @@ -268,19 +291,6 @@ static int iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
>  	return 0;
>  }
>  
> -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
> -					     enum dma_data_direction dma_dir)
> -{
> -	struct iio_dma_buffer_block *block = dbuf->priv;
> -	struct device *dev = block->queue->dev;
> -
> -	/* We only need to sync the cache for output buffers */
> -	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
> -		dma_sync_single_for_device(dev, block->phys_addr, block->size, dma_dir);
> -
> -	return 0;
> -}
> -
>  static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
>  	.attach			= iio_buffer_dma_buf_attach,
>  	.map_dma_buf		= iio_buffer_dma_buf_map,
> @@ -288,9 +298,28 @@ static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
>  	.mmap			= iio_buffer_dma_buf_mmap,
>  	.release		= iio_buffer_dma_buf_release,
>  	.begin_cpu_access	= iio_buffer_dma_buf_begin_cpu_access,
> -	.end_cpu_access		= iio_buffer_dma_buf_end_cpu_access,
>  };
>  
> +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block *block)
> +{
> +	struct device *dev = block->queue->dev;
> +	size_t size = PAGE_ALIGN(block->size);
> +
> +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> +		block->vaddr = dma_alloc_coherent(dev, size,
> +						  &block->phys_addr,
> +						  GFP_KERNEL);
> +	} else {
> +		block->vaddr = dma_alloc_wc(dev, size,
> +					    &block->phys_addr,
> +					    GFP_KERNEL);
> +	}
> +	if (!block->vaddr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
>  static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>  	struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
>  {
> @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>  	if (!block)
>  		return ERR_PTR(-ENOMEM);
>  
> -	block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> -		&block->phys_addr, GFP_KERNEL);
> -	if (!block->vaddr) {
> -		err = -ENOMEM;
> +	block->size = size;
> +	block->queue = queue;
> +
> +	err = iio_dma_buffer_alloc_dmamem(block);
> +	if (err)
>  		goto err_free_block;
> -	}
>  
>  	einfo.ops = &iio_dma_buffer_dmabuf_ops;
>  	einfo.size = PAGE_ALIGN(size);
> @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>  	}
>  
>  	block->dmabuf = dmabuf;
> -	block->size = size;
>  	block->bytes_used = size;
>  	block->state = IIO_BLOCK_STATE_DONE;
> -	block->queue = queue;
>  	block->fileio = fileio;
>  	INIT_LIST_HEAD(&block->head);
>  
> @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>  	return block;
>  
>  err_free_dma:
> -	dma_free_coherent(queue->dev, PAGE_ALIGN(size),
> -			  block->vaddr, block->phys_addr);
> +	iio_dma_buffer_free_dmamem(block);
>  err_free_block:
>  	kfree(block);
>  	return ERR_PTR(err);


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API
  2021-11-15 14:22   ` [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API Paul Cercueil
@ 2021-11-21 15:10     ` Jonathan Cameron
  2021-11-21 17:46       ` Paul Cercueil
  0 siblings, 1 reply; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-21 15:10 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 15 Nov 2021 14:22:43 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Document the new DMABUF based API.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>

Hi Paul,

A few trivial things inline but looks good to me if we do end up using DMABUF
anyway.

Jonathan

> ---
>  Documentation/driver-api/dma-buf.rst |  2 +
>  Documentation/iio/dmabuf_api.rst     | 94 ++++++++++++++++++++++++++++
>  Documentation/iio/index.rst          |  2 +
>  3 files changed, 98 insertions(+)
>  create mode 100644 Documentation/iio/dmabuf_api.rst
> 
> diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> index 2cd7db82d9fe..d3c9b58d2706 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -1,3 +1,5 @@
> +.. _dma-buf:
> +

Why this change?

>  Buffer Sharing and Synchronization
>  ==================================
>  
> diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst
> new file mode 100644
> index 000000000000..b4e120a4ef0c
> --- /dev/null
> +++ b/Documentation/iio/dmabuf_api.rst
> @@ -0,0 +1,94 @@
> +===================================
> +High-speed DMABUF interface for IIO
> +===================================
> +
> +1. Overview
> +===========
> +
> +The Industrial I/O subsystem supports access to buffers through a file-based
> +interface, with read() and write() access calls through the IIO device's dev
> +node.
> +
> +It additionally supports a DMABUF based interface, where the userspace
> +application can allocate and append DMABUF objects to the buffer's queue.

I would note somewhere that this interface is optional for a given IIO driver.
I don't want people to start assuming their i2c ADC will support this and
wondering why it doesn't work :)

> +
> +The advantage of this DMABUF based interface vs. the fileio
> +interface, is that it avoids an extra copy of the data between the
> +kernel and userspace. This is particularly userful for high-speed
> +devices which produce several megabytes or even gigabytes of data per
> +second.
> +
> +The data in this DMABUF interface is managed at the granularity of
> +DMABUF objects. Reducing the granularity from byte level to block level
> +is done to reduce the userspace-kernelspace synchronization overhead
> +since performing syscalls for each byte at a few Mbps is just not
> +feasible.
> +
> +This of course leads to a slightly increased latency. For this reason an
> +application can choose the size of the DMABUFs as well as how many it
> +allocates. E.g. two DMABUFs would be a traditional double buffering
> +scheme. But using a higher number might be necessary to avoid
> +underflow/overflow situations in the presence of scheduling latencies.
> +
> +2. User API
> +===========
> +
> +``IIO_BUFFER_DMABUF_ALLOC_IOCTL(struct iio_dmabuf_alloc_req *)``
> +----------------------------------------------------------------
> +
> +Each call will allocate a new DMABUF object. The return value (if not
> +a negative errno value as error) will be the file descriptor of the new
> +DMABUF.
> +
> +``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *)``
> +--------------------------------------------------------
> +
> +Place the DMABUF object into the queue pending for hardware process.
> +
> +These two IOCTLs have to be performed on the IIO buffer's file
> +descriptor (either opened from the corresponding /dev/iio:deviceX, or
> +obtained using the `IIO_BUFFER_GET_FD_IOCTL` ioctl).
> +
> +3. Usage
> +========
> +
> +To access the data stored in a block by userspace the block must be
> +mapped to the process's memory. This is done by calling mmap() on the
> +DMABUF's file descriptor.
> +
> +Before accessing the data through the map, you must use the
> +DMA_BUF_IOCTL_SYNC(struct dma_buf_sync *) ioctl, with the
> +DMA_BUF_SYNC_START flag, to make sure that the data is available.
> +This call may block until the hardware is done with this block. Once
> +you are done reading or writing the data, you must use this ioctl again
> +with the DMA_BUF_SYNC_END flag, before enqueueing the DMABUF to the
> +kernel's queue.
> +
> +If you need to know when the hardware is done with a DMABUF, you can
> +poll its file descriptor for the EPOLLOUT event.
> +
> +Finally, to destroy a DMABUF object, simply call close() on its file
> +descriptor.
> +
> +For more information about manipulating DMABUF objects, see: :ref:`dma-buf`.
> +
> +A typical workflow for the new interface is:
> +
> +    for block in blocks:
> +      DMABUF_ALLOC block
> +      mmap block
> +
> +    enable buffer
> +
> +    while !done
> +      for block in blocks:
> +        DMABUF_ENQUEUE block
> +
> +        DMABUF_SYNC_START block
> +        process data
> +        DMABUF_SYNC_END block
> +
> +    disable buffer
> +
> +    for block in blocks:
> +      close block
> diff --git a/Documentation/iio/index.rst b/Documentation/iio/index.rst
> index 58b7a4ebac51..9ce799fbf262 100644
> --- a/Documentation/iio/index.rst
> +++ b/Documentation/iio/index.rst
> @@ -10,3 +10,5 @@ Industrial I/O
>     iio_configfs
>  
>     ep93xx_adc
> +
> +   dmabuf_api

Given this is core stuff rather than driver specific, perhaps move it up a few lines?



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
  2021-11-16  8:23   ` Alexandru Ardelean
  2021-11-21 14:05   ` Jonathan Cameron
@ 2021-11-21 16:23   ` Lars-Peter Clausen
  2021-11-21 17:52     ` Paul Cercueil
  2 siblings, 1 reply; 59+ messages in thread
From: Lars-Peter Clausen @ 2021-11-21 16:23 UTC (permalink / raw)
  To: Paul Cercueil, Jonathan Cameron
  Cc: Alexandru Ardelean, Michael Hennerich, Sumit Semwal,
	Christian König, linux-iio, linux-kernel, linux-media,
	dri-devel, linaro-mm-sig

On 11/15/21 3:19 PM, Paul Cercueil wrote:
> The buffer-dma code was using two queues, incoming and outgoing, to
> manage the state of the blocks in use.
>
> While this totally works, it adds some complexity to the code,
> especially since the code only manages 2 blocks. It is much easier to
> just check each block's state manually, and keep a counter for the next
> block to dequeue.
>
> Since the new DMABUF based API wouldn't use these incoming and outgoing
> queues anyway, getting rid of them now makes the upcoming changes
> simpler.
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
The outgoing queue is going to be replaced by fences, but I think we 
need to keep the incoming queue.
> [...]
> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue *queue,
>   	struct iio_dma_buffer_block *block)
>   {
> -	if (block->state == IIO_BLOCK_STATE_DEAD) {
> +	if (block->state == IIO_BLOCK_STATE_DEAD)
>   		iio_buffer_block_put(block);
> -	} else if (queue->active) {
> +	else if (queue->active)
>   		iio_dma_buffer_submit_block(queue, block);
> -	} else {
> +	else
>   		block->state = IIO_BLOCK_STATE_QUEUED;
> -		list_add_tail(&block->head, &queue->incoming);
If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer is 
not active, it will be marked as queued, but we don't actually keep a 
reference to it anywhere. It will never be submitted to the DMA, and it 
will never be signaled as completed.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support
  2021-11-21 14:20   ` Jonathan Cameron
@ 2021-11-21 17:19     ` Paul Cercueil
  2021-11-27 15:17       ` Jonathan Cameron
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-21 17:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi Jonathan,

Le dim., nov. 21 2021 at 14:20:49 +0000, Jonathan Cameron 
<jic23@kernel.org> a écrit :
> On Mon, 15 Nov 2021 14:19:14 +0000
> Paul Cercueil <paul@crapouillou.net> wrote:
> 
>>  Adding write support to the buffer-dma code is easy - the write()
>>  function basically needs to do the exact same thing as the read()
>>  function: dequeue a block, read or write the data, enqueue the block
>>  when entirely processed.
>> 
>>  Therefore, the iio_buffer_dma_read() and the new 
>> iio_buffer_dma_write()
>>  now both call a function iio_buffer_dma_io(), which will perform 
>> this
>>  task.
>> 
>>  The .space_available() callback can return the exact same value as 
>> the
>>  .data_available() callback for input buffers, since in both cases we
>>  count the exact same thing (the number of bytes in each available
>>  block).
>> 
>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> Hi Paul,
> 
> There are a few changes in here, such as the bytes_used value being 
> set that
> I'm not following the reasoning behind. More info on those?
> Also good to provide something about those in this patch description.
> 
> Thanks,
> 
> Jonathan
> 
> 
>>  ---
>>   drivers/iio/buffer/industrialio-buffer-dma.c | 75 
>> +++++++++++++++-----
>>   include/linux/iio/buffer-dma.h               |  7 ++
>>   2 files changed, 66 insertions(+), 16 deletions(-)
>> 
>>  diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c 
>> b/drivers/iio/buffer/industrialio-buffer-dma.c
>>  index abac88f20104..eeeed6b2e0cf 100644
>>  --- a/drivers/iio/buffer/industrialio-buffer-dma.c
>>  +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
>>  @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block 
>> *iio_dma_buffer_alloc_block(
>>   	}
>> 
>>   	block->size = size;
>>  -	block->state = IIO_BLOCK_STATE_DEQUEUED;
>>  +	block->bytes_used = size;
>>  +	block->state = IIO_BLOCK_STATE_DONE;
> 
> I don't know why these are here - some more info?

When using an input buffer the block->bytes_used is unconditionally 
reset in iio_dmaengine_buffer_submit_block(), so this was fine until 
now.

When using an output buffer the block->bytes_used can actually (with 
the new API) be specified by the user, so we don't want 
iio_dmaengine_buffer_submit_block() to unconditionally override it. 
Which means that in the case where we have an output buffer in fileio 
mode, we do need block->bytes_used to be initialized to the buffer's 
size since it won't be set anywhere else.

About the change in block->state: in patch [01/15] we removed the 
incoming/outgoing queues, and while the "enqueued" state is still 
useful to know which buffers have to be submitted when the buffer is 
enabled, the "dequeued" state is not useful anymore since there is no 
more distinction vs. the "done" state.

I believe this change should be moved to patch [01/15] then, and I 
should go further and remove the IIO_BLOCK_STATE_DEQUEUED completely.

>>   	block->queue = queue;
>>   	INIT_LIST_HEAD(&block->head);
>>   	kref_init(&block->kref);
>>  @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct 
>> iio_dma_buffer_block *block)
>>   		block->state = IIO_BLOCK_STATE_DONE;
>>   }
>> 
>>  +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue 
>> *queue)
>>  +{
>>  +	__poll_t flags;
>>  +
>>  +	if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
>>  +		flags = EPOLLIN | EPOLLRDNORM;
>>  +	else
>>  +		flags = EPOLLOUT | EPOLLWRNORM;
>>  +
>>  +	wake_up_interruptible_poll(&queue->buffer.pollq, flags);
>>  +}
>>  +
>>   /**
>>    * iio_dma_buffer_block_done() - Indicate that a block has been 
>> completed
>>    * @block: The completed block
>>  @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct 
>> iio_dma_buffer_block *block)
>>   	spin_unlock_irqrestore(&queue->list_lock, flags);
>> 
>>   	iio_buffer_block_put_atomic(block);
>>  -	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | 
>> EPOLLRDNORM);
>>  +	iio_dma_buffer_queue_wake(queue);
>>   }
>>   EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
>> 
>>  @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct 
>> iio_dma_buffer_queue *queue,
>>   	}
>>   	spin_unlock_irqrestore(&queue->list_lock, flags);
>> 
>>  -	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | 
>> EPOLLRDNORM);
>>  +	iio_dma_buffer_queue_wake(queue);
>>   }
>>   EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
>> 
>>  @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct 
>> iio_buffer *buffer)
>>   			queue->fileio.blocks[i] = block;
>>   		}
>> 
>>  -		block->state = IIO_BLOCK_STATE_QUEUED;
>>  +		if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
>>  +			block->state = IIO_BLOCK_STATE_QUEUED;
> 
> Possibly worth a comment on the state being set here.  I figured it 
> out, but might
> save some brain cells in future if it's stated in the code.

Ok.

>>   	}
>> 
>>   out_unlock:
>>  @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block 
>> *iio_dma_buffer_dequeue(
>>   	return block;
>>   }
>> 
>>  -/**
>>  - * iio_dma_buffer_read() - DMA buffer read callback
>>  - * @buffer: Buffer to read form
>>  - * @n: Number of bytes to read
>>  - * @user_buffer: Userspace buffer to copy the data to
>>  - *
>>  - * Should be used as the read callback for iio_buffer_access_ops
>>  - * struct for DMA buffers.
>>  - */
>>  -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>>  -	char __user *user_buffer)
>>  +static int iio_dma_buffer_io(struct iio_buffer *buffer,
>>  +			     size_t n, char __user *user_buffer, bool is_write)
>>   {
>>   	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
>>   	struct iio_dma_buffer_block *block;
>>  +	void *addr;
>>   	int ret;
>> 
>>   	if (n < buffer->bytes_per_datum)
>>  @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer 
>> *buffer, size_t n,
>>   	n = rounddown(n, buffer->bytes_per_datum);
>>   	if (n > block->bytes_used - queue->fileio.pos)
>>   		n = block->bytes_used - queue->fileio.pos;
>>  +	addr = block->vaddr + queue->fileio.pos;
>> 
>>  -	if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, 
>> n)) {
>>  +	if (is_write)
>>  +		ret = !!copy_from_user(addr, user_buffer, n);
>>  +	else
>>  +		ret = !!copy_to_user(user_buffer, addr, n);
> 
> What is the !! gaining us here?  We only care about == 0 vs != 0 so
> forcing it to be 0 or 1 isn't useful.

Right.

>>  +	if (ret) {
>>   		ret = -EFAULT;
>>   		goto out_unlock;
>>   	}
>>  @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer 
>> *buffer, size_t n,
>> 
>>   	if (queue->fileio.pos == block->bytes_used) {
>>   		queue->fileio.active_block = NULL;
>>  +		block->bytes_used = block->size;
> 
> This seems to be a functional change that isn't called out in the 
> patch description.

See the explanation above. Although I most likely don't need to set it 
at two different spots... I'll check that in detail next week.

Cheers,
-Paul

>>   		iio_dma_buffer_enqueue(queue, block);
>>   	}
>> 
>>  @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer 
>> *buffer, size_t n,
>> 
>>   	return ret;
>>   }
>>  +
>>  +/**
>>  + * iio_dma_buffer_read() - DMA buffer read callback
>>  + * @buffer: Buffer to read form
>>  + * @n: Number of bytes to read
>>  + * @user_buffer: Userspace buffer to copy the data to
>>  + *
>>  + * Should be used as the read callback for iio_buffer_access_ops
>>  + * struct for DMA buffers.
>>  + */
>>  +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>>  +	char __user *user_buffer)
>>  +{
>>  +	return iio_dma_buffer_io(buffer, n, user_buffer, false);
>>  +}
>>   EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
>> 
>>  +/**
>>  + * iio_dma_buffer_write() - DMA buffer write callback
>>  + * @buffer: Buffer to read form
>>  + * @n: Number of bytes to read
>>  + * @user_buffer: Userspace buffer to copy the data from
>>  + *
>>  + * Should be used as the write callback for iio_buffer_access_ops
>>  + * struct for DMA buffers.
>>  + */
>>  +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
>>  +			 const char __user *user_buffer)
>>  +{
>>  +	return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, 
>> true);
>>  +}
>>  +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
>>  +
>>   /**
>>    * iio_dma_buffer_data_available() - DMA buffer data_available 
>> callback
>>    * @buf: Buffer to check for data availability
>>  diff --git a/include/linux/iio/buffer-dma.h 
>> b/include/linux/iio/buffer-dma.h
>>  index a65a005c4a19..09c07d5563c0 100644
>>  --- a/include/linux/iio/buffer-dma.h
>>  +++ b/include/linux/iio/buffer-dma.h
>>  @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer 
>> *buffer,
>>   	struct iio_dev *indio_dev);
>>   int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
>>   	char __user *user_buffer);
>>  +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
>>  +			 const char __user *user_buffer);
>>   size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
>>   int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, 
>> size_t bpd);
>>   int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned 
>> int length);
>>  @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct 
>> iio_dma_buffer_queue *queue,
>>   void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
>>   void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
>> 
>>  +static inline size_t iio_dma_buffer_space_available(struct 
>> iio_buffer *buffer)
>>  +{
>>  +	return iio_dma_buffer_data_available(buffer);
>>  +}
>>  +
>>   #endif
> 



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-21 15:00   ` Jonathan Cameron
@ 2021-11-21 17:43     ` Paul Cercueil
  2021-11-25 17:29       ` Paul Cercueil
  2021-11-27 15:20       ` Jonathan Cameron
  0 siblings, 2 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-21 17:43 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi Jonathan,

Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron 
<jic23@kernel.org> a écrit :
> On Mon, 15 Nov 2021 14:19:21 +0000
> Paul Cercueil <paul@crapouillou.net> wrote:
> 
>>  We can be certain that the input buffers will only be accessed by
>>  userspace for reading, and output buffers will mostly be accessed by
>>  userspace for writing.
> 
> Mostly?  Perhaps a little more info on why that's not 'only'.

Just like with a framebuffer, it really depends on what the application 
does. Most of the cases it will just read sequentially an input buffer, 
or write sequentially an output buffer. But then you get the exotic 
application that will try to do something like alpha blending, which 
means read+write. Hence "mostly".

>> 
>>  Therefore, it makes more sense to use only fully cached input 
>> buffers,
>>  and to use the write-combine cache coherency setting for output 
>> buffers.
>> 
>>  This boosts performance, as the data written to the output buffers 
>> does
>>  not have to be sync'd for coherency. It will halve performance if 
>> the
>>  userspace application tries to read from the output buffer, but this
>>  should never happen.
>> 
>>  Since we don't need to sync the cache when disabling CPU access 
>> either
>>  for input buffers or output buffers, the .end_cpu_access() callback 
>> can
>>  be dropped completely.
> 
> We have an odd mix of coherent and non coherent DMA in here as you 
> noted,
> but are you sure this is safe on all platforms?

The mix isn't safe, but using only coherent or only non-coherent should 
be safe, yes.

> 
>> 
>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> 
> Any numbers to support this patch?  The mapping types are performance
> optimisations so nice to know how much of a difference they make.

Output buffers are definitely faster in write-combine mode. On a 
ZedBoard with a AD9361 transceiver set to 66 MSPS, and buffer/size set 
to 8192, I would get about 185 MiB/s before, 197 MiB/s after.

Input buffers... early results are mixed. On ARM32 it does look like it 
is slightly faster to read from *uncached* memory than reading from 
cached memory. The cache sync does take a long time.

Other architectures might have a different result, for instance on MIPS 
invalidating the cache is a very fast operation, so using cached 
buffers would be a huge win in performance.

Setups where the DMA operations are coherent also wouldn't require any 
cache sync and this patch would give a huge win in performance.

I'll run some more tests next week to have some fresh numbers.

Cheers,
-Paul

>>  ---
>>   drivers/iio/buffer/industrialio-buffer-dma.c | 82 
>> +++++++++++++-------
>>   1 file changed, 54 insertions(+), 28 deletions(-)
>> 
>>  diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c 
>> b/drivers/iio/buffer/industrialio-buffer-dma.c
>>  index 92356ee02f30..fb39054d8c15 100644
>>  --- a/drivers/iio/buffer/industrialio-buffer-dma.c
>>  +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
>>  @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct 
>> dma_buf *dbuf,
>>   	if (vma->vm_ops->open)
>>   		vma->vm_ops->open(vma);
>> 
>>  -	return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
>>  -			      virt_to_page(block->vaddr));
>>  +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
>>  +		/*
>>  +		 * With an input buffer, userspace will only read the data and
>>  +		 * never write. We can mmap the buffer fully cached.
>>  +		 */
>>  +		return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
>>  +				      virt_to_page(block->vaddr));
>>  +	} else {
>>  +		/*
>>  +		 * With an output buffer, userspace will only write the data
>>  +		 * and should rarely (if never) read from it. It is better to
>>  +		 * use write-combine in this case.
>>  +		 */
>>  +		return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
>>  +				   vma->vm_end - vma->vm_start);
>>  +	}
>>  +}
>>  +
>>  +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block 
>> *block)
>>  +{
>>  +	struct device *dev = block->queue->dev;
>>  +	size_t size = PAGE_ALIGN(block->size);
>>  +
>>  +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
>>  +		dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
>>  +	else
>>  +		dma_free_wc(dev, size, block->vaddr, block->phys_addr);
>>   }
>> 
>>   static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
>>  @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct 
>> dma_buf *dbuf)
>> 
>>   	mutex_lock(&queue->lock);
>> 
>>  -	dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
>>  -			  block->vaddr, block->phys_addr);
>>  -
>>  +	iio_dma_buffer_free_dmamem(block);
>>   	kfree(block);
>> 
>>   	queue->num_blocks--;
>>  @@ -268,19 +291,6 @@ static int 
>> iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
>>   	return 0;
>>   }
>> 
>>  -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
>>  -					     enum dma_data_direction dma_dir)
>>  -{
>>  -	struct iio_dma_buffer_block *block = dbuf->priv;
>>  -	struct device *dev = block->queue->dev;
>>  -
>>  -	/* We only need to sync the cache for output buffers */
>>  -	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
>>  -		dma_sync_single_for_device(dev, block->phys_addr, block->size, 
>> dma_dir);
>>  -
>>  -	return 0;
>>  -}
>>  -
>>   static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
>>   	.attach			= iio_buffer_dma_buf_attach,
>>   	.map_dma_buf		= iio_buffer_dma_buf_map,
>>  @@ -288,9 +298,28 @@ static const struct dma_buf_ops 
>> iio_dma_buffer_dmabuf_ops = {
>>   	.mmap			= iio_buffer_dma_buf_mmap,
>>   	.release		= iio_buffer_dma_buf_release,
>>   	.begin_cpu_access	= iio_buffer_dma_buf_begin_cpu_access,
>>  -	.end_cpu_access		= iio_buffer_dma_buf_end_cpu_access,
>>   };
>> 
>>  +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block 
>> *block)
>>  +{
>>  +	struct device *dev = block->queue->dev;
>>  +	size_t size = PAGE_ALIGN(block->size);
>>  +
>>  +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
>>  +		block->vaddr = dma_alloc_coherent(dev, size,
>>  +						  &block->phys_addr,
>>  +						  GFP_KERNEL);
>>  +	} else {
>>  +		block->vaddr = dma_alloc_wc(dev, size,
>>  +					    &block->phys_addr,
>>  +					    GFP_KERNEL);
>>  +	}
>>  +	if (!block->vaddr)
>>  +		return -ENOMEM;
>>  +
>>  +	return 0;
>>  +}
>>  +
>>   static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
>>   	struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
>>   {
>>  @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block 
>> *iio_dma_buffer_alloc_block(
>>   	if (!block)
>>   		return ERR_PTR(-ENOMEM);
>> 
>>  -	block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
>>  -		&block->phys_addr, GFP_KERNEL);
>>  -	if (!block->vaddr) {
>>  -		err = -ENOMEM;
>>  +	block->size = size;
>>  +	block->queue = queue;
>>  +
>>  +	err = iio_dma_buffer_alloc_dmamem(block);
>>  +	if (err)
>>   		goto err_free_block;
>>  -	}
>> 
>>   	einfo.ops = &iio_dma_buffer_dmabuf_ops;
>>   	einfo.size = PAGE_ALIGN(size);
>>  @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block 
>> *iio_dma_buffer_alloc_block(
>>   	}
>> 
>>   	block->dmabuf = dmabuf;
>>  -	block->size = size;
>>   	block->bytes_used = size;
>>   	block->state = IIO_BLOCK_STATE_DONE;
>>  -	block->queue = queue;
>>   	block->fileio = fileio;
>>   	INIT_LIST_HEAD(&block->head);
>> 
>>  @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block 
>> *iio_dma_buffer_alloc_block(
>>   	return block;
>> 
>>   err_free_dma:
>>  -	dma_free_coherent(queue->dev, PAGE_ALIGN(size),
>>  -			  block->vaddr, block->phys_addr);
>>  +	iio_dma_buffer_free_dmamem(block);
>>   err_free_block:
>>   	kfree(block);
>>   	return ERR_PTR(err);
> 



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API
  2021-11-21 15:10     ` Jonathan Cameron
@ 2021-11-21 17:46       ` Paul Cercueil
  0 siblings, 0 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-21 17:46 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi Jonathan,

Le dim., nov. 21 2021 at 15:10:26 +0000, Jonathan Cameron 
<jic23@kernel.org> a écrit :
> On Mon, 15 Nov 2021 14:22:43 +0000
> Paul Cercueil <paul@crapouillou.net> wrote:
> 
>>  Document the new DMABUF based API.
>> 
>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> 
> Hi Paul,
> 
> A few trivial things inline but looks good to me if we do end up 
> using DMABUF
> anyway.
> 
> Jonathan
> 
>>  ---
>>   Documentation/driver-api/dma-buf.rst |  2 +
>>   Documentation/iio/dmabuf_api.rst     | 94 
>> ++++++++++++++++++++++++++++
>>   Documentation/iio/index.rst          |  2 +
>>   3 files changed, 98 insertions(+)
>>   create mode 100644 Documentation/iio/dmabuf_api.rst
>> 
>>  diff --git a/Documentation/driver-api/dma-buf.rst 
>> b/Documentation/driver-api/dma-buf.rst
>>  index 2cd7db82d9fe..d3c9b58d2706 100644
>>  --- a/Documentation/driver-api/dma-buf.rst
>>  +++ b/Documentation/driver-api/dma-buf.rst
>>  @@ -1,3 +1,5 @@
>>  +.. _dma-buf:
>>  +
> 
> Why this change?

I have this line in the file:
For more information about manipulating DMABUF objects, see: 
:ref:`dma-buf`.

For the :ref: to work I need a label at the reference point, if I 
understood correctly.

>>   Buffer Sharing and Synchronization
>>   ==================================
>> 
>>  diff --git a/Documentation/iio/dmabuf_api.rst 
>> b/Documentation/iio/dmabuf_api.rst
>>  new file mode 100644
>>  index 000000000000..b4e120a4ef0c
>>  --- /dev/null
>>  +++ b/Documentation/iio/dmabuf_api.rst
>>  @@ -0,0 +1,94 @@
>>  +===================================
>>  +High-speed DMABUF interface for IIO
>>  +===================================
>>  +
>>  +1. Overview
>>  +===========
>>  +
>>  +The Industrial I/O subsystem supports access to buffers through a 
>> file-based
>>  +interface, with read() and write() access calls through the IIO 
>> device's dev
>>  +node.
>>  +
>>  +It additionally supports a DMABUF based interface, where the 
>> userspace
>>  +application can allocate and append DMABUF objects to the buffer's 
>> queue.
> 
> I would note somewhere that this interface is optional for a given 
> IIO driver.
> I don't want people to start assuming their i2c ADC will support this 
> and
> wondering why it doesn't work :)

Their I2C ADC will support it, as long as the driver supports the 
dmaengine buffer interface. I can make that explicit, yes.

>>  +
>>  +The advantage of this DMABUF based interface vs. the fileio
>>  +interface, is that it avoids an extra copy of the data between the
>>  +kernel and userspace. This is particularly userful for high-speed
>>  +devices which produce several megabytes or even gigabytes of data 
>> per
>>  +second.
>>  +
>>  +The data in this DMABUF interface is managed at the granularity of
>>  +DMABUF objects. Reducing the granularity from byte level to block 
>> level
>>  +is done to reduce the userspace-kernelspace synchronization 
>> overhead
>>  +since performing syscalls for each byte at a few Mbps is just not
>>  +feasible.
>>  +
>>  +This of course leads to a slightly increased latency. For this 
>> reason an
>>  +application can choose the size of the DMABUFs as well as how many 
>> it
>>  +allocates. E.g. two DMABUFs would be a traditional double buffering
>>  +scheme. But using a higher number might be necessary to avoid
>>  +underflow/overflow situations in the presence of scheduling 
>> latencies.
>>  +
>>  +2. User API
>>  +===========
>>  +
>>  +``IIO_BUFFER_DMABUF_ALLOC_IOCTL(struct iio_dmabuf_alloc_req *)``
>>  +----------------------------------------------------------------
>>  +
>>  +Each call will allocate a new DMABUF object. The return value (if 
>> not
>>  +a negative errno value as error) will be the file descriptor of 
>> the new
>>  +DMABUF.
>>  +
>>  +``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *)``
>>  +--------------------------------------------------------
>>  +
>>  +Place the DMABUF object into the queue pending for hardware 
>> process.
>>  +
>>  +These two IOCTLs have to be performed on the IIO buffer's file
>>  +descriptor (either opened from the corresponding /dev/iio:deviceX, 
>> or
>>  +obtained using the `IIO_BUFFER_GET_FD_IOCTL` ioctl).
>>  +
>>  +3. Usage
>>  +========
>>  +
>>  +To access the data stored in a block by userspace the block must be
>>  +mapped to the process's memory. This is done by calling mmap() on 
>> the
>>  +DMABUF's file descriptor.
>>  +
>>  +Before accessing the data through the map, you must use the
>>  +DMA_BUF_IOCTL_SYNC(struct dma_buf_sync *) ioctl, with the
>>  +DMA_BUF_SYNC_START flag, to make sure that the data is available.
>>  +This call may block until the hardware is done with this block. 
>> Once
>>  +you are done reading or writing the data, you must use this ioctl 
>> again
>>  +with the DMA_BUF_SYNC_END flag, before enqueueing the DMABUF to the
>>  +kernel's queue.
>>  +
>>  +If you need to know when the hardware is done with a DMABUF, you 
>> can
>>  +poll its file descriptor for the EPOLLOUT event.
>>  +
>>  +Finally, to destroy a DMABUF object, simply call close() on its 
>> file
>>  +descriptor.
>>  +
>>  +For more information about manipulating DMABUF objects, see: 
>> :ref:`dma-buf`.
>>  +
>>  +A typical workflow for the new interface is:
>>  +
>>  +    for block in blocks:
>>  +      DMABUF_ALLOC block
>>  +      mmap block
>>  +
>>  +    enable buffer
>>  +
>>  +    while !done
>>  +      for block in blocks:
>>  +        DMABUF_ENQUEUE block
>>  +
>>  +        DMABUF_SYNC_START block
>>  +        process data
>>  +        DMABUF_SYNC_END block
>>  +
>>  +    disable buffer
>>  +
>>  +    for block in blocks:
>>  +      close block
>>  diff --git a/Documentation/iio/index.rst 
>> b/Documentation/iio/index.rst
>>  index 58b7a4ebac51..9ce799fbf262 100644
>>  --- a/Documentation/iio/index.rst
>>  +++ b/Documentation/iio/index.rst
>>  @@ -10,3 +10,5 @@ Industrial I/O
>>      iio_configfs
>> 
>>      ep93xx_adc
>>  +
>>  +   dmabuf_api
> 
> Given this is core stuff rather than driver specific, perhaps move it 
> up a few lines?

Alright.

Cheers,
-Paul



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-21 16:23   ` Lars-Peter Clausen
@ 2021-11-21 17:52     ` Paul Cercueil
  2021-11-21 18:49       ` Lars-Peter Clausen
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-21 17:52 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: Jonathan Cameron, Alexandru Ardelean, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi Lars,

Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen 
<lars@metafoo.de> a écrit :
> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>> The buffer-dma code was using two queues, incoming and outgoing, to
>> manage the state of the blocks in use.
>> 
>> While this totally works, it adds some complexity to the code,
>> especially since the code only manages 2 blocks. It is much easier to
>> just check each block's state manually, and keep a counter for the 
>> next
>> block to dequeue.
>> 
>> Since the new DMABUF based API wouldn't use these incoming and 
>> outgoing
>> queues anyway, getting rid of them now makes the upcoming changes
>> simpler.
>> 
>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> The outgoing queue is going to be replaced by fences, but I think we 
> need to keep the incoming queue.

Blocks are always accessed in sequential order, so we now have a 
"queue->next_dequeue" that cycles between the buffers allocated for 
fileio.

>> [...]
>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue 
>> *queue,
>>   	struct iio_dma_buffer_block *block)
>>   {
>> -	if (block->state == IIO_BLOCK_STATE_DEAD) {
>> +	if (block->state == IIO_BLOCK_STATE_DEAD)
>>   		iio_buffer_block_put(block);
>> -	} else if (queue->active) {
>> +	else if (queue->active)
>>   		iio_dma_buffer_submit_block(queue, block);
>> -	} else {
>> +	else
>>   		block->state = IIO_BLOCK_STATE_QUEUED;
>> -		list_add_tail(&block->head, &queue->incoming);
> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer is 
> not active, it will be marked as queued, but we don't actually keep a 
> reference to it anywhere. It will never be submitted to the DMA, and 
> it will never be signaled as completed.

We do keep a reference to the buffers, in the queue->fileio.blocks 
array. When the buffer is enabled, all the blocks in that array that 
are in the "queued" state will be submitted to the DMA.

Cheers,
-Paul



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-21 17:52     ` Paul Cercueil
@ 2021-11-21 18:49       ` Lars-Peter Clausen
  2021-11-21 20:08         ` Paul Cercueil
  0 siblings, 1 reply; 59+ messages in thread
From: Lars-Peter Clausen @ 2021-11-21 18:49 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Alexandru Ardelean, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On 11/21/21 6:52 PM, Paul Cercueil wrote:
> Hi Lars,
>
> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen 
> <lars@metafoo.de> a écrit :
>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>> manage the state of the blocks in use.
>>>
>>> While this totally works, it adds some complexity to the code,
>>> especially since the code only manages 2 blocks. It is much easier to
>>> just check each block's state manually, and keep a counter for the next
>>> block to dequeue.
>>>
>>> Since the new DMABUF based API wouldn't use these incoming and outgoing
>>> queues anyway, getting rid of them now makes the upcoming changes
>>> simpler.
>>>
>>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> The outgoing queue is going to be replaced by fences, but I think we 
>> need to keep the incoming queue.
>
> Blocks are always accessed in sequential order, so we now have a 
> "queue->next_dequeue" that cycles between the buffers allocated for 
> fileio.
>
>>> [...]
>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue 
>>> *queue,
>>>       struct iio_dma_buffer_block *block)
>>>   {
>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>           iio_buffer_block_put(block);
>>> -    } else if (queue->active) {
>>> +    else if (queue->active)
>>>           iio_dma_buffer_submit_block(queue, block);
>>> -    } else {
>>> +    else
>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>> -        list_add_tail(&block->head, &queue->incoming);
>> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer is 
>> not active, it will be marked as queued, but we don't actually keep a 
>> reference to it anywhere. It will never be submitted to the DMA, and 
>> it will never be signaled as completed.
>
> We do keep a reference to the buffers, in the queue->fileio.blocks 
> array. When the buffer is enabled, all the blocks in that array that 
> are in the "queued" state will be submitted to the DMA.
>
But not when used in combination with the DMA buf changes later in this 
series.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-21 18:49       ` Lars-Peter Clausen
@ 2021-11-21 20:08         ` Paul Cercueil
  2021-11-22 15:08           ` Lars-Peter Clausen
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-21 20:08 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: Jonathan Cameron, Alexandru Ardelean, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig



Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen 
<lars@metafoo.de> a écrit :
> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>> Hi Lars,
>> 
>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen 
>> \x7f<lars@metafoo.de> a écrit :
>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>>> manage the state of the blocks in use.
>>>> 
>>>> While this totally works, it adds some complexity to the code,
>>>> especially since the code only manages 2 blocks. It is much easier 
>>>> to
>>>> just check each block's state manually, and keep a counter for the 
>>>> next
>>>> block to dequeue.
>>>> 
>>>> Since the new DMABUF based API wouldn't use these incoming and 
>>>> outgoing
>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>> simpler.
>>>> 
>>>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>>> The outgoing queue is going to be replaced by fences, but I think 
>>> we \x7f\x7fneed to keep the incoming queue.
>> 
>> Blocks are always accessed in sequential order, so we now have a 
>> \x7f"queue->next_dequeue" that cycles between the buffers allocated for 
>> \x7ffileio.
>> 
>>>> [...]
>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue 
>>>> \x7f\x7f\x7f*queue,
>>>>       struct iio_dma_buffer_block *block)
>>>>   {
>>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>           iio_buffer_block_put(block);
>>>> -    } else if (queue->active) {
>>>> +    else if (queue->active)
>>>>           iio_dma_buffer_submit_block(queue, block);
>>>> -    } else {
>>>> +    else
>>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>>> -        list_add_tail(&block->head, &queue->incoming);
>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer 
>>> is \x7f\x7fnot active, it will be marked as queued, but we don't actually 
>>> keep a \x7f\x7freference to it anywhere. It will never be submitted to 
>>> the DMA, and \x7f\x7fit will never be signaled as completed.
>> 
>> We do keep a reference to the buffers, in the queue->fileio.blocks 
>> \x7farray. When the buffer is enabled, all the blocks in that array 
>> that \x7fare in the "queued" state will be submitted to the DMA.
>> 
> But not when used in combination with the DMA buf changes later in 
> this series.
> 

That's still the case after the DMABUF changes of the series. Or can 
you point me exactly what you think is broken?

-Paul



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()
  2021-11-21 14:08   ` Jonathan Cameron
@ 2021-11-22 10:00     ` Paul Cercueil
  2021-11-27 15:15       ` Jonathan Cameron
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-22 10:00 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi Jonathan,

Le dim., nov. 21 2021 at 14:08:23 +0000, Jonathan Cameron 
<jic23@kernel.org> a écrit :
> On Mon, 15 Nov 2021 14:19:13 +0000
> Paul Cercueil <paul@crapouillou.net> wrote:
> 
>>  We know that the buffer's alignment will always be a power of two;
>>  therefore, we can use the faster round_down() macro.
>> 
>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> *groan*.  I don't want to know where the naming of these two came 
> from but that
> is spectacular...
> 
> Anyhow, happy to pick up 1-3 now if you like as all are good cleanup 
> of
> existing code.

I think you can pick 2-3 now; I will do some changes to patch [01/15] 
in the V2.

Cheers,
-Paul



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-21 20:08         ` Paul Cercueil
@ 2021-11-22 15:08           ` Lars-Peter Clausen
  2021-11-22 15:16             ` Paul Cercueil
  0 siblings, 1 reply; 59+ messages in thread
From: Lars-Peter Clausen @ 2021-11-22 15:08 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Alexandru Ardelean, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On 11/21/21 9:08 PM, Paul Cercueil wrote:
>
>
> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen 
> <lars@metafoo.de> a écrit :
>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>> Hi Lars,
>>>
>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen 
>>> \x7f<lars@metafoo.de> a écrit :
>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>>>> manage the state of the blocks in use.
>>>>>
>>>>> While this totally works, it adds some complexity to the code,
>>>>> especially since the code only manages 2 blocks. It is much easier to
>>>>> just check each block's state manually, and keep a counter for the 
>>>>> next
>>>>> block to dequeue.
>>>>>
>>>>> Since the new DMABUF based API wouldn't use these incoming and 
>>>>> outgoing
>>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>>> simpler.
>>>>>
>>>>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>>>> The outgoing queue is going to be replaced by fences, but I think 
>>>> we \x7f\x7fneed to keep the incoming queue.
>>>
>>> Blocks are always accessed in sequential order, so we now have a 
>>> \x7f"queue->next_dequeue" that cycles between the buffers allocated for 
>>> \x7ffileio.
>>>
>>>>> [...]
>>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue 
>>>>> \x7f\x7f\x7f*queue,
>>>>>       struct iio_dma_buffer_block *block)
>>>>>   {
>>>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>           iio_buffer_block_put(block);
>>>>> -    } else if (queue->active) {
>>>>> +    else if (queue->active)
>>>>>           iio_dma_buffer_submit_block(queue, block);
>>>>> -    } else {
>>>>> +    else
>>>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>>>> -        list_add_tail(&block->head, &queue->incoming);
>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the buffer 
>>>> is \x7f\x7fnot active, it will be marked as queued, but we don't actually 
>>>> keep a \x7f\x7freference to it anywhere. It will never be submitted to 
>>>> the DMA, and \x7f\x7fit will never be signaled as completed.
>>>
>>> We do keep a reference to the buffers, in the queue->fileio.blocks 
>>> \x7farray. When the buffer is enabled, all the blocks in that array 
>>> that \x7fare in the "queued" state will be submitted to the DMA.
>>>
>> But not when used in combination with the DMA buf changes later in 
>> this series.
>>
>
> That's still the case after the DMABUF changes of the series. Or can 
> you point me exactly what you think is broken?
>
When you allocate a DMABUF with the allocate IOCTL and then submit it 
with the enqueue IOCTL before the buffer is enabled it will end up 
marked as queued, but not actually be queued anywhere.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-22 15:08           ` Lars-Peter Clausen
@ 2021-11-22 15:16             ` Paul Cercueil
  2021-11-22 15:17               ` Lars-Peter Clausen
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-22 15:16 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: Jonathan Cameron, Alexandru Ardelean, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi Lars,

Le lun., nov. 22 2021 at 16:08:51 +0100, Lars-Peter Clausen 
<lars@metafoo.de> a écrit :
> On 11/21/21 9:08 PM, Paul Cercueil wrote:
>> 
>> 
>> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen 
>> \x7f<lars@metafoo.de> a écrit :
>>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>>> Hi Lars,
>>>> 
>>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen 
>>>> \x7f\x7f\x7f\x7f<lars@metafoo.de> a écrit :
>>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>>> The buffer-dma code was using two queues, incoming and outgoing, 
>>>>>> to
>>>>>> manage the state of the blocks in use.
>>>>>> 
>>>>>> While this totally works, it adds some complexity to the code,
>>>>>> especially since the code only manages 2 blocks. It is much 
>>>>>> easier to
>>>>>> just check each block's state manually, and keep a counter for 
>>>>>> the \x7f\x7f\x7f\x7f\x7fnext
>>>>>> block to dequeue.
>>>>>> 
>>>>>> Since the new DMABUF based API wouldn't use these incoming and 
>>>>>> \x7f\x7f\x7f\x7f\x7foutgoing
>>>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>>>> simpler.
>>>>>> 
>>>>>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>>>>> The outgoing queue is going to be replaced by fences, but I think 
>>>>> \x7f\x7f\x7f\x7fwe \x7f\x7fneed to keep the incoming queue.
>>>> 
>>>> Blocks are always accessed in sequential order, so we now have a 
>>>> \x7f\x7f\x7f\x7f"queue->next_dequeue" that cycles between the buffers 
>>>> allocated for \x7f\x7f\x7f\x7ffileio.
>>>> 
>>>>>> [...]
>>>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue 
>>>>>> \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f*queue,
>>>>>>       struct iio_dma_buffer_block *block)
>>>>>>   {
>>>>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>>           iio_buffer_block_put(block);
>>>>>> -    } else if (queue->active) {
>>>>>> +    else if (queue->active)
>>>>>>           iio_dma_buffer_submit_block(queue, block);
>>>>>> -    } else {
>>>>>> +    else
>>>>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>>>>> -        list_add_tail(&block->head, &queue->incoming);
>>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the 
>>>>> buffer \x7f\x7f\x7f\x7fis \x7f\x7fnot active, it will be marked as queued, but we 
>>>>> don't actually \x7f\x7f\x7f\x7fkeep a \x7f\x7freference to it anywhere. It will 
>>>>> never be submitted to \x7f\x7f\x7f\x7fthe DMA, and \x7f\x7fit will never be 
>>>>> signaled as completed.
>>>> 
>>>> We do keep a reference to the buffers, in the queue->fileio.blocks 
>>>> \x7f\x7f\x7f\x7farray. When the buffer is enabled, all the blocks in that 
>>>> array \x7f\x7f\x7fthat \x7fare in the "queued" state will be submitted to the 
>>>> DMA.
>>>> 
>>> But not when used in combination with the DMA buf changes later in 
>>> \x7f\x7fthis series.
>>> 
>> 
>> That's still the case after the DMABUF changes of the series. Or can 
>> \x7fyou point me exactly what you think is broken?
>> 
> When you allocate a DMABUF with the allocate IOCTL and then submit it 
> with the enqueue IOCTL before the buffer is enabled it will end up 
> marked as queued, but not actually be queued anywhere.
> 

Ok, it works for me because I never enqueue blocks before enabling the 
buffer. I can add a requirement that blocks must be enqueued only after 
the buffer is enabled.

Cheers,
-Paul



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-22 15:16             ` Paul Cercueil
@ 2021-11-22 15:17               ` Lars-Peter Clausen
  2021-11-22 15:27                 ` Paul Cercueil
  0 siblings, 1 reply; 59+ messages in thread
From: Lars-Peter Clausen @ 2021-11-22 15:17 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Jonathan Cameron, Alexandru Ardelean, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On 11/22/21 4:16 PM, Paul Cercueil wrote:
> Hi Lars,
>
> Le lun., nov. 22 2021 at 16:08:51 +0100, Lars-Peter Clausen 
> <lars@metafoo.de> a écrit :
>> On 11/21/21 9:08 PM, Paul Cercueil wrote:
>>>
>>>
>>> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen 
>>> \x7f<lars@metafoo.de> a écrit :
>>>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>>>> Hi Lars,
>>>>>
>>>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen 
>>>>> \x7f\x7f\x7f\x7f<lars@metafoo.de> a écrit :
>>>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>>>> The buffer-dma code was using two queues, incoming and outgoing, to
>>>>>>> manage the state of the blocks in use.
>>>>>>>
>>>>>>> While this totally works, it adds some complexity to the code,
>>>>>>> especially since the code only manages 2 blocks. It is much 
>>>>>>> easier to
>>>>>>> just check each block's state manually, and keep a counter for 
>>>>>>> the \x7f\x7f\x7f\x7f\x7fnext
>>>>>>> block to dequeue.
>>>>>>>
>>>>>>> Since the new DMABUF based API wouldn't use these incoming and 
>>>>>>> \x7f\x7f\x7f\x7f\x7foutgoing
>>>>>>> queues anyway, getting rid of them now makes the upcoming changes
>>>>>>> simpler.
>>>>>>>
>>>>>>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>>>>>> The outgoing queue is going to be replaced by fences, but I think 
>>>>>> \x7f\x7f\x7f\x7fwe \x7f\x7fneed to keep the incoming queue.
>>>>>
>>>>> Blocks are always accessed in sequential order, so we now have a 
>>>>> \x7f\x7f\x7f\x7f"queue->next_dequeue" that cycles between the buffers 
>>>>> allocated for \x7f\x7f\x7f\x7ffileio.
>>>>>
>>>>>>> [...]
>>>>>>> @@ -442,28 +435,33 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>>>   static void iio_dma_buffer_enqueue(struct iio_dma_buffer_queue 
>>>>>>> \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f*queue,
>>>>>>>       struct iio_dma_buffer_block *block)
>>>>>>>   {
>>>>>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>>>           iio_buffer_block_put(block);
>>>>>>> -    } else if (queue->active) {
>>>>>>> +    else if (queue->active)
>>>>>>>           iio_dma_buffer_submit_block(queue, block);
>>>>>>> -    } else {
>>>>>>> +    else
>>>>>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>>>>>> -        list_add_tail(&block->head, &queue->incoming);
>>>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the 
>>>>>> buffer \x7f\x7f\x7f\x7fis \x7f\x7fnot active, it will be marked as queued, but we 
>>>>>> don't actually \x7f\x7f\x7f\x7fkeep a \x7f\x7freference to it anywhere. It will 
>>>>>> never be submitted to \x7f\x7f\x7f\x7fthe DMA, and \x7f\x7fit will never be 
>>>>>> signaled as completed.
>>>>>
>>>>> We do keep a reference to the buffers, in the queue->fileio.blocks 
>>>>> \x7f\x7f\x7f\x7farray. When the buffer is enabled, all the blocks in that 
>>>>> array \x7f\x7f\x7fthat \x7fare in the "queued" state will be submitted to the 
>>>>> DMA.
>>>>>
>>>> But not when used in combination with the DMA buf changes later in 
>>>> \x7f\x7fthis series.
>>>>
>>>
>>> That's still the case after the DMABUF changes of the series. Or can 
>>> \x7fyou point me exactly what you think is broken?
>>>
>> When you allocate a DMABUF with the allocate IOCTL and then submit it 
>> with the enqueue IOCTL before the buffer is enabled it will end up 
>> marked as queued, but not actually be queued anywhere.
>>
>
> Ok, it works for me because I never enqueue blocks before enabling the 
> buffer. I can add a requirement that blocks must be enqueued only 
> after the buffer is enabled.

I don't think that is a good idea. This way you are going to potentially 
drop data at the begining of your stream when the DMA isn't ready yet.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues
  2021-11-22 15:17               ` Lars-Peter Clausen
@ 2021-11-22 15:27                 ` Paul Cercueil
  0 siblings, 0 replies; 59+ messages in thread
From: Paul Cercueil @ 2021-11-22 15:27 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: Jonathan Cameron, Alexandru Ardelean, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig



Le lun., nov. 22 2021 at 16:17:59 +0100, Lars-Peter Clausen 
<lars@metafoo.de> a écrit :
> On 11/22/21 4:16 PM, Paul Cercueil wrote:
>> Hi Lars,
>> 
>> Le lun., nov. 22 2021 at 16:08:51 +0100, Lars-Peter Clausen 
>> \x7f<lars@metafoo.de> a écrit :
>>> On 11/21/21 9:08 PM, Paul Cercueil wrote:
>>>> 
>>>> 
>>>> Le dim., nov. 21 2021 at 19:49:03 +0100, Lars-Peter Clausen 
>>>> \x7f\x7f\x7f\x7f<lars@metafoo.de> a écrit :
>>>>> On 11/21/21 6:52 PM, Paul Cercueil wrote:
>>>>>> Hi Lars,
>>>>>> 
>>>>>> Le dim., nov. 21 2021 at 17:23:35 +0100, Lars-Peter Clausen 
>>>>>> \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f<lars@metafoo.de> a écrit :
>>>>>>> On 11/15/21 3:19 PM, Paul Cercueil wrote:
>>>>>>>> The buffer-dma code was using two queues, incoming and 
>>>>>>>> outgoing, to
>>>>>>>> manage the state of the blocks in use.
>>>>>>>> 
>>>>>>>> While this totally works, it adds some complexity to the code,
>>>>>>>> especially since the code only manages 2 blocks. It is much 
>>>>>>>> \x7f\x7f\x7f\x7f\x7f\x7f\x7feasier to
>>>>>>>> just check each block's state manually, and keep a counter for 
>>>>>>>> \x7f\x7f\x7f\x7f\x7f\x7f\x7fthe \x7f\x7f\x7f\x7f\x7fnext
>>>>>>>> block to dequeue.
>>>>>>>> 
>>>>>>>> Since the new DMABUF based API wouldn't use these incoming and 
>>>>>>>> \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7foutgoing
>>>>>>>> queues anyway, getting rid of them now makes the upcoming 
>>>>>>>> changes
>>>>>>>> simpler.
>>>>>>>> 
>>>>>>>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>>>>>>> The outgoing queue is going to be replaced by fences, but I 
>>>>>>> think \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7fwe \x7f\x7fneed to keep the incoming queue.
>>>>>> 
>>>>>> Blocks are always accessed in sequential order, so we now have a 
>>>>>> \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f"queue->next_dequeue" that cycles between the buffers 
>>>>>> \x7f\x7f\x7f\x7f\x7fallocated for \x7f\x7f\x7f\x7ffileio.
>>>>>> 
>>>>>>>> [...]
>>>>>>>> @@ -442,28 +435,33 @@ 
>>>>>>>> EXPORT_SYMBOL_GPL(iio_dma_buffer_disable);
>>>>>>>>   static void iio_dma_buffer_enqueue(struct 
>>>>>>>> iio_dma_buffer_queue \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f*queue,
>>>>>>>>       struct iio_dma_buffer_block *block)
>>>>>>>>   {
>>>>>>>> -    if (block->state == IIO_BLOCK_STATE_DEAD) {
>>>>>>>> +    if (block->state == IIO_BLOCK_STATE_DEAD)
>>>>>>>>           iio_buffer_block_put(block);
>>>>>>>> -    } else if (queue->active) {
>>>>>>>> +    else if (queue->active)
>>>>>>>>           iio_dma_buffer_submit_block(queue, block);
>>>>>>>> -    } else {
>>>>>>>> +    else
>>>>>>>>           block->state = IIO_BLOCK_STATE_QUEUED;
>>>>>>>> -        list_add_tail(&block->head, &queue->incoming);
>>>>>>> If iio_dma_buffer_enqueue() is called with a dmabuf and the 
>>>>>>> \x7f\x7f\x7f\x7f\x7f\x7fbuffer \x7f\x7f\x7f\x7fis \x7f\x7fnot active, it will be marked as queued, 
>>>>>>> but we \x7f\x7f\x7f\x7f\x7f\x7fdon't actually \x7f\x7f\x7f\x7fkeep a \x7f\x7freference to it 
>>>>>>> anywhere. It will \x7f\x7f\x7f\x7f\x7f\x7fnever be submitted to \x7f\x7f\x7f\x7fthe DMA, and 
>>>>>>> \x7f\x7fit will never be \x7f\x7f\x7f\x7f\x7f\x7fsignaled as completed.
>>>>>> 
>>>>>> We do keep a reference to the buffers, in the 
>>>>>> queue->fileio.blocks \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7farray. When the buffer is enabled, 
>>>>>> all the blocks in that \x7f\x7f\x7f\x7f\x7farray \x7f\x7f\x7fthat \x7fare in the "queued" 
>>>>>> state will be submitted to the \x7f\x7f\x7f\x7f\x7fDMA.
>>>>>> 
>>>>> But not when used in combination with the DMA buf changes later 
>>>>> in \x7f\x7f\x7f\x7f\x7f\x7fthis series.
>>>>> 
>>>> 
>>>> That's still the case after the DMABUF changes of the series. Or 
>>>> can \x7f\x7f\x7f\x7fyou point me exactly what you think is broken?
>>>> 
>>> When you allocate a DMABUF with the allocate IOCTL and then submit 
>>> it \x7f\x7fwith the enqueue IOCTL before the buffer is enabled it will 
>>> end up \x7f\x7fmarked as queued, but not actually be queued anywhere.
>>> 
>> 
>> Ok, it works for me because I never enqueue blocks before enabling 
>> the \x7fbuffer. I can add a requirement that blocks must be enqueued 
>> only \x7fafter the buffer is enabled.
> 
> I don't think that is a good idea. This way you are going to 
> potentially drop data at the begining of your stream when the DMA 
> isn't ready yet.
> 

You wouldn't drop data, but it could cause an underrun, yes. Is it such 
a big deal, knowing that the buffer was just enabled? I don't think you 
can disable then enable the buffer without causing a discontinuity in 
the stream.

-Paul



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-21 17:43     ` Paul Cercueil
@ 2021-11-25 17:29       ` Paul Cercueil
  2021-11-27 16:05         ` Jonathan Cameron
  2021-11-27 15:20       ` Jonathan Cameron
  1 sibling, 1 reply; 59+ messages in thread
From: Paul Cercueil @ 2021-11-25 17:29 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

Hi Jonathan,

Le dim., nov. 21 2021 at 17:43:20 +0000, Paul Cercueil 
<paul@crapouillou.net> a écrit :
> Hi Jonathan,
> 
> Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron 
> <jic23@kernel.org> a écrit :
>> On Mon, 15 Nov 2021 14:19:21 +0000
>> Paul Cercueil <paul@crapouillou.net> wrote:
>> 
>>>  We can be certain that the input buffers will only be accessed by
>>>  userspace for reading, and output buffers will mostly be accessed 
>>> by
>>>  userspace for writing.
>> 
>> Mostly?  Perhaps a little more info on why that's not 'only'.
> 
> Just like with a framebuffer, it really depends on what the 
> application does. Most of the cases it will just read sequentially an 
> input buffer, or write sequentially an output buffer. But then you 
> get the exotic application that will try to do something like alpha 
> blending, which means read+write. Hence "mostly".
> 
>>> 
>>>  Therefore, it makes more sense to use only fully cached input 
>>> \x7f\x7fbuffers,
>>>  and to use the write-combine cache coherency setting for output 
>>> \x7f\x7fbuffers.
>>> 
>>>  This boosts performance, as the data written to the output buffers 
>>> \x7f\x7fdoes
>>>  not have to be sync'd for coherency. It will halve performance if 
>>> \x7f\x7fthe
>>>  userspace application tries to read from the output buffer, but 
>>> this
>>>  should never happen.
>>> 
>>>  Since we don't need to sync the cache when disabling CPU access 
>>> \x7f\x7feither
>>>  for input buffers or output buffers, the .end_cpu_access() 
>>> callback \x7f\x7fcan
>>>  be dropped completely.
>> 
>> We have an odd mix of coherent and non coherent DMA in here as you 
>> \x7fnoted,
>> but are you sure this is safe on all platforms?
> 
> The mix isn't safe, but using only coherent or only non-coherent 
> should be safe, yes.
> 
>> 
>>> 
>>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> 
>> Any numbers to support this patch?  The mapping types are performance
>> optimisations so nice to know how much of a difference they make.
> 
> Output buffers are definitely faster in write-combine mode. On a 
> ZedBoard with a AD9361 transceiver set to 66 MSPS, and buffer/size 
> set to 8192, I would get about 185 MiB/s before, 197 MiB/s after.
> 
> Input buffers... early results are mixed. On ARM32 it does look like 
> it is slightly faster to read from *uncached* memory than reading 
> from cached memory. The cache sync does take a long time.
> 
> Other architectures might have a different result, for instance on 
> MIPS invalidating the cache is a very fast operation, so using cached 
> buffers would be a huge win in performance.
> 
> Setups where the DMA operations are coherent also wouldn't require 
> any cache sync and this patch would give a huge win in performance.
> 
> I'll run some more tests next week to have some fresh numbers.

I think I mixed things up before, because I get different results now.

Here are some fresh benchmarks, triple-checked, using libiio's 
iio_readdev and iio_writedev tools, with 64K samples buffers at 61.44 
MSPS (max. theorical throughput: 234 MiB/s):
  iio_readdev -b 65536 cf-ad9361-lpc voltage0 voltage1 | pv > /dev/null
  pv /dev/zero | iio_writedev -b 65536 cf-ad9361-dds-core-lpc voltage0 
voltage1

Coherent mapping:
- fileio:
    read:	125 MiB/s
    write:	141 MiB/s
- dmabuf:
    read:	171 MiB/s
    write:	210 MiB/s

Coherent reads + Write-combine writes:
- fileio:
    read:	125 MiB/s
    write:	141 MiB/s
- dmabuf:
    read:	171 MiB/s
    write:	210 MiB/s

Non-coherent mapping:
- fileio:
    read:	119 MiB/s
    write:	124 MiB/s
- dmabuf:
    read:	159 MiB/s
    write:	124 MiB/s

Non-coherent reads + write-combine writes:
- fileio:
    read:	119 MiB/s
    write:	140 MiB/s
- dmabuf:
    read:	159 MiB/s
    write:	210 MiB/s

Non-coherent mapping with no cache sync:
- fileio:
    read:	156 MiB/s
    write:	123 MiB/s
- dmabuf:
    read:	234 MiB/s (capped by sample rate)
    write:	182 MiB/s

Non-coherent reads with no cache sync + write-combine writes:
- fileio:
    read:	156 MiB/s
    write:	140 MiB/s
- dmabuf:
    read:	234 MiB/s (capped by sample rate)
    write:	210 MiB/s


A few things we can deduce from this:

* Write-combine is not available on Zynq/ARM? If it was working, it 
should give a better performance than the coherent mapping, but it 
doesn't seem to do anything at all. At least it doesn't harm 
performance.

* Non-coherent + cache invalidation is definitely a good deal slower 
than using coherent mapping, at least on ARM32. However, when the cache 
sync is disabled (e.g. if the DMA operations are coherent) the reads 
are much faster.

* The new dma-buf based API is a great deal faster than the fileio API.

So in the future we could use coherent reads + write-combine writes, 
unless we know the DMA operations are coherent, and in this case use 
non-coherent reads + write-combine writes.

Regarding this patch, unfortunately I cannot prove that write-combine 
is faster, so I'll just drop this patch for now.

Cheers,
-Paul



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown()
  2021-11-22 10:00     ` Paul Cercueil
@ 2021-11-27 15:15       ` Jonathan Cameron
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-27 15:15 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Mon, 22 Nov 2021 10:00:19 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Hi Jonathan,
> 
> Le dim., nov. 21 2021 at 14:08:23 +0000, Jonathan Cameron 
> <jic23@kernel.org> a écrit :
> > On Mon, 15 Nov 2021 14:19:13 +0000
> > Paul Cercueil <paul@crapouillou.net> wrote:
> >   
> >>  We know that the buffer's alignment will always be a power of two;
> >>  therefore, we can use the faster round_down() macro.
> >> 
> >>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>  
> > *groan*.  I don't want to know where the naming of these two came 
> > from but that
> > is spectacular...
> > 
> > Anyhow, happy to pick up 1-3 now if you like as all are good cleanup 
> > of
> > existing code.  
> 
> I think you can pick 2-3 now; I will do some changes to patch [01/15] 
> in the V2.

Applied, 2+3 to the togreg branch of iio.git and pushed out as testing for all
the normal reasons.

Thanks,

Jonathan

> 
> Cheers,
> -Paul
> 
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/15] iio: buffer-dma: Enable buffer write support
  2021-11-21 17:19     ` Paul Cercueil
@ 2021-11-27 15:17       ` Jonathan Cameron
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-27 15:17 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Sun, 21 Nov 2021 17:19:32 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Hi Jonathan,
> 
> Le dim., nov. 21 2021 at 14:20:49 +0000, Jonathan Cameron 
> <jic23@kernel.org> a écrit :
> > On Mon, 15 Nov 2021 14:19:14 +0000
> > Paul Cercueil <paul@crapouillou.net> wrote:
> >   
> >>  Adding write support to the buffer-dma code is easy - the write()
> >>  function basically needs to do the exact same thing as the read()
> >>  function: dequeue a block, read or write the data, enqueue the block
> >>  when entirely processed.
> >> 
> >>  Therefore, the iio_buffer_dma_read() and the new 
> >> iio_buffer_dma_write()
> >>  now both call a function iio_buffer_dma_io(), which will perform 
> >> this
> >>  task.
> >> 
> >>  The .space_available() callback can return the exact same value as 
> >> the
> >>  .data_available() callback for input buffers, since in both cases we
> >>  count the exact same thing (the number of bytes in each available
> >>  block).
> >> 
> >>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>  
> > Hi Paul,
> > 
> > There are a few changes in here, such as the bytes_used value being 
> > set that
> > I'm not following the reasoning behind. More info on those?
> > Also good to provide something about those in this patch description.
> > 
> > Thanks,
> > 
> > Jonathan
> > 
> >   
> >>  ---
> >>   drivers/iio/buffer/industrialio-buffer-dma.c | 75 
> >> +++++++++++++++-----
> >>   include/linux/iio/buffer-dma.h               |  7 ++
> >>   2 files changed, 66 insertions(+), 16 deletions(-)
> >> 
> >>  diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c 
> >> b/drivers/iio/buffer/industrialio-buffer-dma.c
> >>  index abac88f20104..eeeed6b2e0cf 100644
> >>  --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> >>  +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> >>  @@ -179,7 +179,8 @@ static struct iio_dma_buffer_block 
> >> *iio_dma_buffer_alloc_block(
> >>   	}
> >> 
> >>   	block->size = size;
> >>  -	block->state = IIO_BLOCK_STATE_DEQUEUED;
> >>  +	block->bytes_used = size;
> >>  +	block->state = IIO_BLOCK_STATE_DONE;  
> > 
> > I don't know why these are here - some more info?  
> 
> When using an input buffer the block->bytes_used is unconditionally 
> reset in iio_dmaengine_buffer_submit_block(), so this was fine until 
> now.
> 
> When using an output buffer the block->bytes_used can actually (with 
> the new API) be specified by the user, so we don't want 
> iio_dmaengine_buffer_submit_block() to unconditionally override it. 
> Which means that in the case where we have an output buffer in fileio 
> mode, we do need block->bytes_used to be initialized to the buffer's 
> size since it won't be set anywhere else.

Makes sense.  Thanks for the explanation.

> 
> About the change in block->state: in patch [01/15] we removed the 
> incoming/outgoing queues, and while the "enqueued" state is still 
> useful to know which buffers have to be submitted when the buffer is 
> enabled, the "dequeued" state is not useful anymore since there is no 
> more distinction vs. the "done" state.
> 
> I believe this change should be moved to patch [01/15] then, and I 
> should go further and remove the IIO_BLOCK_STATE_DEQUEUED completely.

Complete removal would indeed be a more 'obvious' change, so I'd support
that assuming now disadvantages we havent' thought of yet.
> 
> >>   	block->queue = queue;
> >>   	INIT_LIST_HEAD(&block->head);
> >>   	kref_init(&block->kref);
> >>  @@ -195,6 +196,18 @@ static void _iio_dma_buffer_block_done(struct 
> >> iio_dma_buffer_block *block)
> >>   		block->state = IIO_BLOCK_STATE_DONE;
> >>   }
> >> 
> >>  +static void iio_dma_buffer_queue_wake(struct iio_dma_buffer_queue 
> >> *queue)
> >>  +{
> >>  +	__poll_t flags;
> >>  +
> >>  +	if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> >>  +		flags = EPOLLIN | EPOLLRDNORM;
> >>  +	else
> >>  +		flags = EPOLLOUT | EPOLLWRNORM;
> >>  +
> >>  +	wake_up_interruptible_poll(&queue->buffer.pollq, flags);
> >>  +}
> >>  +
> >>   /**
> >>    * iio_dma_buffer_block_done() - Indicate that a block has been 
> >> completed
> >>    * @block: The completed block
> >>  @@ -212,7 +225,7 @@ void iio_dma_buffer_block_done(struct 
> >> iio_dma_buffer_block *block)
> >>   	spin_unlock_irqrestore(&queue->list_lock, flags);
> >> 
> >>   	iio_buffer_block_put_atomic(block);
> >>  -	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | 
> >> EPOLLRDNORM);
> >>  +	iio_dma_buffer_queue_wake(queue);
> >>   }
> >>   EXPORT_SYMBOL_GPL(iio_dma_buffer_block_done);
> >> 
> >>  @@ -241,7 +254,7 @@ void iio_dma_buffer_block_list_abort(struct 
> >> iio_dma_buffer_queue *queue,
> >>   	}
> >>   	spin_unlock_irqrestore(&queue->list_lock, flags);
> >> 
> >>  -	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | 
> >> EPOLLRDNORM);
> >>  +	iio_dma_buffer_queue_wake(queue);
> >>   }
> >>   EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort);
> >> 
> >>  @@ -334,7 +347,8 @@ int iio_dma_buffer_request_update(struct 
> >> iio_buffer *buffer)
> >>   			queue->fileio.blocks[i] = block;
> >>   		}
> >> 
> >>  -		block->state = IIO_BLOCK_STATE_QUEUED;
> >>  +		if (queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> >>  +			block->state = IIO_BLOCK_STATE_QUEUED;  
> > 
> > Possibly worth a comment on the state being set here.  I figured it 
> > out, but might
> > save some brain cells in future if it's stated in the code.  
> 
> Ok.
> 
> >>   	}
> >> 
> >>   out_unlock:
> >>  @@ -467,20 +481,12 @@ static struct iio_dma_buffer_block 
> >> *iio_dma_buffer_dequeue(
> >>   	return block;
> >>   }
> >> 
> >>  -/**
> >>  - * iio_dma_buffer_read() - DMA buffer read callback
> >>  - * @buffer: Buffer to read form
> >>  - * @n: Number of bytes to read
> >>  - * @user_buffer: Userspace buffer to copy the data to
> >>  - *
> >>  - * Should be used as the read callback for iio_buffer_access_ops
> >>  - * struct for DMA buffers.
> >>  - */
> >>  -int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> >>  -	char __user *user_buffer)
> >>  +static int iio_dma_buffer_io(struct iio_buffer *buffer,
> >>  +			     size_t n, char __user *user_buffer, bool is_write)
> >>   {
> >>   	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
> >>   	struct iio_dma_buffer_block *block;
> >>  +	void *addr;
> >>   	int ret;
> >> 
> >>   	if (n < buffer->bytes_per_datum)
> >>  @@ -503,8 +509,13 @@ int iio_dma_buffer_read(struct iio_buffer 
> >> *buffer, size_t n,
> >>   	n = rounddown(n, buffer->bytes_per_datum);
> >>   	if (n > block->bytes_used - queue->fileio.pos)
> >>   		n = block->bytes_used - queue->fileio.pos;
> >>  +	addr = block->vaddr + queue->fileio.pos;
> >> 
> >>  -	if (copy_to_user(user_buffer, block->vaddr + queue->fileio.pos, 
> >> n)) {
> >>  +	if (is_write)
> >>  +		ret = !!copy_from_user(addr, user_buffer, n);
> >>  +	else
> >>  +		ret = !!copy_to_user(user_buffer, addr, n);  
> > 
> > What is the !! gaining us here?  We only care about == 0 vs != 0 so
> > forcing it to be 0 or 1 isn't useful.  
> 
> Right.
> 
> >>  +	if (ret) {
> >>   		ret = -EFAULT;
> >>   		goto out_unlock;
> >>   	}
> >>  @@ -513,6 +524,7 @@ int iio_dma_buffer_read(struct iio_buffer 
> >> *buffer, size_t n,
> >> 
> >>   	if (queue->fileio.pos == block->bytes_used) {
> >>   		queue->fileio.active_block = NULL;
> >>  +		block->bytes_used = block->size;  
> > 
> > This seems to be a functional change that isn't called out in the 
> > patch description.  
> 
> See the explanation above. Although I most likely don't need to set it 
> at two different spots... I'll check that in detail next week.
> 
> Cheers,
> -Paul
> 
> >>   		iio_dma_buffer_enqueue(queue, block);
> >>   	}
> >> 
> >>  @@ -523,8 +535,39 @@ int iio_dma_buffer_read(struct iio_buffer 
> >> *buffer, size_t n,
> >> 
> >>   	return ret;
> >>   }
> >>  +
> >>  +/**
> >>  + * iio_dma_buffer_read() - DMA buffer read callback
> >>  + * @buffer: Buffer to read form
> >>  + * @n: Number of bytes to read
> >>  + * @user_buffer: Userspace buffer to copy the data to
> >>  + *
> >>  + * Should be used as the read callback for iio_buffer_access_ops
> >>  + * struct for DMA buffers.
> >>  + */
> >>  +int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> >>  +	char __user *user_buffer)
> >>  +{
> >>  +	return iio_dma_buffer_io(buffer, n, user_buffer, false);
> >>  +}
> >>   EXPORT_SYMBOL_GPL(iio_dma_buffer_read);
> >> 
> >>  +/**
> >>  + * iio_dma_buffer_write() - DMA buffer write callback
> >>  + * @buffer: Buffer to read form
> >>  + * @n: Number of bytes to read
> >>  + * @user_buffer: Userspace buffer to copy the data from
> >>  + *
> >>  + * Should be used as the write callback for iio_buffer_access_ops
> >>  + * struct for DMA buffers.
> >>  + */
> >>  +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> >>  +			 const char __user *user_buffer)
> >>  +{
> >>  +	return iio_dma_buffer_io(buffer, n, (__force char *)user_buffer, 
> >> true);
> >>  +}
> >>  +EXPORT_SYMBOL_GPL(iio_dma_buffer_write);
> >>  +
> >>   /**
> >>    * iio_dma_buffer_data_available() - DMA buffer data_available 
> >> callback
> >>    * @buf: Buffer to check for data availability
> >>  diff --git a/include/linux/iio/buffer-dma.h 
> >> b/include/linux/iio/buffer-dma.h
> >>  index a65a005c4a19..09c07d5563c0 100644
> >>  --- a/include/linux/iio/buffer-dma.h
> >>  +++ b/include/linux/iio/buffer-dma.h
> >>  @@ -132,6 +132,8 @@ int iio_dma_buffer_disable(struct iio_buffer 
> >> *buffer,
> >>   	struct iio_dev *indio_dev);
> >>   int iio_dma_buffer_read(struct iio_buffer *buffer, size_t n,
> >>   	char __user *user_buffer);
> >>  +int iio_dma_buffer_write(struct iio_buffer *buffer, size_t n,
> >>  +			 const char __user *user_buffer);
> >>   size_t iio_dma_buffer_data_available(struct iio_buffer *buffer);
> >>   int iio_dma_buffer_set_bytes_per_datum(struct iio_buffer *buffer, 
> >> size_t bpd);
> >>   int iio_dma_buffer_set_length(struct iio_buffer *buffer, unsigned 
> >> int length);
> >>  @@ -142,4 +144,9 @@ int iio_dma_buffer_init(struct 
> >> iio_dma_buffer_queue *queue,
> >>   void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue);
> >>   void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
> >> 
> >>  +static inline size_t iio_dma_buffer_space_available(struct 
> >> iio_buffer *buffer)
> >>  +{
> >>  +	return iio_dma_buffer_data_available(buffer);
> >>  +}
> >>  +
> >>   #endif  
> >   
> 
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-21 17:43     ` Paul Cercueil
  2021-11-25 17:29       ` Paul Cercueil
@ 2021-11-27 15:20       ` Jonathan Cameron
  1 sibling, 0 replies; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-27 15:20 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Sun, 21 Nov 2021 17:43:20 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Hi Jonathan,
> 
> Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron 
> <jic23@kernel.org> a écrit :
> > On Mon, 15 Nov 2021 14:19:21 +0000
> > Paul Cercueil <paul@crapouillou.net> wrote:
> >   
> >>  We can be certain that the input buffers will only be accessed by
> >>  userspace for reading, and output buffers will mostly be accessed by
> >>  userspace for writing.  
> > 
> > Mostly?  Perhaps a little more info on why that's not 'only'.  
> 
> Just like with a framebuffer, it really depends on what the application 
> does. Most of the cases it will just read sequentially an input buffer, 
> or write sequentially an output buffer. But then you get the exotic 
> application that will try to do something like alpha blending, which 
> means read+write. Hence "mostly".

Ok. That makes sense though I hope no one actually does it, we can't
prevent them doing so.


> 
> >> 
> >>  Therefore, it makes more sense to use only fully cached input 
> >> buffers,
> >>  and to use the write-combine cache coherency setting for output 
> >> buffers.
> >> 
> >>  This boosts performance, as the data written to the output buffers 
> >> does
> >>  not have to be sync'd for coherency. It will halve performance if 
> >> the
> >>  userspace application tries to read from the output buffer, but this
> >>  should never happen.
> >> 
> >>  Since we don't need to sync the cache when disabling CPU access 
> >> either
> >>  for input buffers or output buffers, the .end_cpu_access() callback 
> >> can
> >>  be dropped completely.  
> > 
> > We have an odd mix of coherent and non coherent DMA in here as you 
> > noted,
> > but are you sure this is safe on all platforms?  
> 
> The mix isn't safe, but using only coherent or only non-coherent should 
> be safe, yes.

yup

> 
> >   
> >> 
> >>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>  
> > 
> > Any numbers to support this patch?  The mapping types are performance
> > optimisations so nice to know how much of a difference they make.  
> 
> Output buffers are definitely faster in write-combine mode. On a 
> ZedBoard with a AD9361 transceiver set to 66 MSPS, and buffer/size set 
> to 8192, I would get about 185 MiB/s before, 197 MiB/s after.
> 
> Input buffers... early results are mixed. On ARM32 it does look like it 
> is slightly faster to read from *uncached* memory than reading from 
> cached memory. The cache sync does take a long time.
> 
> Other architectures might have a different result, for instance on MIPS 
> invalidating the cache is a very fast operation, so using cached 
> buffers would be a huge win in performance.
> 
> Setups where the DMA operations are coherent also wouldn't require any 
> cache sync and this patch would give a huge win in performance.
> 
> I'll run some more tests next week to have some fresh numbers.

Great.

Thanks,

Jonathan

> 
> Cheers,
> -Paul
> 
> >>  ---
> >>   drivers/iio/buffer/industrialio-buffer-dma.c | 82 
> >> +++++++++++++-------
> >>   1 file changed, 54 insertions(+), 28 deletions(-)
> >> 
> >>  diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c 
> >> b/drivers/iio/buffer/industrialio-buffer-dma.c
> >>  index 92356ee02f30..fb39054d8c15 100644
> >>  --- a/drivers/iio/buffer/industrialio-buffer-dma.c
> >>  +++ b/drivers/iio/buffer/industrialio-buffer-dma.c
> >>  @@ -229,8 +229,33 @@ static int iio_buffer_dma_buf_mmap(struct 
> >> dma_buf *dbuf,
> >>   	if (vma->vm_ops->open)
> >>   		vma->vm_ops->open(vma);
> >> 
> >>  -	return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> >>  -			      virt_to_page(block->vaddr));
> >>  +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> >>  +		/*
> >>  +		 * With an input buffer, userspace will only read the data and
> >>  +		 * never write. We can mmap the buffer fully cached.
> >>  +		 */
> >>  +		return dma_mmap_pages(dev, vma, vma->vm_end - vma->vm_start,
> >>  +				      virt_to_page(block->vaddr));
> >>  +	} else {
> >>  +		/*
> >>  +		 * With an output buffer, userspace will only write the data
> >>  +		 * and should rarely (if never) read from it. It is better to
> >>  +		 * use write-combine in this case.
> >>  +		 */
> >>  +		return dma_mmap_wc(dev, vma, block->vaddr, block->phys_addr,
> >>  +				   vma->vm_end - vma->vm_start);
> >>  +	}
> >>  +}
> >>  +
> >>  +static void iio_dma_buffer_free_dmamem(struct iio_dma_buffer_block 
> >> *block)
> >>  +{
> >>  +	struct device *dev = block->queue->dev;
> >>  +	size_t size = PAGE_ALIGN(block->size);
> >>  +
> >>  +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN)
> >>  +		dma_free_coherent(dev, size, block->vaddr, block->phys_addr);
> >>  +	else
> >>  +		dma_free_wc(dev, size, block->vaddr, block->phys_addr);
> >>   }
> >> 
> >>   static void iio_buffer_dma_buf_release(struct dma_buf *dbuf)
> >>  @@ -243,9 +268,7 @@ static void iio_buffer_dma_buf_release(struct 
> >> dma_buf *dbuf)
> >> 
> >>   	mutex_lock(&queue->lock);
> >> 
> >>  -	dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
> >>  -			  block->vaddr, block->phys_addr);
> >>  -
> >>  +	iio_dma_buffer_free_dmamem(block);
> >>   	kfree(block);
> >> 
> >>   	queue->num_blocks--;
> >>  @@ -268,19 +291,6 @@ static int 
> >> iio_buffer_dma_buf_begin_cpu_access(struct dma_buf *dbuf,
> >>   	return 0;
> >>   }
> >> 
> >>  -static int iio_buffer_dma_buf_end_cpu_access(struct dma_buf *dbuf,
> >>  -					     enum dma_data_direction dma_dir)
> >>  -{
> >>  -	struct iio_dma_buffer_block *block = dbuf->priv;
> >>  -	struct device *dev = block->queue->dev;
> >>  -
> >>  -	/* We only need to sync the cache for output buffers */
> >>  -	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_OUT)
> >>  -		dma_sync_single_for_device(dev, block->phys_addr, block->size, 
> >> dma_dir);
> >>  -
> >>  -	return 0;
> >>  -}
> >>  -
> >>   static const struct dma_buf_ops iio_dma_buffer_dmabuf_ops = {
> >>   	.attach			= iio_buffer_dma_buf_attach,
> >>   	.map_dma_buf		= iio_buffer_dma_buf_map,
> >>  @@ -288,9 +298,28 @@ static const struct dma_buf_ops 
> >> iio_dma_buffer_dmabuf_ops = {
> >>   	.mmap			= iio_buffer_dma_buf_mmap,
> >>   	.release		= iio_buffer_dma_buf_release,
> >>   	.begin_cpu_access	= iio_buffer_dma_buf_begin_cpu_access,
> >>  -	.end_cpu_access		= iio_buffer_dma_buf_end_cpu_access,
> >>   };
> >> 
> >>  +static int iio_dma_buffer_alloc_dmamem(struct iio_dma_buffer_block 
> >> *block)
> >>  +{
> >>  +	struct device *dev = block->queue->dev;
> >>  +	size_t size = PAGE_ALIGN(block->size);
> >>  +
> >>  +	if (block->queue->buffer.direction == IIO_BUFFER_DIRECTION_IN) {
> >>  +		block->vaddr = dma_alloc_coherent(dev, size,
> >>  +						  &block->phys_addr,
> >>  +						  GFP_KERNEL);
> >>  +	} else {
> >>  +		block->vaddr = dma_alloc_wc(dev, size,
> >>  +					    &block->phys_addr,
> >>  +					    GFP_KERNEL);
> >>  +	}
> >>  +	if (!block->vaddr)
> >>  +		return -ENOMEM;
> >>  +
> >>  +	return 0;
> >>  +}
> >>  +
> >>   static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
> >>   	struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
> >>   {
> >>  @@ -303,12 +332,12 @@ static struct iio_dma_buffer_block 
> >> *iio_dma_buffer_alloc_block(
> >>   	if (!block)
> >>   		return ERR_PTR(-ENOMEM);
> >> 
> >>  -	block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
> >>  -		&block->phys_addr, GFP_KERNEL);
> >>  -	if (!block->vaddr) {
> >>  -		err = -ENOMEM;
> >>  +	block->size = size;
> >>  +	block->queue = queue;
> >>  +
> >>  +	err = iio_dma_buffer_alloc_dmamem(block);
> >>  +	if (err)
> >>   		goto err_free_block;
> >>  -	}
> >> 
> >>   	einfo.ops = &iio_dma_buffer_dmabuf_ops;
> >>   	einfo.size = PAGE_ALIGN(size);
> >>  @@ -322,10 +351,8 @@ static struct iio_dma_buffer_block 
> >> *iio_dma_buffer_alloc_block(
> >>   	}
> >> 
> >>   	block->dmabuf = dmabuf;
> >>  -	block->size = size;
> >>   	block->bytes_used = size;
> >>   	block->state = IIO_BLOCK_STATE_DONE;
> >>  -	block->queue = queue;
> >>   	block->fileio = fileio;
> >>   	INIT_LIST_HEAD(&block->head);
> >> 
> >>  @@ -338,8 +365,7 @@ static struct iio_dma_buffer_block 
> >> *iio_dma_buffer_alloc_block(
> >>   	return block;
> >> 
> >>   err_free_dma:
> >>  -	dma_free_coherent(queue->dev, PAGE_ALIGN(size),
> >>  -			  block->vaddr, block->phys_addr);
> >>  +	iio_dma_buffer_free_dmamem(block);
> >>   err_free_block:
> >>   	kfree(block);
> >>   	return ERR_PTR(err);  
> >   
> 
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-25 17:29       ` Paul Cercueil
@ 2021-11-27 16:05         ` Jonathan Cameron
  2021-11-28 13:25           ` Lars-Peter Clausen
  0 siblings, 1 reply; 59+ messages in thread
From: Jonathan Cameron @ 2021-11-27 16:05 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Alexandru Ardelean, Lars-Peter Clausen, Michael Hennerich,
	Sumit Semwal, Christian König, linux-iio, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig

On Thu, 25 Nov 2021 17:29:58 +0000
Paul Cercueil <paul@crapouillou.net> wrote:

> Hi Jonathan,
> 
> Le dim., nov. 21 2021 at 17:43:20 +0000, Paul Cercueil 
> <paul@crapouillou.net> a écrit :
> > Hi Jonathan,
> > 
> > Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron 
> > <jic23@kernel.org> a écrit :  
> >> On Mon, 15 Nov 2021 14:19:21 +0000
> >> Paul Cercueil <paul@crapouillou.net> wrote:
> >>   
> >>>  We can be certain that the input buffers will only be accessed by
> >>>  userspace for reading, and output buffers will mostly be accessed 
> >>> by
> >>>  userspace for writing.  
> >> 
> >> Mostly?  Perhaps a little more info on why that's not 'only'.  
> > 
> > Just like with a framebuffer, it really depends on what the 
> > application does. Most of the cases it will just read sequentially an 
> > input buffer, or write sequentially an output buffer. But then you 
> > get the exotic application that will try to do something like alpha 
> > blending, which means read+write. Hence "mostly".
> >   
> >>> 
> >>>  Therefore, it makes more sense to use only fully cached input 
> >>> \x7f\x7fbuffers,
> >>>  and to use the write-combine cache coherency setting for output 
> >>> \x7f\x7fbuffers.
> >>> 
> >>>  This boosts performance, as the data written to the output buffers 
> >>> \x7f\x7fdoes
> >>>  not have to be sync'd for coherency. It will halve performance if 
> >>> \x7f\x7fthe
> >>>  userspace application tries to read from the output buffer, but 
> >>> this
> >>>  should never happen.
> >>> 
> >>>  Since we don't need to sync the cache when disabling CPU access 
> >>> \x7f\x7feither
> >>>  for input buffers or output buffers, the .end_cpu_access() 
> >>> callback \x7f\x7fcan
> >>>  be dropped completely.  
> >> 
> >> We have an odd mix of coherent and non coherent DMA in here as you 
> >> \x7fnoted,
> >> but are you sure this is safe on all platforms?  
> > 
> > The mix isn't safe, but using only coherent or only non-coherent 
> > should be safe, yes.
> >   
> >>   
> >>> 
> >>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>  
> >> 
> >> Any numbers to support this patch?  The mapping types are performance
> >> optimisations so nice to know how much of a difference they make.  
> > 
> > Output buffers are definitely faster in write-combine mode. On a 
> > ZedBoard with a AD9361 transceiver set to 66 MSPS, and buffer/size 
> > set to 8192, I would get about 185 MiB/s before, 197 MiB/s after.
> > 
> > Input buffers... early results are mixed. On ARM32 it does look like 
> > it is slightly faster to read from *uncached* memory than reading 
> > from cached memory. The cache sync does take a long time.
> > 
> > Other architectures might have a different result, for instance on 
> > MIPS invalidating the cache is a very fast operation, so using cached 
> > buffers would be a huge win in performance.
> > 
> > Setups where the DMA operations are coherent also wouldn't require 
> > any cache sync and this patch would give a huge win in performance.
> > 
> > I'll run some more tests next week to have some fresh numbers.  
> 
> I think I mixed things up before, because I get different results now.
> 
> Here are some fresh benchmarks, triple-checked, using libiio's 
> iio_readdev and iio_writedev tools, with 64K samples buffers at 61.44 
> MSPS (max. theorical throughput: 234 MiB/s):
>   iio_readdev -b 65536 cf-ad9361-lpc voltage0 voltage1 | pv > /dev/null
>   pv /dev/zero | iio_writedev -b 65536 cf-ad9361-dds-core-lpc voltage0 
> voltage1

There is a bit of a terminology confusion going on here.  I think
for the mappings you mean cacheable vs non-cacheable but maybe
I'm misunderstanding.  That doesn't necessarily correspond to
coherency.  Non cached memory is always coherent because all caches
miss.

Non-cacheable can be related to coherency of course. Also beware that given
hardware might not implement non-cacheable if it knows all possible
accesses are IO-coherent.  Affect is the same and if implemented
correctly it will not hurt performance significantly.

firmware should be letting the OS know if the device does coherent
DMA or not... dma-coherent in dt.  It might be optional for a given
piece of DMA engine but I've not seen that..

I'm not sure I see how you can do a mixture of cacheable for reads
and write combine (which means uncacheable) for writes...

> 
> Coherent mapping:
> - fileio:
>     read:	125 MiB/s
>     write:	141 MiB/s
> - dmabuf:
>     read:	171 MiB/s
>     write:	210 MiB/s
> 
> Coherent reads + Write-combine writes:
> - fileio:
>     read:	125 MiB/s
>     write:	141 MiB/s
> - dmabuf:
>     read:	171 MiB/s
>     write:	210 MiB/s
> 
> Non-coherent mapping:
> - fileio:
>     read:	119 MiB/s
>     write:	124 MiB/s
> - dmabuf:
>     read:	159 MiB/s
>     write:	124 MiB/s
> 
> Non-coherent reads + write-combine writes:
> - fileio:
>     read:	119 MiB/s
>     write:	140 MiB/s
> - dmabuf:
>     read:	159 MiB/s
>     write:	210 MiB/s
> 


> Non-coherent mapping with no cache sync:
> - fileio:
>     read:	156 MiB/s
>     write:	123 MiB/s
> - dmabuf:
>     read:	234 MiB/s (capped by sample rate)
>     write:	182 MiB/s
> 
> Non-coherent reads with no cache sync + write-combine writes:
> - fileio:
>     read:	156 MiB/s
>     write:	140 MiB/s
> - dmabuf:
>     read:	234 MiB/s (capped by sample rate)
>     write:	210 MiB/s
> 
> 
> A few things we can deduce from this:
> 
> * Write-combine is not available on Zynq/ARM? If it was working, it 
> should give a better performance than the coherent mapping, but it 
> doesn't seem to do anything at all. At least it doesn't harm 
> performance.

I'm not sure it's very relevant to this sort of streaming write.
If you write a sequence of addresses then nothing stops them getting combined
into a single write whether or not it is write-combining.

You may be right that the particular path to memory doesn't support it anyway.
Also some cache architectures will rapidly detect streaming writes and
elect not to cache them whether coherent or not.




> 
> * Non-coherent + cache invalidation is definitely a good deal slower 
> than using coherent mapping, at least on ARM32. However, when the cache 
> sync is disabled (e.g. if the DMA operations are coherent) the reads 
> are much faster.

If you are running with cache sync then it better not be cached
as such it's coherent in the sense of there being no entries in the cache
in either direction.

> 
> * The new dma-buf based API is a great deal faster than the fileio API.

:)

> 
> So in the future we could use coherent reads + write-combine writes, 
> unless we know the DMA operations are coherent, and in this case use 
> non-coherent reads + write-combine writes.

Not following this argument at all, but anyway we can revisit when it mattrs.  

> 
> Regarding this patch, unfortunately I cannot prove that write-combine 
> is faster, so I'll just drop this patch for now.

Sure, thanks for checking.  It's worth noting that WC usage in kernel
is vanishingly rare and I suspect that's mostly because it doesn't
do anything on many implementations.

Jonathan

> 
> Cheers,
> -Paul
> 
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
  2021-11-27 16:05         ` Jonathan Cameron
@ 2021-11-28 13:25           ` Lars-Peter Clausen
  0 siblings, 0 replies; 59+ messages in thread
From: Lars-Peter Clausen @ 2021-11-28 13:25 UTC (permalink / raw)
  To: Jonathan Cameron, Paul Cercueil
  Cc: Alexandru Ardelean, Michael Hennerich, Sumit Semwal,
	Christian König, linux-iio, linux-kernel, linux-media,
	dri-devel, linaro-mm-sig

On 11/27/21 5:05 PM, Jonathan Cameron wrote:
>> Non-coherent mapping with no cache sync:
>> - fileio:
>>      read:	156 MiB/s
>>      write:	123 MiB/s
>> - dmabuf:
>>      read:	234 MiB/s (capped by sample rate)
>>      write:	182 MiB/s
>>
>> Non-coherent reads with no cache sync + write-combine writes:
>> - fileio:
>>      read:	156 MiB/s
>>      write:	140 MiB/s
>> - dmabuf:
>>      read:	234 MiB/s (capped by sample rate)
>>      write:	210 MiB/s
>>
>>
>> A few things we can deduce from this:
>>
>> * Write-combine is not available on Zynq/ARM? If it was working, it
>> should give a better performance than the coherent mapping, but it
>> doesn't seem to do anything at all. At least it doesn't harm
>> performance.
> I'm not sure it's very relevant to this sort of streaming write.
> If you write a sequence of addresses then nothing stops them getting combined
> into a single write whether or not it is write-combining.

There is a difference at which point they can get combined. With 
write-combine they can be coalesced into a single transaction anywhere 
in the interconnect, as early as the CPU itself. Without write-cobmine 
the DDR controller might decide to combine them, but not earlier. This 
can make a difference especially if the write is a narrow write, i.e. 
the access size is smaller than the buswidth.

Lets say you do 32-bit writes, but your bus is 64 bits wide. With WC two 
32-bits can be combined into a 64-bit write. Without WC that is not 
possible and you are potentially not using the bus to its fullest 
capacity. This is especially true if the memory bus is wider than the 
widest access size of the CPU.



^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2021-11-28 13:28 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
2021-11-16  8:23   ` Alexandru Ardelean
2021-11-21 14:05   ` Jonathan Cameron
2021-11-21 16:23   ` Lars-Peter Clausen
2021-11-21 17:52     ` Paul Cercueil
2021-11-21 18:49       ` Lars-Peter Clausen
2021-11-21 20:08         ` Paul Cercueil
2021-11-22 15:08           ` Lars-Peter Clausen
2021-11-22 15:16             ` Paul Cercueil
2021-11-22 15:17               ` Lars-Peter Clausen
2021-11-22 15:27                 ` Paul Cercueil
2021-11-15 14:19 ` [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct Paul Cercueil
2021-11-16  8:22   ` Alexandru Ardelean
2021-11-15 14:19 ` [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown() Paul Cercueil
2021-11-16  8:26   ` Alexandru Ardelean
2021-11-21 14:08   ` Jonathan Cameron
2021-11-22 10:00     ` Paul Cercueil
2021-11-27 15:15       ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 04/15] iio: buffer-dma: Enable buffer write support Paul Cercueil
2021-11-16  8:52   ` Alexandru Ardelean
2021-11-21 14:20   ` Jonathan Cameron
2021-11-21 17:19     ` Paul Cercueil
2021-11-27 15:17       ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction Paul Cercueil
2021-11-16  8:53   ` Alexandru Ardelean
2021-11-15 14:19 ` [PATCH 06/15] iio: buffer-dmaengine: Enable write support Paul Cercueil
2021-11-16  8:55   ` Alexandru Ardelean
2021-11-15 14:19 ` [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure Paul Cercueil
2021-11-21 14:31   ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function Paul Cercueil
2021-11-16 10:59   ` Alexandru Ardelean
2021-11-21 13:49     ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 09/15] iio: buffer-dma: Use DMABUFs instead of custom solution Paul Cercueil
2021-11-15 14:19 ` [PATCH 10/15] iio: buffer-dma: Implement new DMABUF based userspace API Paul Cercueil
2021-11-15 14:19 ` [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting Paul Cercueil
2021-11-18 11:45   ` Paul Cercueil
2021-11-21 15:00   ` Jonathan Cameron
2021-11-21 17:43     ` Paul Cercueil
2021-11-25 17:29       ` Paul Cercueil
2021-11-27 16:05         ` Jonathan Cameron
2021-11-28 13:25           ` Lars-Peter Clausen
2021-11-27 15:20       ` Jonathan Cameron
2021-11-15 14:22 ` [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API Paul Cercueil
2021-11-15 14:22   ` [PATCH 13/15] iio: core: Add support for cyclic buffers Paul Cercueil
2021-11-16  9:50     ` Alexandru Ardelean
2021-11-15 14:22   ` [PATCH 14/15] iio: buffer-dmaengine: " Paul Cercueil
2021-11-16  9:50     ` Alexandru Ardelean
2021-11-15 14:22   ` [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API Paul Cercueil
2021-11-21 15:10     ` Jonathan Cameron
2021-11-21 17:46       ` Paul Cercueil
2021-11-15 14:37 ` [PATCH 00/15] iio: buffer-dma: write() and new " Daniel Vetter
2021-11-15 14:57   ` Paul Cercueil
2021-11-16 16:02     ` Daniel Vetter
2021-11-16 16:31       ` Laurent Pinchart
2021-11-17  8:48         ` Christian König
2021-11-17 12:50       ` Paul Cercueil
2021-11-17 13:42         ` Hennerich, Michael
2021-11-21 13:57 ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).