LKML Archive on lore.kernel.org
* [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size
@ 2021-08-28 15:36 Sven Peter
  2021-08-28 15:36 ` [PATCH v2 1/8] iommu/dma: Align size for untrusted devs to IOVA granule Sven Peter
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

RFC Patch: https://lore.kernel.org/linux-iommu/20210806155523.50429-1-sven@svenpeter.dev/

Hi,

After a very helpful discussion with Robin Murphy on the RFC, here's v2 that is slowly
starting to look sane.
I've been running this code for two weeks now and mainly tested it with usb storage devices
connected to dwc3 and to xhci over pcie on the M1.

Some background: On the Apple M1 the IOMMUs are hardwired to only support 16 KB pages.
We'd still like to boot Linux with 4 KB pages though, because that's what most distros
ship these days. This patch series adds support for that setup to the IOMMU DMA API.

This is essentially done by always mapping the encapsulating IOMMU page and adjusting
the returned iova offset. There are also changes to only allow DMA domains to make use
of this and prevent UNMANAGED domains from encountering unexpected situations.

For untrusted devices the allocation size is simply aligned to iovad->granule if they
don't already go through the swiotlb path. I have not been able to test that part
so far though since there's no Thunderbolt support for the M1 yet.

The series is based on top of iommu/next (and without the last commit probably also on
iommu/core). It won't apply cleanly on apple/dart since it already takes Robin's DMA domain
cleanup series into account.


Best,

Sven
 
Sven Peter (8):
  iommu/dma: Align size for untrusted devs to IOVA granule
  iommu/dma: Fail unaligned map requests for untrusted devs
  iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  iommu/dma: Support granule > PAGE_SIZE in dma_map_sg
  iommu/dma: Support PAGE_SIZE < iovad->granule allocations
  iommu: Move IOMMU pagesize check to attach_device
  iommu: Introduce __IOMMU_DOMAIN_LP
  iommu/dart: Remove force_bypass logic

 drivers/iommu/apple-dart.c |  14 +--
 drivers/iommu/dma-iommu.c  | 172 ++++++++++++++++++++++++++++++++-----
 drivers/iommu/iommu.c      |  36 +++++++-
 drivers/iommu/iova.c       |   7 +-
 include/linux/iommu.h      |   8 +-
 5 files changed, 197 insertions(+), 40 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/8] iommu/dma: Align size for untrusted devs to IOVA granule
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-28 15:36 ` [PATCH v2 2/8] iommu/dma: Fail unaligned map requests for untrusted devs Sven Peter
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Up until now, PAGE_SIZE was always a multiple of iovad->granule,
so adjacent pages were never exposed to untrusted devices
by allocations done as part of the coherent DMA API.
With PAGE_SIZE < iovad->granule, however, all these allocations
must also be aligned to iovad->granule.
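
The alignment added by this patch is plain round-up-to-granule arithmetic; a minimal userspace model (hypothetical helper name, granule assumed to be a power of two):

```c
#include <stddef.h>

/*
 * Sketch of iova_align(): round size up to a multiple of the power-of-two
 * IOVA granule, so an untrusted device is never handed a mapping that
 * shares an IOMMU page with an unrelated allocation.
 */
static size_t iova_align_model(size_t granule, size_t size)
{
	return (size + granule - 1) & ~(granule - 1);
}
```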

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d0bc8c06e1a4..e8eae34e9e4f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -735,10 +735,16 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
 		unsigned long attrs)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	struct page **pages;
 	struct sg_table sgt;
 	void *vaddr;
 
+	if (dev_is_untrusted(dev))
+		size = iova_align(iovad, size);
+
 	pages = __iommu_dma_alloc_noncontiguous(dev, size, &sgt, gfp, prot,
 						attrs);
 	if (!pages)
@@ -762,12 +768,18 @@ static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
 		size_t size, enum dma_data_direction dir, gfp_t gfp,
 		unsigned long attrs)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	struct dma_sgt_handle *sh;
 
 	sh = kmalloc(sizeof(*sh), gfp);
 	if (!sh)
 		return NULL;
 
+	if (dev_is_untrusted(dev))
+		size = iova_align(iovad, size);
+
 	sh->pages = __iommu_dma_alloc_noncontiguous(dev, size, &sh->sgt, gfp,
 						    PAGE_KERNEL, attrs);
 	if (!sh->pages) {
@@ -780,8 +792,15 @@ static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
 static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
 		struct sg_table *sgt, enum dma_data_direction dir)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	struct dma_sgt_handle *sh = sgt_handle(sgt);
 
+
+	if (dev_is_untrusted(dev))
+		size = iova_align(iovad, size);
+
 	__iommu_dma_unmap(dev, sgt->sgl->dma_address, size);
 	__iommu_dma_free_pages(sh->pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 	sg_free_table(&sh->sgt);
@@ -1127,10 +1146,17 @@ static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 
 static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	size_t alloc_size = PAGE_ALIGN(size);
-	int count = alloc_size >> PAGE_SHIFT;
+	int count;
 	struct page *page = NULL, **pages = NULL;
 
+	if (dev_is_untrusted(dev))
+		alloc_size = iova_align(iovad, alloc_size);
+	count = alloc_size >> PAGE_SHIFT;
+
 	/* Non-coherent atomic allocation? Easy */
 	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
 	    dma_free_from_pool(dev, cpu_addr, alloc_size))
@@ -1166,12 +1192,18 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
 static void *iommu_dma_alloc_pages(struct device *dev, size_t size,
 		struct page **pagep, gfp_t gfp, unsigned long attrs)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	bool coherent = dev_is_dma_coherent(dev);
 	size_t alloc_size = PAGE_ALIGN(size);
 	int node = dev_to_node(dev);
 	struct page *page = NULL;
 	void *cpu_addr;
 
+	if (dev_is_untrusted(dev))
+		alloc_size = iova_align(iovad, alloc_size);
+
 	page = dma_alloc_contiguous(dev, alloc_size, gfp);
 	if (!page)
 		page = alloc_pages_node(node, gfp, get_order(alloc_size));
@@ -1203,6 +1235,9 @@ static void *iommu_dma_alloc_pages(struct device *dev, size_t size,
 static void *iommu_dma_alloc(struct device *dev, size_t size,
 		dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	bool coherent = dev_is_dma_coherent(dev);
 	int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
 	struct page *page = NULL;
@@ -1216,6 +1251,9 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 				dma_pgprot(dev, PAGE_KERNEL, attrs), attrs);
 	}
 
+	if (dev_is_untrusted(dev))
+		size = iova_align(iovad, size);
+
 	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
 	    !gfpflags_allow_blocking(gfp) && !coherent)
 		page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
-- 
2.25.1



* [PATCH v2 2/8] iommu/dma: Fail unaligned map requests for untrusted devs
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
  2021-08-28 15:36 ` [PATCH v2 1/8] iommu/dma: Align size for untrusted devs to IOVA granule Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-28 19:00   ` Sven Peter
  2021-08-28 15:36 ` [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE Sven Peter
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

If swiotlb is enabled, we should never try to create any mappings that
would expose more memory than requested to the device.
WARN_ON and refuse those mappings, just in case.
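
What the added WARN_ONs enforce can be modeled as a pure predicate (hypothetical helper, not the kernel function):

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Sketch of the v2 policy for untrusted devices: refuse a mapping whose
 * start is not granule-aligned, or whose size is not already a multiple
 * of the granule, since mapping it would expose adjacent memory.
 */
static int untrusted_map_ok(uint64_t phys, size_t size, size_t granule)
{
	size_t iova_off = phys & (granule - 1);
	size_t size_aligned = (size + iova_off + granule - 1) & ~(granule - 1);

	return iova_off == 0 && size_aligned == size;
}
```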

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e8eae34e9e4f..d6e273ec3de6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -534,13 +534,20 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	size_t iova_off = iova_offset(iovad, phys);
+	size_t size_aligned = iova_align(iovad, size + iova_off);
 	dma_addr_t iova;
 
 	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
 	    iommu_deferred_attach(dev, domain))
 		return DMA_MAPPING_ERROR;
 
-	size = iova_align(iovad, size + iova_off);
+	if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev)) {
+		if (WARN_ON(iova_off))
+			return DMA_MAPPING_ERROR;
+		if (WARN_ON(size_aligned != size))
+			return DMA_MAPPING_ERROR;
+	}
+	size = size_aligned;
 
 	iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
 	if (!iova)
-- 
2.25.1



* [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
  2021-08-28 15:36 ` [PATCH v2 1/8] iommu/dma: Align size for untrusted devs to IOVA granule Sven Peter
  2021-08-28 15:36 ` [PATCH v2 2/8] iommu/dma: Fail unaligned map requests for untrusted devs Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-31 21:30   ` Alyssa Rosenzweig
  2021-08-28 15:36 ` [PATCH v2 4/8] iommu/dma: Support granule > PAGE_SIZE in dma_map_sg Sven Peter
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Pretend that iommu_dma_get_sgtable is not implemented when
granule > PAGE_SIZE, since I can neither test this function right now
nor fully understand how it is used.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d6e273ec3de6..64fbd9236820 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1315,9 +1315,15 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	struct page *page;
 	int ret;
 
+	if (iovad->granule > PAGE_SIZE)
+		return -ENXIO;
+
 	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
 		struct page **pages = dma_common_find_pages(cpu_addr);
 
-- 
2.25.1



* [PATCH v2 4/8] iommu/dma: Support granule > PAGE_SIZE in dma_map_sg
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (2 preceding siblings ...)
  2021-08-28 15:36 ` [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-28 15:36 ` [PATCH v2 5/8] iommu/dma: Support PAGE_SIZE < iovad->granule allocations Sven Peter
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Add support to iommu_dma_map_sg's impedance matching to also align
sg_lists correctly when the IOMMU granule is larger than PAGE_SIZE.
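
Per segment, the impedance matching amounts to widening each entry to granule boundaries while remembering the original offset so it can be restored once the IOVA is known. A hypothetical userspace model (struct and helper names are illustrative, not the kernel code, which stashes the unaligned parts in the unused DMA fields of the sg entry):

```c
#include <stdint.h>
#include <stddef.h>

struct aligned_seg {
	uint64_t start;	/* granule-aligned start of the mapping */
	size_t len;	/* granule-aligned length */
	size_t off;	/* original offset, restored after mapping */
};

/*
 * Sketch of the per-segment alignment: the IOMMU mapping must begin at
 * the granule boundary below phys and cover a whole number of granules.
 */
static struct aligned_seg align_seg(uint64_t phys, size_t len, size_t granule)
{
	size_t off = phys & (granule - 1);
	struct aligned_seg s = {
		.start = phys - off,
		.len = (len + off + granule - 1) & ~(granule - 1),
		.off = off,
	};

	return s;
}
```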

Co-developed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 64fbd9236820..a091cff5829d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -932,8 +932,8 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 		unsigned int s_length = sg_dma_len(s);
 		unsigned int s_iova_len = s->length;
 
-		s->offset += s_iova_off;
-		s->length = s_length;
+		sg_set_page(s, phys_to_page(sg_phys(s) + s_iova_off), s_length,
+			    s_iova_off & ~PAGE_MASK);
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 
@@ -977,10 +977,11 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (sg_dma_address(s) != DMA_MAPPING_ERROR)
-			s->offset += sg_dma_address(s);
 		if (sg_dma_len(s))
-			s->length = sg_dma_len(s);
+			sg_set_page(s,
+				    phys_to_page(sg_phys(s) + sg_dma_address(s)),
+				    sg_dma_len(s),
+				    sg_dma_address(s) & ~PAGE_MASK);
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 	}
@@ -1056,15 +1057,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	 * stashing the unaligned parts in the as-yet-unused DMA fields.
 	 */
 	for_each_sg(sg, s, nents, i) {
-		size_t s_iova_off = iova_offset(iovad, s->offset);
+		phys_addr_t s_phys = sg_phys(s);
+		size_t s_iova_off = iova_offset(iovad, s_phys);
 		size_t s_length = s->length;
 		size_t pad_len = (mask - iova_len + 1) & mask;
 
 		sg_dma_address(s) = s_iova_off;
 		sg_dma_len(s) = s_length;
-		s->offset -= s_iova_off;
 		s_length = iova_align(iovad, s_length + s_iova_off);
-		s->length = s_length;
+		sg_set_page(s, phys_to_page(s_phys - s_iova_off),
+			    s_length, s->offset & ~s_iova_off);
 
 		/*
 		 * Due to the alignment of our single IOVA allocation, we can
-- 
2.25.1



* [PATCH v2 5/8] iommu/dma: Support PAGE_SIZE < iovad->granule allocations
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (3 preceding siblings ...)
  2021-08-28 15:36 ` [PATCH v2 4/8] iommu/dma: Support granule > PAGE_SIZE in dma_map_sg Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-28 15:36 ` [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device Sven Peter
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Noncontiguous allocations must be made up of individual blocks
in a way that allows those blocks to be mapped contiguously in IOVA space.
For IOMMU page sizes larger than the CPU page size this can be done
by allocating all individual blocks from pools with
order >= get_order(iovad->granule). Some spillover pages might be
allocated at the end; these can, however, be freed again immediately.
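
The spillover handling for the last block can be modeled in isolation (illustrative helper; min_order here stands for the block order required for granule alignment, i.e. get_order(iovad->granule) when a full granule is needed):

```c
/*
 * Sketch of the tail-block spillover: if fewer than (1 << min_order)
 * pages are still needed, a full min_order block is allocated anyway to
 * guarantee alignment and the unneeded tail pages are freed again
 * straight away.  Returns how many pages would be freed.
 */
static unsigned int spillover_pages(unsigned int count, unsigned int min_order)
{
	unsigned int min_order_size = 1U << min_order;

	return count < min_order_size ? min_order_size - count : 0;
}
```

With a 16 KB granule and 4 KB pages (min_order 2), needing 3 more pages allocates a 4-page block and frees 1 page.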

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 99 +++++++++++++++++++++++++++++++++++----
 1 file changed, 89 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a091cff5829d..e57966bcfae1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -17,6 +17,7 @@
 #include <linux/iommu.h>
 #include <linux/iova.h>
 #include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
@@ -618,6 +619,9 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 {
 	struct page **pages;
 	unsigned int i = 0, nid = dev_to_node(dev);
+	unsigned int j;
+	unsigned long min_order = __fls(order_mask);
+	unsigned int min_order_size = 1U << min_order;
 
 	order_mask &= (2U << MAX_ORDER) - 1;
 	if (!order_mask)
@@ -657,15 +661,37 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 				split_page(page, order);
 			break;
 		}
-		if (!page) {
-			__iommu_dma_free_pages(pages, i);
-			return NULL;
+
+		/*
+		 * If we have no valid page here we might be trying to allocate
+		 * the last block consisting of 1<<order pages (to guarantee
+		 * alignment) but actually need less pages than that.
+		 * In that case we just try to allocate the entire block and
+		 * directly free the spillover pages again.
+		 */
+		if (!page && !order_mask && count < min_order_size) {
+			page = alloc_pages_node(nid, gfp, min_order);
+			if (!page)
+				goto free_pages;
+			split_page(page, min_order);
+
+			for (j = count; j < min_order_size; ++j)
+				__free_page(page + j);
+
+			order_size = count;
 		}
+
+		if (!page)
+			goto free_pages;
 		count -= order_size;
 		while (order_size--)
 			pages[i++] = page++;
 	}
 	return pages;
+
+free_pages:
+	__iommu_dma_free_pages(pages, i);
+	return NULL;
 }
 
 /*
@@ -682,15 +708,26 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 	bool coherent = dev_is_dma_coherent(dev);
 	int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
 	unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
+	struct scatterlist *last_sg;
 	struct page **pages;
 	dma_addr_t iova;
+	phys_addr_t orig_s_phys;
+	size_t orig_s_len, orig_s_off, s_iova_off, iova_size;
 
 	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
 	    iommu_deferred_attach(dev, domain))
 		return NULL;
 
 	min_size = alloc_sizes & -alloc_sizes;
-	if (min_size < PAGE_SIZE) {
+	if (iovad->granule > PAGE_SIZE) {
+		if (size < iovad->granule) {
+			/* ensure a single contiguous allocation */
+			min_size = ALIGN(size, PAGE_SIZE*(1U<<get_order(size)));
+			alloc_sizes = min_size;
+		}
+
+		size = PAGE_ALIGN(size);
+	} else if (min_size < PAGE_SIZE) {
 		min_size = PAGE_SIZE;
 		alloc_sizes |= PAGE_SIZE;
 	} else {
@@ -705,12 +742,14 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 	if (!pages)
 		return NULL;
 
-	size = iova_align(iovad, size);
-	iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+	iova_size = iova_align(iovad, size);
+	iova = iommu_dma_alloc_iova(domain, iova_size, dev->coherent_dma_mask, dev);
 	if (!iova)
 		goto out_free_pages;
 
-	if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))
+	last_sg = __sg_alloc_table_from_pages(sgt, pages, count, 0, iova_size,
+					      UINT_MAX, NULL, 0, GFP_KERNEL);
+	if (IS_ERR(last_sg))
 		goto out_free_iova;
 
 	if (!(ioprot & IOMMU_CACHE)) {
@@ -721,18 +760,58 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 			arch_dma_prep_coherent(sg_page(sg), sg->length);
 	}
 
+	if (iovad->granule > PAGE_SIZE) {
+		if (size < iovad->granule) {
+			/*
+			 * we only have a single sg list entry here that is
+			 * likely not aligned to iovad->granule. adjust the
+			 * entry to represent the encapsulating IOMMU page
+			 * and then later restore everything to its original
+			 * values, similar to the impedance matching done in
+			 * iommu_dma_map_sg.
+			 */
+			orig_s_phys = sg_phys(sgt->sgl);
+			orig_s_len = sgt->sgl->length;
+			orig_s_off = sgt->sgl->offset;
+			s_iova_off = iova_offset(iovad, orig_s_phys);
+
+			sg_set_page(sgt->sgl,
+				phys_to_page(orig_s_phys - s_iova_off),
+				iova_align(iovad, orig_s_len + s_iova_off),
+				sgt->sgl->offset & ~s_iova_off);
+		} else {
+			/*
+			 * convince iommu_map_sg_atomic to map the last block
+			 * even though it may be too small.
+			 */
+			orig_s_len = last_sg->length;
+			last_sg->length = iova_align(iovad, last_sg->length);
+		}
+	}
+
 	if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot)
-			< size)
+			< iova_size)
 		goto out_free_sg;
 
+	if (iovad->granule > PAGE_SIZE) {
+		if (size < iovad->granule) {
+			sg_set_page(sgt->sgl, phys_to_page(orig_s_phys),
+				orig_s_len, orig_s_off);
+
+			iova += s_iova_off;
+		} else {
+			last_sg->length = orig_s_len;
+		}
+	}
+
 	sgt->sgl->dma_address = iova;
-	sgt->sgl->dma_length = size;
+	sgt->sgl->dma_length = iova_size;
 	return pages;
 
 out_free_sg:
 	sg_free_table(sgt);
 out_free_iova:
-	iommu_dma_free_iova(cookie, iova, size, NULL);
+	iommu_dma_free_iova(cookie, iova, iova_size, NULL);
 out_free_pages:
 	__iommu_dma_free_pages(pages, count);
 	return NULL;
-- 
2.25.1



* [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (4 preceding siblings ...)
  2021-08-28 15:36 ` [PATCH v2 5/8] iommu/dma: Support PAGE_SIZE < iovad->granule allocations Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-31 21:39   ` Alyssa Rosenzweig
  2021-08-28 15:36 ` [PATCH v2 7/8] iommu: Introduce __IOMMU_DOMAIN_LP Sven Peter
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

The iova allocator is capable of handling any granularity that is a power
of two. Remove the much stronger BUG_ON condition that the granularity
must be smaller than or equal to the CPU page size.
Instead, check this condition during __iommu_attach_device and fail
gracefully.
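
The moved check reduces to comparing the smallest supported IOMMU page size — the lowest set bit of pgsize_bitmap — against the CPU page size; a userspace sketch (hypothetical helper name):

```c
/*
 * Sketch of iommu_check_page_size(): a paging domain can represent CPU
 * pages only if its smallest supported IOMMU page size (lowest set bit
 * of pgsize_bitmap) is no larger than the CPU page size.
 */
static int pgsize_can_represent_cpu_pages(unsigned long pgsize_bitmap,
					  unsigned long cpu_page_size)
{
	unsigned long smallest = pgsize_bitmap & -pgsize_bitmap;

	return smallest <= cpu_page_size;
}
```

A 16 KB-only DART (pgsize_bitmap of 0x4000) fails this check on a kernel built with 4 KB pages.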

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/iommu.c | 34 +++++++++++++++++++++++++++++++---
 drivers/iommu/iova.c  |  7 ++++---
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b4499b1915fa..f02b727d3054 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -79,6 +79,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 						 unsigned type);
 static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
+static void __iommu_detach_device(struct iommu_domain *domain,
+				  struct device *dev);
 static int __iommu_attach_group(struct iommu_domain *domain,
 				struct iommu_group *group);
 static void __iommu_detach_group(struct iommu_domain *domain,
@@ -1974,6 +1976,18 @@ void iommu_domain_free(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
+static int iommu_check_page_size(struct iommu_domain *domain)
+{
+	if (!(domain->type & __IOMMU_DOMAIN_PAGING))
+		return 0;
+
+	if ((1 << __ffs(domain->pgsize_bitmap)) > PAGE_SIZE) {
+		pr_warn("IOMMU page size cannot represent CPU pages.\n");
+		return -EFAULT;
+	}
+
+	return 0;
+}
 static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev)
 {
@@ -1983,9 +1997,23 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 		return -ENODEV;
 
 	ret = domain->ops->attach_dev(domain, dev);
-	if (!ret)
-		trace_attach_device_to_domain(dev);
-	return ret;
+	if (ret)
+		return ret;
+
+	/*
+	 * Check that CPU pages can be represented by the IOVA granularity.
+	 * This has to be done after ops->attach_dev since many IOMMU drivers
+	 * only limit domain->pgsize_bitmap after having attached the first
+	 * device.
+	 */
+	ret = iommu_check_page_size(domain);
+	if (ret) {
+		__iommu_detach_device(domain, dev);
+		return ret;
+	}
+
+	trace_attach_device_to_domain(dev);
+	return 0;
 }
 
 int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0af42fb93a49..302e6dfa7cdc 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -50,10 +50,11 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 {
 	/*
 	 * IOVA granularity will normally be equal to the smallest
-	 * supported IOMMU page size; both *must* be capable of
-	 * representing individual CPU pages exactly.
+	 * supported IOMMU page size; while both usually are capable of
+	 * representing individual CPU pages exactly the IOVA allocator
+	 * supports any granularities that are an exact power of two.
 	 */
-	BUG_ON((granule > PAGE_SIZE) || !is_power_of_2(granule));
+	BUG_ON(!is_power_of_2(granule));
 
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
-- 
2.25.1



* [PATCH v2 7/8] iommu: Introduce __IOMMU_DOMAIN_LP
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (5 preceding siblings ...)
  2021-08-28 15:36 ` [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-28 15:36 ` [PATCH v2 8/8] iommu/dart: Remove force_bypass logic Sven Peter
  2021-08-31 21:32 ` [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Alyssa Rosenzweig
  8 siblings, 0 replies; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

__IOMMU_DOMAIN_LP (large pages) indicates that a domain can handle
conditions where PAGE_SIZE might be smaller than the IOMMU page size.
Always allow attaching devices to such domains and set the flag for
IOMMU_DOMAIN_DMA, which can now handle these situations.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/iommu.c | 2 ++
 include/linux/iommu.h | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f02b727d3054..77d1ee14c7d0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1980,6 +1980,8 @@ static int iommu_check_page_size(struct iommu_domain *domain)
 {
 	if (!(domain->type & __IOMMU_DOMAIN_PAGING))
 		return 0;
+	if (domain->type & __IOMMU_DOMAIN_LP)
+		return 0;
 
 	if ((1 << __ffs(domain->pgsize_bitmap)) > PAGE_SIZE) {
 		pr_warn("IOMMU page size cannot represent CPU pages.\n");
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6633040a13f9..40c1ad6be4e7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -62,6 +62,8 @@ struct iommu_domain_geometry {
 					      implementation              */
 #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
 #define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)  /* DMA-API uses flush queue    */
+#define __IOMMU_DOMAIN_LP	(1U << 4)  /* Support for PAGE_SIZE smaller
+					      than IOMMU page size        */
 
 /*
  * This are the possible domain-types
@@ -81,10 +83,12 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_IDENTITY	(__IOMMU_DOMAIN_PT)
 #define IOMMU_DOMAIN_UNMANAGED	(__IOMMU_DOMAIN_PAGING)
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
-				 __IOMMU_DOMAIN_DMA_API)
+				 __IOMMU_DOMAIN_DMA_API |       \
+				 __IOMMU_DOMAIN_LP)
 #define IOMMU_DOMAIN_DMA_FQ	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API |	\
-				 __IOMMU_DOMAIN_DMA_FQ)
+				 __IOMMU_DOMAIN_DMA_FQ |        \
+				 __IOMMU_DOMAIN_LP)
 
 struct iommu_domain {
 	unsigned type;
-- 
2.25.1



* [PATCH v2 8/8] iommu/dart: Remove force_bypass logic
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (6 preceding siblings ...)
  2021-08-28 15:36 ` [PATCH v2 7/8] iommu: Introduce __IOMMU_DOMAIN_LP Sven Peter
@ 2021-08-28 15:36 ` Sven Peter
  2021-08-31 21:40   ` Alyssa Rosenzweig
  2021-08-31 21:32 ` [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Alyssa Rosenzweig
  8 siblings, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-08-28 15:36 UTC (permalink / raw)
  To: iommu
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Now that the dma-iommu API supports IOMMU granules larger than the
CPU page size, and the kernel no longer runs into a BUG_ON when devices
are attached to a domain with such a granule, there's no need to force
bypass mode anymore.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/apple-dart.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 559db9259e65..c37fb4790e8a 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -90,7 +90,6 @@
  * @lock: lock for hardware operations involving this dart
  * @pgsize: pagesize supported by this DART
  * @supports_bypass: indicates if this DART supports bypass mode
- * @force_bypass: force bypass mode due to pagesize mismatch?
  * @sid2group: maps stream ids to iommu_groups
  * @iommu: iommu core device
  */
@@ -107,7 +106,6 @@ struct apple_dart {
 
 	u32 pgsize;
 	u32 supports_bypass : 1;
-	u32 force_bypass : 1;
 
 	struct iommu_group *sid2group[DART_MAX_STREAMS];
 	struct iommu_device iommu;
@@ -506,9 +504,6 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
 	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
 
-	if (cfg->stream_maps[0].dart->force_bypass &&
-	    domain->type != IOMMU_DOMAIN_IDENTITY)
-		return -EINVAL;
 	if (!cfg->stream_maps[0].dart->supports_bypass &&
 	    domain->type == IOMMU_DOMAIN_IDENTITY)
 		return -EINVAL;
@@ -638,8 +633,6 @@ static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
 	if (cfg_dart) {
 		if (cfg_dart->supports_bypass != dart->supports_bypass)
 			return -EINVAL;
-		if (cfg_dart->force_bypass != dart->force_bypass)
-			return -EINVAL;
 		if (cfg_dart->pgsize != dart->pgsize)
 			return -EINVAL;
 	}
@@ -713,8 +706,6 @@ static int apple_dart_def_domain_type(struct device *dev)
 {
 	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
 
-	if (cfg->stream_maps[0].dart->force_bypass)
-		return IOMMU_DOMAIN_IDENTITY;
 	if (!cfg->stream_maps[0].dart->supports_bypass)
 		return IOMMU_DOMAIN_DMA;
 
@@ -844,7 +835,6 @@ static int apple_dart_probe(struct platform_device *pdev)
 	dart_params[1] = readl(dart->regs + DART_PARAMS2);
 	dart->pgsize = 1 << FIELD_GET(DART_PARAMS_PAGE_SHIFT, dart_params[0]);
 	dart->supports_bypass = dart_params[1] & DART_PARAMS_BYPASS_SUPPORT;
-	dart->force_bypass = dart->pgsize > PAGE_SIZE;
 
 	ret = request_irq(dart->irq, apple_dart_irq, IRQF_SHARED,
 			  "apple-dart fault handler", dart);
@@ -868,8 +858,8 @@ static int apple_dart_probe(struct platform_device *pdev)
 
 	dev_info(
 		&pdev->dev,
-		"DART [pagesize %x, bypass support: %d, bypass forced: %d] initialized\n",
-		dart->pgsize, dart->supports_bypass, dart->force_bypass);
+		"DART [pagesize %x, bypass support: %d] initialized\n",
+		dart->pgsize, dart->supports_bypass);
 	return 0;
 
 err_sysfs_remove:
-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/8] iommu/dma: Fail unaligned map requests for untrusted devs
  2021-08-28 15:36 ` [PATCH v2 2/8] iommu/dma: Fail unaligned map requests for untrusted devs Sven Peter
@ 2021-08-28 19:00   ` Sven Peter
  0 siblings, 0 replies; 24+ messages in thread
From: Sven Peter @ 2021-08-28 19:00 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

And of course shortly after submitting this series I realized this doesn't quite work yet:
swiotlb_tbl_map_single can return a 16KB buffer that's only aligned to a 4KB boundary.
v3 will need at least another change to ensure that the result will be aligned to
a 16KB boundary as well.


Sven


On Sat, Aug 28, 2021, at 17:36, Sven Peter wrote:
> If swiotlb is enabled we should never try to create any mappings that
> would expose more memory than requested to the device.
> WARN_ON and refuse those mappings just in case.
> 
> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> ---
>  drivers/iommu/dma-iommu.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index e8eae34e9e4f..d6e273ec3de6 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -534,13 +534,20 @@ static dma_addr_t __iommu_dma_map(struct device 
> *dev, phys_addr_t phys,
>  	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>  	struct iova_domain *iovad = &cookie->iovad;
>  	size_t iova_off = iova_offset(iovad, phys);
> +	size_t size_aligned = iova_align(iovad, size + iova_off);
>  	dma_addr_t iova;
>  
>  	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>  	    iommu_deferred_attach(dev, domain))
>  		return DMA_MAPPING_ERROR;
>  
> -	size = iova_align(iovad, size + iova_off);
> +	if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev)) {
> +		if (WARN_ON(iova_off))
> +			return DMA_MAPPING_ERROR;
> +		if (WARN_ON(size_aligned != size))
> +			return DMA_MAPPING_ERROR;
> +	}
> +	size = size_aligned;
>  
>  	iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
>  	if (!iova)
> -- 
> 2.25.1
> 
> 


-- 
Sven Peter


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-08-28 15:36 ` [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE Sven Peter
@ 2021-08-31 21:30   ` Alyssa Rosenzweig
  2021-09-01 17:06     ` Sven Peter
  0 siblings, 1 reply; 24+ messages in thread
From: Alyssa Rosenzweig @ 2021-08-31 21:30 UTC (permalink / raw)
  To: Sven Peter
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

I use this function for cross-device sharing on the M1 display driver.
Arguably this is unsafe but it works on 16k kernels and if you want to
test the function on 4k, you know where my code is.

On Sat, Aug 28, 2021 at 05:36:37PM +0200, Sven Peter wrote:
> Pretend that iommu_dma_get_sgtable is not implemented when
> granule > PAGE_SIZE since I can neither test this function right now
> nor do I fully understand how it is used.
> 
> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> ---
>  drivers/iommu/dma-iommu.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index d6e273ec3de6..64fbd9236820 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1315,9 +1315,15 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
>  		void *cpu_addr, dma_addr_t dma_addr, size_t size,
>  		unsigned long attrs)
>  {
> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
> +	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> +	struct iova_domain *iovad = &cookie->iovad;
>  	struct page *page;
>  	int ret;
>  
> +	if (iovad->granule > PAGE_SIZE)
> +		return -ENXIO;
> +
>  	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
>  		struct page **pages = dma_common_find_pages(cpu_addr);
>  
> -- 
> 2.25.1
> 


* Re: [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size
  2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (7 preceding siblings ...)
  2021-08-28 15:36 ` [PATCH v2 8/8] iommu/dart: Remove force_bypass logic Sven Peter
@ 2021-08-31 21:32 ` Alyssa Rosenzweig
  8 siblings, 0 replies; 24+ messages in thread
From: Alyssa Rosenzweig @ 2021-08-31 21:32 UTC (permalink / raw)
  To: Sven Peter
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

> Some background: On the Apple M1 the IOMMUs are hardwired to only support 16 KB pages.
> We'd still like to boot Linux with 4KB pages though because that's what most distros
> ship these days. This patch series adds support for that setup to the IOMMU DMA API.

This isn't just a distro issue -- efficient x86_64 emulation will rely
on 4KB as well (Rosetta, FEX, ...). Telling distros to use 16KB kernels
for Apple parts is just kicking the can down the road -- we want this
series even for kernels we build ourselves.


* Re: [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device
  2021-08-28 15:36 ` [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device Sven Peter
@ 2021-08-31 21:39   ` Alyssa Rosenzweig
  2021-09-01 17:14     ` Sven Peter
  0 siblings, 1 reply; 24+ messages in thread
From: Alyssa Rosenzweig @ 2021-08-31 21:39 UTC (permalink / raw)
  To: Sven Peter
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

> +	if ((1 << __ffs(domain->pgsize_bitmap)) > PAGE_SIZE) {

Not a fan of this construction. Could you assign `(1 <<
__ffs(domain->pgsize_bitmap))` to an appropriately named temporary (e.g.
min_io_pgsize) so it's clearer what's going on?

> +		pr_warn("IOMMU page size cannot represent CPU pages.\n");

"Represent" how?


* Re: [PATCH v2 8/8] iommu/dart: Remove force_bypass logic
  2021-08-28 15:36 ` [PATCH v2 8/8] iommu/dart: Remove force_bypass logic Sven Peter
@ 2021-08-31 21:40   ` Alyssa Rosenzweig
  0 siblings, 0 replies; 24+ messages in thread
From: Alyssa Rosenzweig @ 2021-08-31 21:40 UTC (permalink / raw)
  To: Sven Peter
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

> Now that the dma-iommu API supports IOMMU granules which are larger than
> the CPU page size and that the kernel no longer runs into a BUG_ON when
> devices are attached to a domain with such a granule there's no need to
> force bypass mode anymore.
> 
> Signed-off-by: Sven Peter <sven@svenpeter.dev>
> ---
>  drivers/iommu/apple-dart.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)

This was such an ugly hack, glad to see it go. This patch is

	Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-08-31 21:30   ` Alyssa Rosenzweig
@ 2021-09-01 17:06     ` Sven Peter
  2021-09-01 21:10       ` Alyssa Rosenzweig
  0 siblings, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-09-01 17:06 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel



On Tue, Aug 31, 2021, at 23:30, Alyssa Rosenzweig wrote:
> I use this function for cross-device sharing on the M1 display driver.
> Arguably this is unsafe but it works on 16k kernels and if you want to
> test the function on 4k, you know where my code is.
> 

My biggest issue is that I do not understand how this function is supposed
to be used correctly. It would work fine as-is if it only ever gets passed buffers
allocated by the coherent API but there's no way to check or guarantee that.
There may also be callers making assumptions that no longer hold when
iovad->granule > PAGE_SIZE.


Regarding your case: I'm not convinced the function is meant to be used there.
If I understand it correctly, your code first allocates memory with dma_alloc_coherent
(which possibly creates a sgt internally and then maps it with iommu_map_sg),
then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
another iommu domain with dma_map_sg while assuming that the result will be contiguous
in IOVA space. It'll work out because dma_alloc_coherent is the very thing
meant to allocate pages that can be mapped into kernel and device VA space
as a single contiguous block and because both of your IOMMUs are different
instances of the same HW block. Anything allocated by dma_alloc_coherent for the
first IOMMU will have the right shape that will allow it to be mapped as
a single contiguous block for the second IOMMU.

What could be done in your case is to instead use the IOMMU API,
allocate the pages yourself (while ensuring the sgt you create is made up
of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
and then just use iommu_map_sg for both domains which actually comes with the
guarantee that the result will be a single contiguous block in IOVA space and
doesn't require the sgt roundtrip.



Sven


> On Sat, Aug 28, 2021 at 05:36:37PM +0200, Sven Peter wrote:
> > Pretend that iommu_dma_get_sgtable is not implemented when
> > granule > PAGE_SIZE since I can neither test this function right now
> > nor do I fully understand how it is used.
> > 
> > Signed-off-by: Sven Peter <sven@svenpeter.dev>
> > ---
> >  drivers/iommu/dma-iommu.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index d6e273ec3de6..64fbd9236820 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1315,9 +1315,15 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
> >  		void *cpu_addr, dma_addr_t dma_addr, size_t size,
> >  		unsigned long attrs)
> >  {
> > +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
> > +	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> > +	struct iova_domain *iovad = &cookie->iovad;
> >  	struct page *page;
> >  	int ret;
> >  
> > +	if (iovad->granule > PAGE_SIZE)
> > +		return -ENXIO;
> > +
> >  	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
> >  		struct page **pages = dma_common_find_pages(cpu_addr);
> >  
> > -- 
> > 2.25.1
> > 
> 


-- 
Sven Peter


* Re: [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device
  2021-08-31 21:39   ` Alyssa Rosenzweig
@ 2021-09-01 17:14     ` Sven Peter
  2021-09-01 18:53       ` Robin Murphy
  0 siblings, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-09-01 17:14 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel



On Tue, Aug 31, 2021, at 23:39, Alyssa Rosenzweig wrote:
> > +	if ((1 << __ffs(domain->pgsize_bitmap)) > PAGE_SIZE) {
> 
> Not a fan of this construction. Could you assign `(1 <<
> __ffs(domain->pgsize_bitmap))` to an appropriately named temporary (e.g.
> min_io_pgsize) so it's clearer what's going on?

Good point, will do that for the next version.

> 
> > +		pr_warn("IOMMU page size cannot represent CPU pages.\n");
> 
> "Represent" how?
> 

Looks like I dropped an "exactly" there when taking this line from iova.c :)



Thanks,


Sven


* Re: [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device
  2021-09-01 17:14     ` Sven Peter
@ 2021-09-01 18:53       ` Robin Murphy
  0 siblings, 0 replies; 24+ messages in thread
From: Robin Murphy @ 2021-09-01 18:53 UTC (permalink / raw)
  To: Sven Peter, Alyssa Rosenzweig
  Cc: iommu, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

On 2021-09-01 18:14, Sven Peter wrote:
> 
> 
> On Tue, Aug 31, 2021, at 23:39, Alyssa Rosenzweig wrote:
>>> +	if ((1 << __ffs(domain->pgsize_bitmap)) > PAGE_SIZE) {
>>
>> Not a fan of this construction. Could you assign `(1 <<
>> __ffs(domain->pgsize_bitmap))` to an appropriately named temporary (e.g.
>> min_io_pgsize) so it's clearer what's going on?
> 
> Good point, will do that for the next version.

Or maybe just test "__ffs(domain->pgsize_bitmap) > PAGE_SHIFT", or 
perhaps even avoid shifts altogether with something like 
"domain->pgsize_bitmap & (PAGE_SIZE | PAGE_SIZE - 1)".

Robin.

> 
>>
>>> +		pr_warn("IOMMU page size cannot represent CPU pages.\n");
>>
>> "Represent" how?
>>
> 
> Looks like I dropped an "exactly" there when taking this line from iova.c :)
> 
> 
> 
> Thanks,
> 
> 
> Sven
> 


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-09-01 17:06     ` Sven Peter
@ 2021-09-01 21:10       ` Alyssa Rosenzweig
  2021-09-02 18:19         ` Sven Peter
  0 siblings, 1 reply; 24+ messages in thread
From: Alyssa Rosenzweig @ 2021-09-01 21:10 UTC (permalink / raw)
  To: Sven Peter
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

> My biggest issue is that I do not understand how this function is supposed
> to be used correctly. It would work fine as-is if it only ever gets passed buffers
> allocated by the coherent API but there's no way to check or guarantee that.
> There may also be callers making assumptions that no longer hold when
> iovad->granule > PAGE_SIZE.
> 
> Regarding your case: I'm not convinced the function is meant to be used there.
> If I understand it correctly, your code first allocates memory with dma_alloc_coherent
> (which possibly creates a sgt internally and then maps it with iommu_map_sg),
> then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
> another iommu domain with dma_map_sg while assuming that the result will be contiguous
> in IOVA space. It'll work out because dma_alloc_coherent is the very thing
> meant to allocate pages that can be mapped into kernel and device VA space
> as a single contiguous block and because both of your IOMMUs are different
> instances of the same HW block. Anything allocated by dma_alloc_coherent for the
> first IOMMU will have the right shape that will allow it to be mapped as
> a single contiguous block for the second IOMMU.
> 
> What could be done in your case is to instead use the IOMMU API,
> allocate the pages yourself (while ensuring the sgt you create is made up
> of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
> and then just use iommu_map_sg for both domains which actually comes with the
> guarantee that the result will be a single contiguous block in IOVA space and
> doesn't require the sgt roundtrip.

In principle I agree. I am getting the sense this function can't be used
correctly in general, and yet is the function that's meant to be used.
If my interpretation of prior LKML discussion holds, the problems are
far deeper than my code or indeed page size problems...

If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
that dance were wrapped up in a safe helper function instead of open
coding it in every driver that does cross device sharing.

We might even call that helper... hmm... dma_map_sg.... *ducks*

For context for people-other-than-Sven, the code in question from my
tree appears below the break.

---------------------------------------------------------------------------------

/*
 * Allocate an IOVA contiguous buffer mapped to the DCP. The buffer need not be
 * physically contiguous; however, we should save the sgtable in case the
 * buffer needs to be later mapped for PIODMA.
 */
static bool dcpep_cb_allocate_buffer(struct apple_dcp *dcp, void *out, void *in)
{
        struct dcp_allocate_buffer_resp *resp = out;
        struct dcp_allocate_buffer_req *req = in;
        void *buf;

        resp->dva_size = ALIGN(req->size, 4096);
        resp->mem_desc_id = ++dcp->nr_mappings;

        if (resp->mem_desc_id >= ARRAY_SIZE(dcp->mappings)) {
                dev_warn(dcp->dev, "DCP overflowed mapping table, ignoring");
                return true;
        }

        buf = dma_alloc_coherent(dcp->dev, resp->dva_size, &resp->dva,
                                 GFP_KERNEL);

        dma_get_sgtable(dcp->dev, &dcp->mappings[resp->mem_desc_id], buf,
                        resp->dva, resp->dva_size);

        WARN_ON(resp->mem_desc_id == 0);
        return true;
}

/*
 * Callback to map a buffer allocated with allocate_buf for PIODMA usage.
 * PIODMA is separate from the main DCP and uses own IOVA space on a dedicated
 * stream of the display DART, rather than the expected DCP DART.
 *
 * XXX: This relies on dma_get_sgtable in concert with dma_map_sgtable, which
 * is a "fundamentally unsafe" operation according to the docs. And yet
 * everyone does it...
 */
static bool dcpep_cb_map_piodma(struct apple_dcp *dcp, void *out, void *in)
{
        struct dcp_map_buf_resp *resp = out;
        struct dcp_map_buf_req *req = in;
        struct sg_table *map;

        if (req->buffer >= ARRAY_SIZE(dcp->mappings))
                goto reject;

        map = &dcp->mappings[req->buffer];

        if (!map->sgl)
                goto reject;

        /* XNU leaks a kernel VA here, breaking kASLR. Don't do that. */
        resp->vaddr = 0;

        /* Use PIODMA device instead of DCP to map against the right IOMMU. */
        resp->ret = dma_map_sgtable(dcp->piodma, map, DMA_BIDIRECTIONAL, 0);

        if (resp->ret)
                dev_warn(dcp->dev, "failed to map for piodma %d\n", resp->ret);
        else
                resp->dva = sg_dma_address(map->sgl);

        resp->ret = 0;
        return true;

reject:
        dev_warn(dcp->dev, "denying map of invalid buffer %llx for piodma\n",
                 req->buffer);
        resp->ret = EINVAL;
        return true;
}


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-09-01 21:10       ` Alyssa Rosenzweig
@ 2021-09-02 18:19         ` Sven Peter
  2021-09-02 19:42           ` Robin Murphy
  0 siblings, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-09-02 18:19 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Robin Murphy,
	Arnd Bergmann, Mohamed Mediouni, Alexander Graf, Hector Martin,
	linux-kernel



On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
> > My biggest issue is that I do not understand how this function is supposed
> > to be used correctly. It would work fine as-is if it only ever gets passed buffers
> > allocated by the coherent API but there's no way to check or guarantee that.
> > There may also be callers making assumptions that no longer hold when
> > iovad->granule > PAGE_SIZE.
> > 
> > Regarding your case: I'm not convinced the function is meant to be used there.
> > If I understand it correctly, your code first allocates memory with dma_alloc_coherent
> > (which possibly creates a sgt internally and then maps it with iommu_map_sg),
> > then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
> > another iommu domain with dma_map_sg while assuming that the result will be contiguous
> > in IOVA space. It'll work out because dma_alloc_coherent is the very thing
> > meant to allocate pages that can be mapped into kernel and device VA space
> > as a single contiguous block and because both of your IOMMUs are different
> > instances of the same HW block. Anything allocated by dma_alloc_coherent for the
> > first IOMMU will have the right shape that will allow it to be mapped as
> > a single contiguous block for the second IOMMU.
> > 
> > What could be done in your case is to instead use the IOMMU API,
> > allocate the pages yourself (while ensuring the sgt you create is made up
> > of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
> > and then just use iommu_map_sg for both domains which actually comes with the
> > guarantee that the result will be a single contiguous block in IOVA space and
> > doesn't require the sgt roundtrip.
> 
> In principle I agree. I am getting the sense this function can't be used
> correctly in general, and yet is the function that's meant to be used.
> If my interpretation of prior LKML discussion holds, the problems are
> far deeper than my code or indeed page size problems...

Right, which makes reasoning about this function and its behavior when the
IOMMU page size is unexpected very hard for me. I'm not opposed to just
keeping this function as-is when there's a mismatch between PAGE_SIZE and
the IOMMU page size (and it will probably work that way) but I'd like to
be sure that won't introduce unexpected behavior.

> 
> If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
> that dance were wrapped up in a safe helper function instead of open
> coding it in every driver that does cross device sharing.
> 
> We might even call that helper... hmm... dma_map_sg.... *ducks*
> 

There might be another way to do this correctly. I'm likely just a little
bit biased because I've spent the past weeks wrapping my head around the
IOMMU and DMA APIs and when all you have is a hammer everything looks like
a nail.

But dma_map_sg operates at the DMA API level and at that point the dma-ops
for two different devices could be vastly different. 
In the worst case one of them could be behind an IOMMU that can easily map
non-contiguous pages while the other one is directly connected to the bus and
can't even access >4G pages without swiotlb support.
It's really only possible to guarantee that it will map N buffers to <= N
DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally) at
that point.

On the IOMMU API level you have much more information available about the actual
hardware and can prepare the buffers in a way that makes both devices happy.
That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries
can actually guarantee to map the entire list to a single contiguous IOVA block.


Sven


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-09-02 18:19         ` Sven Peter
@ 2021-09-02 19:42           ` Robin Murphy
  2021-09-03 13:11             ` Alyssa Rosenzweig
  2021-09-03 15:16             ` Sven Peter
  0 siblings, 2 replies; 24+ messages in thread
From: Robin Murphy @ 2021-09-02 19:42 UTC (permalink / raw)
  To: Sven Peter, Alyssa Rosenzweig
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

On 2021-09-02 19:19, Sven Peter wrote:
> 
> 
> On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
>>> My biggest issue is that I do not understand how this function is supposed
>>> to be used correctly. It would work fine as-is if it only ever gets passed buffers
>>> allocated by the coherent API but there's no way to check or guarantee that.
>>> There may also be callers making assumptions that no longer hold when
>>> iovad->granule > PAGE_SIZE.
>>>
>>> Regarding your case: I'm not convinced the function is meant to be used there.
>>> If I understand it correctly, your code first allocates memory with dma_alloc_coherent
>>> (which possibly creates a sgt internally and then maps it with iommu_map_sg),
>>> then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
>>> another iommu domain with dma_map_sg while assuming that the result will be contiguous
>>> in IOVA space. It'll work out because dma_alloc_coherent is the very thing
>>> meant to allocate pages that can be mapped into kernel and device VA space
>>> as a single contiguous block and because both of your IOMMUs are different
>>> instances of the same HW block. Anything allocated by dma_alloc_coherent for the
>>> first IOMMU will have the right shape that will allow it to be mapped as
>>> a single contiguous block for the second IOMMU.
>>>
>>> What could be done in your case is to instead use the IOMMU API,
>>> allocate the pages yourself (while ensuring the sgt you create is made up
>>> of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
>>> and then just use iommu_map_sg for both domains which actually comes with the
>>> guarantee that the result will be a single contiguous block in IOVA space and
>>> doesn't require the sgt roundtrip.
>>
>> In principle I agree. I am getting the sense this function can't be used
>> correctly in general, and yet is the function that's meant to be used.
>> If my interpretation of prior LKML discussion holds, the problems are
>> far deeper than my code or indeed page size problems...
> 
> Right, which makes reasoning about this function and its behavior if the
> IOMMU page size is unexpected very hard for me. I'm not opposed to just
> keeping this function as-is when there's a mismatch between PAGE_SIZE and
> the IOMMU page size (and it will probably work that way) but I'd like to
> be sure that won't introduce unexpected behavior.
> 
>>
>> If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
>> that dance were wrapped up in a safe helper function instead of open
>> coding it in every driver that does cross device sharing.
>>
>> We might even call that helper... hmm... dma_map_sg.... *ducks*
>>
> 
> There might be another way to do this correctly. I'm likely just a little
> bit biased because I've spent the past weeks wrapping my head around the
> IOMMU and DMA APIs and when all you have is a hammer everything looks like
> a nail.
> 
> But dma_map_sg operates at the DMA API level and at that point the dma-ops
> for two different devices could be vastly different.
> In the worst case one of them could be behind an IOMMU that can easily map
> non-contiguous pages while the other one is directly connected to the bus and
> can't even access >4G pages without swiotlb support.
> It's really only possible to guarantee that it will map N buffers to <= N
> DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally) at
> that point.
> 
> On the IOMMU API level you have much more information available about the actual
> hardware and can prepare the buffers in a way that makes both devices happy.
> That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries
> can actually guarantee to map the entire list to a single contiguous IOVA block.

Essentially there are two reasonable options, and doing pretend dma-buf 
export/import between two devices effectively owned by the same driver 
is neither of them. Handily, DRM happens to be exactly where all the 
precedent is, too; unsurprisingly this is not a new concern.

One is to go full IOMMU API, like rockchip or tegra, attaching the 
relevant devices to your own unmanaged domain(s) and mapping pages 
exactly where you choose. You still make dma_map/dma_unmap calls for the 
sake of cache maintenance and other housekeeping on the underlying 
memory, but you ignore the provided DMA addresses in favour of your own 
IOVAs when it comes to programming the devices.

The lazier option if you can rely on all relevant devices having equal 
DMA and IOMMU capabilities is to follow exynos, and herd the devices 
into a common default domain. Instead of allocating your own domain, you 
grab the current domain for one device (which will be its default 
domain) and manually attach the other devices to that. Then you forget 
all about IOMMUs but make sure to do all your regular DMA API calls 
using that first device, and the DMA addresses which come back should be 
magically valid for the other devices too. It was a bit of a cheeky hack 
TBH, but I'd still much prefer more of that over any usage of 
get_sgtable outside of actual dma-buf...

Note that where multiple IOMMU instances are involved, the latter 
approach does depend on the IOMMU driver being able to support sharing a 
single domain across them; I think that might sort-of-work for DART 
already, but may need a little more attention.

Robin.


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-09-02 19:42           ` Robin Murphy
@ 2021-09-03 13:11             ` Alyssa Rosenzweig
  2021-09-03 15:16             ` Sven Peter
  1 sibling, 0 replies; 24+ messages in thread
From: Alyssa Rosenzweig @ 2021-09-03 13:11 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Sven Peter, Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

> > On the IOMMU API level you have much more information available about the actual
> > hardware and can prepare the buffers in a way that makes both devices happy.
> > That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries
> > can actually guarantee to map the entire list to a single contiguous IOVA block.
> 
> Essentially there are two reasonable options, and doing pretend dma-buf
> export/import between two devices effectively owned by the same driver is
> neither of them. Handily, DRM happens to be exactly where all the precedent
> is, too; unsurprisingly this is not a new concern.
> 
> One is to go full IOMMU API, like rockchip or tegra, attaching the relevant
> devices to your own unmanaged domain(s) and mapping pages exactly where you
> choose. You still make dma_map/dma_unmap calls for the sake of cache
> maintenance and other housekeeping on the underlying memory, but you ignore
> the provided DMA addresses in favour of your own IOVAs when it comes to
> programming the devices.

I guess this is the way to go for DCP.

> The lazier option if you can rely on all relevant devices having equal DMA
> and IOMMU capabilities is to follow exynos, and herd the devices into a
> common default domain. Instead of allocating you own domain, you grab the
> current domain for one device (which will be its default domain) and
> manually attach the other devices to that. Then you forget all about IOMMUs
> but make sure to do all your regular DMA API calls using that first device,
> and the DMA addresses which come back should be magically valid for the
> other devices too. It was a bit of a cheeky hack TBH, but I'd still much
> prefer more of that over any usage of get_sgtable outside of actual
> dma-buf...

It'd probably be *possible* to get away with this for DCP, but it'd
involve more hacks, since the DARTs are not 100% symmetric and
there are some constraints on the different DARTs involved.

It'd also be less desirable -- there is no reason for the display
coprocessor to know the actual *contents* of the framebuffer, only an
IOVA that is valid solely for the actual display hardware. These are two
devices in hardware with two independent DARTs; by modeling them as such
we reduce the amount we need to trust the coprocessor firmware blob.

> Note that where multiple IOMMU instances are involved, the latter approach
> does depend on the IOMMU driver being able to support sharing a single
> domain across them; I think that might sort-of-work for DART already, but
> may need a little more attention.

I think this already works (for USB-C).


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-09-02 19:42           ` Robin Murphy
  2021-09-03 13:11             ` Alyssa Rosenzweig
@ 2021-09-03 15:16             ` Sven Peter
  2021-09-03 15:45               ` Robin Murphy
  1 sibling, 1 reply; 24+ messages in thread
From: Sven Peter @ 2021-09-03 15:16 UTC (permalink / raw)
  To: Robin Murphy, Alyssa Rosenzweig
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel



On Thu, Sep 2, 2021, at 21:42, Robin Murphy wrote:
> On 2021-09-02 19:19, Sven Peter wrote:
> > 
> > 
> > On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
> >>> My biggest issue is that I do not understand how this function is supposed
> >>> to be used correctly. It would work fine as-is if it only ever gets passed buffers
> >>> allocated by the coherent API but there's no way to check or guarantee that.
> >>> There may also be callers making assumptions that no longer hold when
> >>> iovad->granule > PAGE_SIZE.
> >>>
> >>> Regarding your case: I'm not convinced the function is meant to be used there.
> >>> If I understand it correctly, your code first allocates memory with dma_alloc_coherent
> >>> (which possibly creates a sgt internally and then maps it with iommu_map_sg),
> >>> then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
> >>> another iommu domain with dma_map_sg while assuming that the result will be contiguous
> >>> in IOVA space. It'll work out because dma_alloc_coherent is the very thing
> >>> meant to allocate pages that can be mapped into kernel and device VA space
> >>> as a single contiguous block and because both of your IOMMUs are different
> >>> instances of the same HW block. Anything allocated by dma_alloc_coherent for the
> >>> first IOMMU will have the right shape that will allow it to be mapped as
> >>> a single contiguous block for the second IOMMU.
> >>>
> >>> What could be done in your case is to instead use the IOMMU API,
> >>> allocate the pages yourself (while ensuring the sgt you create is made up
> >>> of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
> >>> and then just use iommu_map_sg for both domains which actually comes with the
> >>> guarantee that the result will be a single contiguous block in IOVA space and
> >>> doesn't require the sgt roundtrip.
> >>
> >> In principle I agree. I am getting the sense this function can't be used
> >> correctly in general, and yet is the function that's meant to be used.
> >> If my interpretation of prior LKML discussion holds, the problems are
> >> far deeper than my code or indeed page size problems...
> > 
> > Right, which makes reasoning about this function and its behavior if the
> > IOMMU page size is unexpected very hard for me. I'm not opposed to just
> > keeping this function as-is when there's a mismatch between PAGE_SIZE and
> > the IOMMU page size (and it will probably work that way) but I'd like to
> > be sure that won't introduce unexpected behavior.
> > 
> >>
> >> If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
> >> that dance were wrapped up in a safe helper function instead of open
> >> coding it in every driver that does cross device sharing.
> >>
> >> We might even call that helper... hmm... dma_map_sg.... *ducks*
> >>
> > 
> > There might be another way to do this correctly. I'm likely just a little
> > bit biased because I've spent the past weeks wrapping my head around the
> > IOMMU and DMA APIs and when all you have is a hammer everything looks like
> > a nail.
> > 
> > But dma_map_sg operates at the DMA API level and at that point the dma-ops
> > for two different devices could be vastly different.
> > In the worst case one of them could be behind an IOMMU that can easily map
> > non-contiguous pages while the other one is directly connected to the bus and
> > can't even access >4G pages without swiotlb support.
> > It's really only possible to guarantee that it will map N buffers to <= N
> > DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally) at
> > that point.
> > 
> > On the IOMMU API level you have much more information available about the actual
> > hardware and can prepare the buffers in a way that makes both devices happy.
> > That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries
> > can actually guarantee to map the entire list to a single contiguous IOVA block.
> 
> Essentially there are two reasonable options, and doing pretend dma-buf 
> export/import between two devices effectively owned by the same driver 
> is neither of them. Handily, DRM happens to be exactly where all the 
> precedent is, too; unsurprisingly this is not a new concern.
> 
> One is to go full IOMMU API, like rockchip or tegra, attaching the 
> relevant devices to your own unmanaged domain(s) and mapping pages 
> exactly where you choose. You still make dma_map/dma_unmap calls for the 
> sake of cache maintenance and other housekeeping on the underlying 
> memory, but you ignore the provided DMA addresses in favour of your own 
> IOVAs when it comes to programming the devices.
> 
> The lazier option if you can rely on all relevant devices having equal 
> DMA and IOMMU capabilities is to follow exynos, and herd the devices 
>> into a common default domain. Instead of allocating your own domain, you 
> grab the current domain for one device (which will be its default 
> domain) and manually attach the other devices to that. Then you forget 
> all about IOMMUs but make sure to do all your regular DMA API calls 
> using that first device, and the DMA addresses which come back should be 
> magically valid for the other devices too. It was a bit of a cheeky hack 
> TBH, but I'd still much prefer more of that over any usage of 
> get_sgtable outside of actual dma-buf...
> 
> Note that where multiple IOMMU instances are involved, the latter 
> approach does depend on the IOMMU driver being able to support sharing a 
> single domain across them; I think that might sort-of-work for DART 
> already, but may need a little more attention.

It'll work for two streams inside the same DART but needs some
attention for streams from two separate DARTs.

Then there's also this amazing "feature" that the display controller DART
pagetable pointer register is read-only so that we have to reuse the memory
Apple configured for the first-level table. That needs some changes anyway
but might make adding multiple devices from different groups more complex.



Sven


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-09-03 15:16             ` Sven Peter
@ 2021-09-03 15:45               ` Robin Murphy
  2021-09-03 16:51                 ` Sven Peter
  0 siblings, 1 reply; 24+ messages in thread
From: Robin Murphy @ 2021-09-03 15:45 UTC (permalink / raw)
  To: Sven Peter, Alyssa Rosenzweig
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel

On 2021-09-03 16:16, Sven Peter wrote:
> 
> 
> On Thu, Sep 2, 2021, at 21:42, Robin Murphy wrote:
>> On 2021-09-02 19:19, Sven Peter wrote:
>>>
>>>
>>> On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
>>>>> My biggest issue is that I do not understand how this function is supposed
>>>>> to be used correctly. It would work fine as-is if it only ever gets passed buffers
>>>>> allocated by the coherent API but there's no way to check or guarantee that.
>>>>> There may also be callers making assumptions that no longer hold when
>>>>> iovad->granule > PAGE_SIZE.
>>>>>
>>>>> Regarding your case: I'm not convinced the function is meant to be used there.
>>>>> If I understand it correctly, your code first allocates memory with dma_alloc_coherent
>>>>> (which possibly creates a sgt internally and then maps it with iommu_map_sg),
>>>>> then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
>>>>> another iommu domain with dma_map_sg while assuming that the result will be contiguous
>>>>> in IOVA space. It'll work out because dma_alloc_coherent is the very thing
>>>>> meant to allocate pages that can be mapped into kernel and device VA space
>>>>> as a single contiguous block and because both of your IOMMUs are different
>>>>> instances of the same HW block. Anything allocated by dma_alloc_coherent for the
>>>>> first IOMMU will have the right shape that will allow it to be mapped as
>>>>> a single contiguous block for the second IOMMU.
>>>>>
>>>>> What could be done in your case is to instead use the IOMMU API,
>>>>> allocate the pages yourself (while ensuring the sgt you create is made up
>>>>> of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
>>>>> and then just use iommu_map_sg for both domains which actually comes with the
>>>>> guarantee that the result will be a single contiguous block in IOVA space and
>>>>> doesn't require the sgt roundtrip.
>>>>
>>>> In principle I agree. I am getting the sense this function can't be used
>>>> correctly in general, and yet is the function that's meant to be used.
>>>> If my interpretation of prior LKML discussion holds, the problems are
>>>> far deeper than my code or indeed page size problems...
>>>
>>> Right, which makes reasoning about this function and its behavior if the
>>> IOMMU page size is unexpected very hard for me. I'm not opposed to just
>>> keeping this function as-is when there's a mismatch between PAGE_SIZE and
>>> the IOMMU page size (and it will probably work that way) but I'd like to
>>> be sure that won't introduce unexpected behavior.
>>>
>>>>
>>>> If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
>>>> that dance were wrapped up in a safe helper function instead of open
>>>> coding it in every driver that does cross device sharing.
>>>>
>>>> We might even call that helper... hmm... dma_map_sg.... *ducks*
>>>>
>>>
>>> There might be another way to do this correctly. I'm likely just a little
>>> bit biased because I've spent the past weeks wrapping my head around the
>>> IOMMU and DMA APIs and when all you have is a hammer everything looks like
>>> a nail.
>>>
>>> But dma_map_sg operates at the DMA API level and at that point the dma-ops
>>> for two different devices could be vastly different.
>>> In the worst case one of them could be behind an IOMMU that can easily map
>>> non-contiguous pages while the other one is directly connected to the bus and
>>> can't even access >4G pages without swiotlb support.
>>> It's really only possible to guarantee that it will map N buffers to <= N
>>> DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally) at
>>> that point.
>>>
>>> On the IOMMU API level you have much more information available about the actual
>>> hardware and can prepare the buffers in a way that makes both devices happy.
>>> That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries
>>> can actually guarantee to map the entire list to a single contiguous IOVA block.
>>
>> Essentially there are two reasonable options, and doing pretend dma-buf
>> export/import between two devices effectively owned by the same driver
>> is neither of them. Handily, DRM happens to be exactly where all the
>> precedent is, too; unsurprisingly this is not a new concern.
>>
>> One is to go full IOMMU API, like rockchip or tegra, attaching the
>> relevant devices to your own unmanaged domain(s) and mapping pages
>> exactly where you choose. You still make dma_map/dma_unmap calls for the
>> sake of cache maintenance and other housekeeping on the underlying
>> memory, but you ignore the provided DMA addresses in favour of your own
>> IOVAs when it comes to programming the devices.
>>
>> The lazier option if you can rely on all relevant devices having equal
>> DMA and IOMMU capabilities is to follow exynos, and herd the devices
>> into a common default domain. Instead of allocating your own domain, you
>> grab the current domain for one device (which will be its default
>> domain) and manually attach the other devices to that. Then you forget
>> all about IOMMUs but make sure to do all your regular DMA API calls
>> using that first device, and the DMA addresses which come back should be
>> magically valid for the other devices too. It was a bit of a cheeky hack
>> TBH, but I'd still much prefer more of that over any usage of
>> get_sgtable outside of actual dma-buf...
>>
>> Note that where multiple IOMMU instances are involved, the latter
>> approach does depend on the IOMMU driver being able to support sharing a
>> single domain across them; I think that might sort-of-work for DART
>> already, but may need a little more attention.
> 
> It'll work for two streams inside the same DART but needs some
> attention for streams from two separate DARTs.
> 
> Then there's also this amazing "feature" that the display controller DART
> pagetable pointer register is read-only so that we have to reuse the memory
> Apple configured for the first-level table. That needs some changes anyway
> but might make adding multiple devices from different groups more complex.

OK, I was thinking the dual-DART accommodation is already at least some 
of the way there, but I guess it's still tied to a single device's cfg. 
One upside to generalising further might be that the dual-DART case 
stops being particularly special :)

Not being able to physically share pagetables shouldn't be too big a 
deal, just a bit more work to sync iommu_map/iommu_unmap calls across 
all the relevant instances for the given domain.

Robin.


* Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-09-03 15:45               ` Robin Murphy
@ 2021-09-03 16:51                 ` Sven Peter
  0 siblings, 0 replies; 24+ messages in thread
From: Sven Peter @ 2021-09-03 16:51 UTC (permalink / raw)
  To: Robin Murphy, Alyssa Rosenzweig
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Mohamed Mediouni, Alexander Graf, Hector Martin, linux-kernel



On Fri, Sep 3, 2021, at 17:45, Robin Murphy wrote:
> On 2021-09-03 16:16, Sven Peter wrote:
> > 
> > 
> > On Thu, Sep 2, 2021, at 21:42, Robin Murphy wrote:
> >> On 2021-09-02 19:19, Sven Peter wrote:
> >>>
> >>>
> >>> On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
> >>>>> My biggest issue is that I do not understand how this function is supposed
> >>>>> to be used correctly. It would work fine as-is if it only ever gets passed buffers
> >>>>> allocated by the coherent API but there's no way to check or guarantee that.
> >>>>> There may also be callers making assumptions that no longer hold when
> >>>>> iovad->granule > PAGE_SIZE.
> >>>>>
> >>>>> Regarding your case: I'm not convinced the function is meant to be used there.
> >>>>> If I understand it correctly, your code first allocates memory with dma_alloc_coherent
> >>>>> (which possibly creates a sgt internally and then maps it with iommu_map_sg),
> >>>>> then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
> >>>>> another iommu domain with dma_map_sg while assuming that the result will be contiguous
> >>>>> in IOVA space. It'll work out because dma_alloc_coherent is the very thing
> >>>>> meant to allocate pages that can be mapped into kernel and device VA space
> >>>>> as a single contiguous block and because both of your IOMMUs are different
> >>>>> instances of the same HW block. Anything allocated by dma_alloc_coherent for the
> >>>>> first IOMMU will have the right shape that will allow it to be mapped as
> >>>>> a single contiguous block for the second IOMMU.
> >>>>>
> >>>>> What could be done in your case is to instead use the IOMMU API,
> >>>>> allocate the pages yourself (while ensuring the sgt you create is made up
> >>>>> of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
> >>>>> and then just use iommu_map_sg for both domains which actually comes with the
> >>>>> guarantee that the result will be a single contiguous block in IOVA space and
> >>>>> doesn't require the sgt roundtrip.
> >>>>
> >>>> In principle I agree. I am getting the sense this function can't be used
> >>>> correctly in general, and yet is the function that's meant to be used.
> >>>> If my interpretation of prior LKML discussion holds, the problems are
> >>>> far deeper than my code or indeed page size problems...
> >>>
> >>> Right, which makes reasoning about this function and its behavior if the
> >>> IOMMU page size is unexpected very hard for me. I'm not opposed to just
> >>> keeping this function as-is when there's a mismatch between PAGE_SIZE and
> >>> the IOMMU page size (and it will probably work that way) but I'd like to
> >>> be sure that won't introduce unexpected behavior.
> >>>
> >>>>
> >>>> If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
> >>>> that dance were wrapped up in a safe helper function instead of open
> >>>> coding it in every driver that does cross device sharing.
> >>>>
> >>>> We might even call that helper... hmm... dma_map_sg.... *ducks*
> >>>>
> >>>
> >>> There might be another way to do this correctly. I'm likely just a little
> >>> bit biased because I've spent the past weeks wrapping my head around the
> >>> IOMMU and DMA APIs and when all you have is a hammer everything looks like
> >>> a nail.
> >>>
> >>> But dma_map_sg operates at the DMA API level and at that point the dma-ops
> >>> for two different devices could be vastly different.
> >>> In the worst case one of them could be behind an IOMMU that can easily map
> >>> non-contiguous pages while the other one is directly connected to the bus and
> >>> can't even access >4G pages without swiotlb support.
> >>> It's really only possible to guarantee that it will map N buffers to <= N
> >>> DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally) at
> >>> that point.
> >>>
> >>> On the IOMMU API level you have much more information available about the actual
> >>> hardware and can prepare the buffers in a way that makes both devices happy.
> >>> That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries
> >>> can actually guarantee to map the entire list to a single contiguous IOVA block.
> >>
> >> Essentially there are two reasonable options, and doing pretend dma-buf
> >> export/import between two devices effectively owned by the same driver
> >> is neither of them. Handily, DRM happens to be exactly where all the
> >> precedent is, too; unsurprisingly this is not a new concern.
> >>
> >> One is to go full IOMMU API, like rockchip or tegra, attaching the
> >> relevant devices to your own unmanaged domain(s) and mapping pages
> >> exactly where you choose. You still make dma_map/dma_unmap calls for the
> >> sake of cache maintenance and other housekeeping on the underlying
> >> memory, but you ignore the provided DMA addresses in favour of your own
> >> IOVAs when it comes to programming the devices.
> >>
> >> The lazier option if you can rely on all relevant devices having equal
> >> DMA and IOMMU capabilities is to follow exynos, and herd the devices
> >> into a common default domain. Instead of allocating your own domain, you
> >> grab the current domain for one device (which will be its default
> >> domain) and manually attach the other devices to that. Then you forget
> >> all about IOMMUs but make sure to do all your regular DMA API calls
> >> using that first device, and the DMA addresses which come back should be
> >> magically valid for the other devices too. It was a bit of a cheeky hack
> >> TBH, but I'd still much prefer more of that over any usage of
> >> get_sgtable outside of actual dma-buf...
> >>
> >> Note that where multiple IOMMU instances are involved, the latter
> >> approach does depend on the IOMMU driver being able to support sharing a
> >> single domain across them; I think that might sort-of-work for DART
> >> already, but may need a little more attention.
> > 
> > It'll work for two streams inside the same DART but needs some
> > attention for streams from two separate DARTs.
> > 
> > Then there's also this amazing "feature" that the display controller DART
> > pagetable pointer register is read-only so that we have to reuse the memory
> > Apple configured for the first-level table. That needs some changes anyway
> > but might make adding multiple devices from different groups more complex.
> 
> OK, I was thinking the dual-DART accommodation is already at least some 
> of the way there, but I guess it's still tied to a single device's cfg. 

Pretty much. I think "needing a little more attention" describes it pretty
well :)


> One upside to generalising further might be that the dual-DART case 
> stops being particularly special :)
> 
> Not being able to physically share pagetables shouldn't be too big a 
> deal, just a bit more work to sync iommu_map/iommu_unmap calls across 
> all the relevant instances for the given domain.

True, it's just a bit more bookkeeping in the end.
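
For what it's worth, that bookkeeping could be sketched roughly as below. All struct and field names here are hypothetical, not the real apple-dart driver: the domain keeps a list of per-DART io-pgtable instances and mirrors every map call into each one.

```c
#include <linux/io-pgtable.h>
#include <linux/list.h>

/* One entry per DART instance attached to the domain (hypothetical). */
struct dart_instance {
	struct io_pgtable_ops *pgtbl_ops;	/* this DART's own pagetable */
	struct list_head link;
};

struct my_dart_domain {
	struct list_head instances;		/* of struct dart_instance */
};

/*
 * Sketch of mirroring a map across all instances; real code would also
 * need locking and unwinding of earlier mappings on error.
 */
static int dart_domain_map(struct my_dart_domain *domain, unsigned long iova,
			   phys_addr_t paddr, size_t size, int prot)
{
	struct dart_instance *dart;
	int ret;

	/* The same IOVA->PA mapping is written into every instance's
	 * pagetable, so the domain behaves as one shared address space. */
	list_for_each_entry(dart, &domain->instances, link) {
		ret = dart->pgtbl_ops->map(dart->pgtbl_ops, iova, paddr,
					   size, prot, GFP_KERNEL);
		if (ret)
			return ret;
	}
	return 0;
}
```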



Sven


end of thread, other threads:[~2021-09-03 16:52 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-28 15:36 [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Sven Peter
2021-08-28 15:36 ` [PATCH v2 1/8] iommu/dma: Align size for untrusted devs to IOVA granule Sven Peter
2021-08-28 15:36 ` [PATCH v2 2/8] iommu/dma: Fail unaligned map requests for untrusted devs Sven Peter
2021-08-28 19:00   ` Sven Peter
2021-08-28 15:36 ` [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE Sven Peter
2021-08-31 21:30   ` Alyssa Rosenzweig
2021-09-01 17:06     ` Sven Peter
2021-09-01 21:10       ` Alyssa Rosenzweig
2021-09-02 18:19         ` Sven Peter
2021-09-02 19:42           ` Robin Murphy
2021-09-03 13:11             ` Alyssa Rosenzweig
2021-09-03 15:16             ` Sven Peter
2021-09-03 15:45               ` Robin Murphy
2021-09-03 16:51                 ` Sven Peter
2021-08-28 15:36 ` [PATCH v2 4/8] iommu/dma: Support granule > PAGE_SIZE in dma_map_sg Sven Peter
2021-08-28 15:36 ` [PATCH v2 5/8] iommu/dma: Support PAGE_SIZE < iovad->granule allocations Sven Peter
2021-08-28 15:36 ` [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device Sven Peter
2021-08-31 21:39   ` Alyssa Rosenzweig
2021-09-01 17:14     ` Sven Peter
2021-09-01 18:53       ` Robin Murphy
2021-08-28 15:36 ` [PATCH v2 7/8] iommu: Introduce __IOMMU_DOMAIN_LP Sven Peter
2021-08-28 15:36 ` [PATCH v2 8/8] iommu/dart: Remove force_bypass logic Sven Peter
2021-08-31 21:40   ` Alyssa Rosenzweig
2021-08-31 21:32 ` [PATCH v2 0/8] Support IOMMU page sizes larger than the CPU page size Alyssa Rosenzweig
