LKML Archive on lore.kernel.org
* [RFC 0/7] Support in-kernel DMA with PASID and SVA
@ 2021-09-21 20:29 Jacob Pan
  2021-09-21 20:29 ` [RFC 1/7] ioasid: reserve special PASID for in-kernel DMA Jacob Pan
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

Hi Joerg/Jason/Christoph et al.,

The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in sva-lib. Kernel SVA is achieved by passing a special flag that
indicates the device should be bound to init_mm instead of the mm of the
current process. Page requests and other differences between user and kernel
SVA are handled as special cases.

This unrestricted binding to the kernel page table has been challenged on
security grounds and for breaking the convention that in-kernel DMA must be
compatible with the DMA API
(https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/).
There is also no IOTLB synchronization upon kernel page table updates.

This patchset addresses these concerns with an explicit, DMA API compatible
model while continuing to support in-kernel DMA requests with PASID.
Specifically, the following DMA-IOMMU APIs are introduced:

int iommu_dma_pasid_enable/disable(struct device *dev,
				   struct iommu_domain **domain,
				   enum iommu_dma_pasid_mode mode);
int iommu_map/unmap_kva(struct iommu_domain *domain,
			void *cpu_addr, size_t size, int prot);

The following three addressing modes are supported, with example API usage
by device drivers.

1. Physical address (bypass) mode. Similar to DMA direct, where trusted devices
can DMA straight through the IOMMU on a per-PASID basis.
Example:
	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
	/* Use the returned PASID and PA for work submission */

2. IOVA mode. DMA API compatible. A supervisor PASID is mapped the same way as
the PCI requester ID (RID).
Example:
	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
	/* Use the PASID and DMA API allocated IOVA for work submission */

3. KVA mode. New KVA map/unmap APIs. Fast and strict sub-modes are supported
transparently based on the device's trust level.
Example:
	pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
	iommu_map_kva(domain, &buf, size, prot);
	/* Use the returned PASID and KVA to submit work */
Where:
	Fast mode: shared CPU page tables, for trusted devices only.
	Strict mode: an IOMMU domain is returned for the untrusted device to
	replicate the KVA-PA mapping in IOMMU page tables.

On a per-device basis, the DMA addressing and performance modes are selected
by the device drivers. Platform information such as device trust level and
user command line input could also be taken into consideration (not
implemented in this RFC).
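To make the IOVA mode concrete, a driver-side flow might look like the sketch
below. This is an illustration only: the function name, buffer handling, and
descriptor programming are hypothetical placeholders, and the final form of
iommu_dma_pasid_disable() with a domain argument is assumed from patch 4.

```c
/* Sketch only: IOVA-mode setup in a hypothetical device driver */
static int example_setup_dma(struct device *dev)
{
	dma_addr_t iova;
	void *buf;
	int pasid;

	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
	if (pasid == INVALID_IOASID)
		return -ENODEV;

	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
	if (!buf) {
		iommu_dma_pasid_disable(dev, NULL);
		return -ENOMEM;
	}

	/*
	 * The default DMA domain backs both RID and RID+PASID, so the
	 * ordinary DMA API mapping below is also valid for PASID-tagged
	 * requests from this device.
	 */
	iova = dma_map_single(dev, buf, PAGE_SIZE, DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, iova)) {
		kfree(buf);
		iommu_dma_pasid_disable(dev, NULL);
		return -EIO;
	}

	/* Program 'pasid' and 'iova' into the device work descriptor here */

	return 0;
}
```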

This RFC is intended to communicate the API direction. Little testing has been
done beyond IDXD and DMA engine tests.

For the PA and IOVA modes, the implementation is straightforward and has been
tested with the Intel IDXD driver. But several open issues remain in KVA fast
mode, so it is untested:
1. Lack of IOTLB synchronization: kernel direct map aliases can be updated as
a result of module or eBPF loading. Add a kernel MMU notifier?
2. The use of the auxiliary domain for KVA mapping: will the aux domain stay
in the long term? Is there another way to represent sub-device granularity
isolation?
3. Is limiting the KVA sharing to the direct map range reasonable and
practical for all architectures?


Many thanks to Ashok Raj, Kevin Tian, and Baolu, who provided feedback and
many ideas for this set.

Thanks,

Jacob

Jacob Pan (7):
  ioasid: reserve special PASID for in-kernel DMA
  dma-iommu: Add API for DMA request with PASID
  iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
  dma-iommu: Add support for DMA w/ PASID in KVA
  iommu/vt-d: Add support for KVA PASID mode
  iommu: Add KVA map API
  dma/idxd: Use dma-iommu PASID API instead of SVA lib

 drivers/dma/idxd/idxd.h                       |   4 +-
 drivers/dma/idxd/init.c                       |  36 ++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/dma-iommu.c                     | 123 +++++++++++++-
 drivers/iommu/intel/iommu.c                   | 154 +++++++++++++++++-
 drivers/iommu/ioasid.c                        |   2 +
 drivers/iommu/iommu-sva-lib.c                 |   1 +
 drivers/iommu/iommu.c                         |  63 +++++++
 include/linux/dma-iommu.h                     |  14 ++
 include/linux/intel-iommu.h                   |   7 +-
 include/linux/ioasid.h                        |   4 +
 include/linux/iommu.h                         |  13 ++
 12 files changed, 390 insertions(+), 33 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC 1/7] ioasid: reserve special PASID for in-kernel DMA
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
@ 2021-09-21 20:29 ` Jacob Pan
  2021-09-21 20:29 ` [RFC 2/7] dma-iommu: Add API for DMA request with PASID Jacob Pan
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

IOASIDs associated with user processes are subject to resource
restrictions. In-kernel usage, however, targets only the single global
kernel virtual address space. Reserve a special IOASID for devices that
perform in-kernel DMA requests with PASID. This global PASID is excluded
from the IOASID allocator's range.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
 drivers/iommu/ioasid.c                          | 2 ++
 drivers/iommu/iommu-sva-lib.c                   | 1 +
 include/linux/ioasid.h                          | 4 ++++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..c5fb89bd6229 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -329,7 +329,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
 		return ERR_PTR(-ENOMEM);
 
 	/* Allocate a PASID for this mm if necessary */
-	ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+	ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_START, (1U << master->ssid_bits) - 1);
 	if (ret)
 		goto err_free_bond;
 
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 50ee27bbd04e..89c6132bf1ec 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -317,6 +317,8 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
 	data->private = private;
 	refcount_set(&data->refs, 1);
 
+	if (min < IOASID_ALLOC_BASE)
+		min = IOASID_ALLOC_BASE;
 	/*
 	 * Custom allocator needs allocator data to perform platform specific
 	 * operations.
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index bd41405d34e9..4f56843517e5 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -28,6 +28,7 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
 	int ret = 0;
 	ioasid_t pasid;
 
+	/* TODO: no need to check!! reserved range is checked in ioasid_alloc() */
 	if (min == INVALID_IOASID || max == INVALID_IOASID ||
 	    min == 0 || max < min)
 		return -EINVAL;
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..4d435cbd48b8 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -6,6 +6,10 @@
 #include <linux/errno.h>
 
 #define INVALID_IOASID ((ioasid_t)-1)
+#define IOASID_DMA_NO_PASID	0 /* For DMA request w/o PASID */
+#define IOASID_DMA_PASID	1 /* For in-kernel DMA w/ PASID */
+#define IOASID_ALLOC_BASE	2 /* Start of the allocation */
+
 typedef unsigned int ioasid_t;
 typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
 typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
-- 
2.25.1



* [RFC 2/7] dma-iommu: Add API for DMA request with PASID
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
  2021-09-21 20:29 ` [RFC 1/7] ioasid: reserve special PASID for in-kernel DMA Jacob Pan
@ 2021-09-21 20:29 ` Jacob Pan
  2021-09-21 20:29 ` [RFC 3/7] iommu/vt-d: Add DMA w/ PASID support for PA and IOVA Jacob Pan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

The DMA API is the standard way for device drivers to perform in-kernel
DMA requests. When IOMMU protection is enabled, isolation is enforced by
the IOMMU at requester ID (RID) granularity. With the introduction of the
process address space ID (PASID), DMA isolation can be enforced at
RID+PASID granularity. On some modern platforms, DMA requests with PASID
are the only option available, e.g. the shared work queues in the Intel
Data Streaming Accelerator (DSA).
To support DMA with PASID while maintaining the ubiquitous usage of the
DMA API, this patch introduces a new IOMMU DMA API that supports two
addressing modes:
1. Physical address or bypass mode. This is similar to DMA direct mode,
where trusted devices can DMA straight through the IOMMU.
2. IOVA mode, compatible with the DMA API. Once set up, callers can use
the DMA API as-is. DMA requests with and without PASID will be mapped by
the same page tables, i.e. the default DMA domain will be used by both
RID and RID+PASID.
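As a driver-facing sketch of the bypass mode added here (everything except the
new API is illustrative; the descriptor format is device specific, and the
two-argument enable signature is the one introduced in this patch):

```c
/* Sketch only: bypass (PA) mode for a trusted device */
static int example_enable_pa_dma(struct device *dev, void *buf)
{
	int pasid;

	pasid = iommu_dma_pasid_enable(dev, IOMMU_DMA_PASID_BYPASS);
	if (pasid == INVALID_IOASID)
		return -ENODEV;

	/*
	 * In bypass mode the device's PASID-tagged requests pass through
	 * the IOMMU, so DMA addresses are physical addresses.
	 */
	/* Program 'pasid' and virt_to_phys(buf) into the descriptor here */

	return 0;
}
```

On teardown, this pairs with iommu_dma_pasid_disable(dev) to clean up the
supervisor PASID context.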

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/dma-iommu.c | 54 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h | 12 +++++++++
 include/linux/iommu.h     |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..490731659def 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -174,6 +174,60 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+/**
+ * iommu_dma_pasid_enable --Enable device DMA request with PASID
+ * @dev:	Device to be enabled
+ * @mode:	DMA addressing mode
+ *
+ * The following mutually exclusive DMA addressing modes are supported:
+ *
+ *  1. Physical address or bypass mode. This is similar to DMA direct where
+ *     trusted devices can DMA pass-through IOMMU.
+ *
+ *  2. IOVA mode. This abides DMA APIs. Once set up, callers can use DMA API
+ *     as-is. DMA request w/ and w/o PASID will be mapped by the same page
+ *     tables. PCI requester ID (RID) and RID+PASID will be pointed to the same
+ *     PGD. i.e. the default DMA domain will be used by both RID and RID+PASID.
+ *
+ * @return the supervisor PASID to be used for DMA. Or INVALID_IOASID upon
+ *     failure.
+ */
+int iommu_dma_pasid_enable(struct device *dev,
+			   enum iommu_dma_pasid_mode mode)
+{
+	int pasid = INVALID_IOASID;
+	struct iommu_domain *dom = NULL;
+
+	/* TODO: only allow a single mode each time, track PASID DMA enabling
+	 * status per device. Perhaps add a flag in struct device.dev_iommu.
+	 */
+
+	/* Call vendor drivers to handle IOVA, bypass mode */
+	dom = iommu_get_domain_for_dev(dev);
+	if (dom->ops->enable_pasid_dma(dev, IOASID_DMA_PASID, mode)) {
+		dev_dbg(dev, "Failed to enable DMA pasid in mode %d", mode);
+		goto exit;
+	}
+	pasid = IOASID_DMA_PASID;
+
+	dev_dbg(dev, "Enable DMA pasid %d in mode %d", pasid, mode);
+	goto exit;
+exit:
+	return pasid;
+}
+EXPORT_SYMBOL(iommu_dma_pasid_enable);
+
+int iommu_dma_pasid_disable(struct device *dev)
+{
+	struct iommu_domain *dom;
+
+	/* Call vendor iommu ops to clean up supervisor PASID context */
+	dom = iommu_get_domain_for_dev(dev);
+
+	return dom->ops->disable_pasid_dma(dev);
+}
+EXPORT_SYMBOL(iommu_dma_pasid_disable);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..3c1555e0fd51 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -17,6 +17,18 @@
 int iommu_get_dma_cookie(struct iommu_domain *domain);
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
+enum iommu_dma_pasid_mode {
+	/* Pass-through mode, use physical address */
+	IOMMU_DMA_PASID_BYPASS = 1,
+	/* Compatible with DMA APIs, same mapping as DMA w/o PASID */
+	IOMMU_DMA_PASID_IOVA,
+};
+/* For devices that can do DMA request with PASID, setup the system
+ * based on the address mode selected.
+ */
+int iommu_dma_pasid_enable(struct device *dev,
+			   enum iommu_dma_pasid_mode mode);
+int iommu_dma_pasid_disable(struct device *dev);
 
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..610cbfd03e6b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -283,6 +283,8 @@ struct iommu_ops {
 
 	int (*def_domain_type)(struct device *dev);
 
+	int (*enable_pasid_dma)(struct device *dev, u32 pasid, int mode);
+	int (*disable_pasid_dma)(struct device *dev);
 	unsigned long pgsize_bitmap;
 	struct module *owner;
 };
-- 
2.25.1



* [RFC 3/7] iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
  2021-09-21 20:29 ` [RFC 1/7] ioasid: reserve special PASID for in-kernel DMA Jacob Pan
  2021-09-21 20:29 ` [RFC 2/7] dma-iommu: Add API for DMA request with PASID Jacob Pan
@ 2021-09-21 20:29 ` Jacob Pan
  2021-09-21 20:29 ` [RFC 4/7] dma-iommu: Add support for DMA w/ PASID in KVA Jacob Pan
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

For physical address (PA) mode, the PASID entry for the given supervisor
PASID is set up for HW pass-through (PT). For IOVA mode, the supervisor
PASID entry is configured the same as PASID 0, the special PASID for DMA
requests without PASID, a.k.a. RID2PASID. An additional IOTLB flush for
the supervisor PASID is also included.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 95 ++++++++++++++++++++++++++++++++++++-
 include/linux/intel-iommu.h |  7 ++-
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..cbcfd178c16f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1631,7 +1631,11 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
 	if (domain->default_pasid)
 		qi_flush_piotlb(iommu, did, domain->default_pasid,
 				addr, npages, ih);
-
+	if (domain->s_pasid) {
+		pr_alert_ratelimited("%s: pasid %u", __func__, domain->s_pasid);
+		qi_flush_piotlb(iommu, did, domain->s_pasid,
+				addr, npages, ih);
+	}
 	if (!list_empty(&domain->devices))
 		qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
 }
@@ -5535,6 +5539,93 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 	}
 }
 
+static int intel_enable_pasid_dma(struct device *dev, u32 pasid, int mode)
+{
+	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct device_domain_info *info;
+	unsigned long flags;
+	int ret = 0;
+
+	info = get_domain_info(dev);
+	if (!info)
+		return -ENODEV;
+
+	if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+		return -EINVAL;
+
+	if (intel_iommu_enable_pasid(info->iommu, dev))
+		return -ENODEV;
+
+	spin_lock_irqsave(&iommu->lock, flags);
+	switch (mode) {
+	case IOMMU_DMA_PASID_BYPASS:
+		dev_dbg(dev, "%s PT mode", __func__);
+		/* As a precaution, translation request should be responded with
+		 * physical address.
+		 */
+		if (!hw_pass_through) {
+			ret = -ENODEV;
+			goto exit_unlock;
+		}
+		/* HW may use large page for ATS */
+		pci_disable_ats(pdev);
+		ret = intel_pasid_setup_pass_through(info->iommu, info->domain,
+						      dev, pasid);
+		if (ret)
+			dev_err(dev, "Failed SPASID %d BYPASS", pasid);
+		break;
+	case IOMMU_DMA_PASID_IOVA:
+		dev_dbg(dev, "%s IOVA mode", __func__);
+		/*
+		 * We could use SL but FL is preferred for consistency with VM
+		 * where vIOMMU exposes FL only cap
+		 */
+		if (!domain_use_first_level(info->domain))
+			return -EINVAL;
+		/* To be used for IOTLB flush at PASID granularity */
+		info->domain->s_pasid = pasid;
+		ret = domain_setup_first_level(info->iommu, info->domain, dev,
+						pasid);
+		break;
+	default:
+		dev_err(dev, "Invalid PASID DMA mode %d", mode);
+		ret = -EINVAL;
+		goto exit_unlock;
+	}
+	info->pasid_mode = mode;
+exit_unlock:
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+	return ret;
+}
+
+static int intel_disable_pasid_dma(struct device *dev)
+{
+	struct device_domain_info *info;
+	int ret = 0;
+
+	info = get_domain_info(dev);
+	if (!info)
+		return -ENODEV;
+
+	if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+		return -EINVAL;
+
+	if (intel_iommu_enable_pasid(info->iommu, dev))
+		return -ENODEV;
+
+	if (!dev->iommu) {
+		dev_err(dev, "No IOMMU params");
+		return -ENODEV;
+	}
+	dev_info(dev, "Tearing down DMA PASID %d",  info->domain->s_pasid);
+	intel_pasid_tear_down_entry(info->iommu, info->dev, info->domain->s_pasid, false);
+
+	info->domain->s_pasid = 0;
+	return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
@@ -5573,6 +5664,8 @@ const struct iommu_ops intel_iommu_ops = {
 	.sva_get_pasid		= intel_svm_get_pasid,
 	.page_response		= intel_svm_page_response,
 #endif
+	.enable_pasid_dma	= intel_enable_pasid_dma,
+	.disable_pasid_dma	= intel_disable_pasid_dma,
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 03faf20a6817..8940759f759e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -21,6 +21,7 @@
 #include <linux/dmar.h>
 #include <linux/ioasid.h>
 #include <linux/bitfield.h>
+#include <linux/dma-iommu.h>
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
@@ -571,7 +572,10 @@ struct dmar_domain {
 					 * The default pasid used for non-SVM
 					 * traffic on mediated devices.
 					 */
-
+	u32		s_pasid;	/*
+					 * Supervisor PASID used for in-kernel
+					 * DMA request with PASID.
+					 */
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
 };
@@ -652,6 +656,7 @@ struct device_domain_info {
 	struct intel_iommu *iommu; /* IOMMU used by this device */
 	struct dmar_domain *domain; /* pointer to domain */
 	struct pasid_table *pasid_table; /* pasid table */
+	enum iommu_dma_pasid_mode pasid_mode; /* DMA PASID address mode */
 };
 
 static inline void __iommu_flush_cache(
-- 
2.25.1



* [RFC 4/7] dma-iommu: Add support for DMA w/ PASID in KVA
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
                   ` (2 preceding siblings ...)
  2021-09-21 20:29 ` [RFC 3/7] iommu/vt-d: Add DMA w/ PASID support for PA and IOVA Jacob Pan
@ 2021-09-21 20:29 ` Jacob Pan
  2021-09-21 20:29 ` [RFC 5/7] iommu/vt-d: Add support for KVA PASID mode Jacob Pan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

Sharing virtual addresses between DMA and the CPU has many advantages: it
simplifies the programming model and enhances DMA security over physical
addressing. This patch adds KVA support to the DMA IOMMU API. Strict and
fast sub-modes are supported transparently based on the device's trust
level.

The strict mode is intended for untrusted devices. The KVA mapping is
established on demand in a separate IOMMU page table referenced by a
supervisor PASID. An aux domain is allocated per device to carry the
supervisor PASID.

The fast mode is for trusted devices, where the KVA mapping is shared with
the CPU via the kernel page table. A vendor IOMMU driver can choose to use
a global KVA domain for all devices in fast KVA mode.

The follow-up patches will introduce the iommu_map_kva() API, where the
KVA domains will be used. The performance advantage of the fast mode rests
on the fact that there is no need to build and tear down a separate IOMMU
page table for each DMA buffer.

Though multiple PASIDs and domains per device could be supported, the lack
of compelling usages leads to a single-PASID option for now.
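The intended driver-side flow covering both sub-modes might look like the
sketch below. The function name and buffer details are hypothetical, and
iommu_map_kva() itself is only introduced later in this set ([RFC 6/7]), so
this is an assumed usage, not tested code.

```c
/* Sketch only: KVA mode; the sub-mode is chosen based on device trust */
static int example_kva_dma(struct device *dev, void *buf, size_t size)
{
	struct iommu_domain *domain = NULL;
	int pasid, ret;

	pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
	if (pasid == INVALID_IOASID)
		return -ENODEV;

	/*
	 * Strict mode (untrusted device): replicate the KVA-PA mapping in
	 * the returned unmanaged domain. Fast mode (trusted device): the
	 * KVA domain already shares the CPU page table, so no per-buffer
	 * page table work is needed.
	 */
	ret = iommu_map_kva(domain, buf, size, IOMMU_READ | IOMMU_WRITE);
	if (ret) {
		iommu_dma_pasid_disable(dev, domain);
		return ret;
	}

	/* Submit work using 'pasid' and the KVA 'buf' as the DMA address */

	return 0;
}
```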

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/dma-iommu.c | 75 ++++++++++++++++++++++++++++++++++-----
 drivers/iommu/iommu.c     |  6 ++++
 include/linux/dma-iommu.h |  6 ++--
 include/linux/iommu.h     |  6 ++++
 4 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 490731659def..5b25dbcef8ee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -174,9 +174,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+static bool dev_is_untrusted(struct device *dev)
+{
+	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
+}
+
 /**
  * iommu_dma_pasid_enable --Enable device DMA request with PASID
  * @dev:	Device to be enabled
+ * @domain:	IOMMU domain returned for KVA mode mapping
  * @mode:	DMA addressing mode
  *
  * The following mutually exclusive DMA addressing modes are supported:
@@ -189,10 +195,25 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
  *     tables. PCI requester ID (RID) and RID+PASID will be pointed to the same
  *     PGD. i.e. the default DMA domain will be used by both RID and RID+PASID.
  *
+ *  3. KVA mode. DMA address == CPU virtual address. There are two sub-modes:
+ *     strict mode and fast mode.
+ *     The strict mode is intended for the untrusted devices, where DMA address
+ *     is identical to KVA but restricted per device on the supervisor PASID.
+ *     The fast mode is for trusted devices where its DMA is only restricted
+ *     by the kernel page tables used by the CPU. iommu_domains with UNMANAGED
+ *     and KVA types are returned respectively. They are used by iommu_map_kva()
+ *
+ *     The performance advantage of the fast mode (available when the device
+ *     is trusted or allowed by the user) relies on the fact that there is no
+ *     need to build/teardown separate IOMMU page tables for the KVA mapping.
+ *
+ *     Though multiple PASIDs and domains per device could be supported, the
+ *     lack of compelling usages leads to a single PASID option for now.
+ *
  * @return the supervisor PASID to be used for DMA. Or INVALID_IOASID upon
  *     failure.
  */
-int iommu_dma_pasid_enable(struct device *dev,
+int iommu_dma_pasid_enable(struct device *dev, struct iommu_domain **domain,
 			   enum iommu_dma_pasid_mode mode)
 {
 	int pasid = INVALID_IOASID;
@@ -201,8 +222,37 @@ int iommu_dma_pasid_enable(struct device *dev,
 	/* TODO: only allow a single mode each time, track PASID DMA enabling
 	 * status per device. Perhaps add a flag in struct device.dev_iommu.
 	 */
+	if (mode == IOMMU_DMA_PASID_KVA) {
+		if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX)) {
+			dev_err(dev, "No aux domain support");
+			goto exit;
+		};
+		/*
+		 * Untrusted devices gets an unmanaged domain which will be
+		 * returned to the caller for strict IOMMU API KVA mapping.
+		 * Trusted device gets a special KVA domain with init_mm.pgd
+		 * assigned.
+		 */
+		if (dev_is_untrusted(dev))
+			dom = iommu_domain_alloc(dev->bus);
+		else
+			dom = iommu_domain_alloc_kva(dev->bus);
+		if (!dom) {
+			dev_err(dev, "No KVA iommu domain allocated");
+			goto exit_disable_aux;
+		}
+		if (iommu_aux_attach_device(dom, dev)) {
+			dev_err(dev, "Failed to attach KVA iommu domain");
+			goto exit_free_domain;
+		};
+		pasid = iommu_aux_get_pasid(dom, dev);
+
+		dev_dbg(dev, "KVA mode pasid %d", pasid);
+		*domain = dom;
+		goto exit;
+	}
 
-	/* Call vendor drivers to handle IOVA, bypass mode */
+	/* Call vendor drivers to handle IOVA, bypass, and KVA trusted mode */
 	dom = iommu_get_domain_for_dev(dev);
 	if (dom->ops->enable_pasid_dma(dev, IOASID_DMA_PASID, mode)) {
 		dev_dbg(dev, "Failed to enable DMA pasid in mode %d", mode);
@@ -212,15 +262,29 @@ int iommu_dma_pasid_enable(struct device *dev,
 
 	dev_dbg(dev, "Enable DMA pasid %d in mode %d", pasid, mode);
 	goto exit;
+
+exit_free_domain:
+	iommu_domain_free(dom);
+exit_disable_aux:
+	iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
 exit:
 	return pasid;
 }
 EXPORT_SYMBOL(iommu_dma_pasid_enable);
 
-int iommu_dma_pasid_disable(struct device *dev)
+int iommu_dma_pasid_disable(struct device *dev, struct iommu_domain *domain)
 {
 	struct iommu_domain *dom;
 
+	if (domain) {
+		/* s-pasid will be cleared during detach */
+		iommu_aux_detach_device(domain, dev);
+		iommu_domain_free(domain);
+		iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
+		dev_dbg(dev, "KVA aux domain freed");
+		return 0;
+	}
+
 	/* Call vendor iommu ops to clean up supervisor PASID context */
 	dom = iommu_get_domain_for_dev(dev);
 
@@ -364,11 +428,6 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
 	domain->ops->flush_iotlb_all(domain);
 }
 
-static bool dev_is_untrusted(struct device *dev)
-{
-	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
-}
-
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..acfdcd7ebd6a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1950,6 +1950,12 @@ struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 
+struct iommu_domain *iommu_domain_alloc_kva(struct bus_type *bus)
+{
+	return __iommu_domain_alloc(bus, IOMMU_DOMAIN_KVA);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_alloc_kva);
+
 void iommu_domain_free(struct iommu_domain *domain)
 {
 	domain->ops->domain_free(domain);
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 3c1555e0fd51..e858d42a6669 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -22,13 +22,15 @@ enum iommu_dma_pasid_mode {
 	IOMMU_DMA_PASID_BYPASS = 1,
 	/* Compatible with DMA APIs, same mapping as DMA w/o PASID */
 	IOMMU_DMA_PASID_IOVA,
+	/* Use kernel direct mapping memory, page_offset_base */
+	IOMMU_DMA_PASID_KVA,
 };
 /* For devices that can do DMA request with PASID, setup the system
  * based on the address mode selected.
  */
-int iommu_dma_pasid_enable(struct device *dev,
+int iommu_dma_pasid_enable(struct device *dev, struct iommu_domain **domain,
 			   enum iommu_dma_pasid_mode mode);
-int iommu_dma_pasid_disable(struct device *dev);
+int iommu_dma_pasid_disable(struct device *dev, struct iommu_domain *domain);
 
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 610cbfd03e6b..cd8225f6bc23 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,6 +60,7 @@ struct iommu_domain_geometry {
 #define __IOMMU_DOMAIN_DMA_API	(1U << 1)  /* Domain for use in DMA-API
 					      implementation              */
 #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
+#define __IOMMU_DOMAIN_KVA	(1U << 3)  /* Domain shares with CPU KVA */
 
 /*
  * This are the possible domain-types
@@ -72,12 +73,16 @@ struct iommu_domain_geometry {
  *	IOMMU_DOMAIN_DMA	- Internally used for DMA-API implementations.
  *				  This flag allows IOMMU drivers to implement
  *				  certain optimizations for these domains
+ *	IOMMU_DOMAIN_KVA	- DMA addresses are kernel virtual addresses.
+ *				  Mapping can be managed by IOMMU API or shares
+ *				  the CPU page table, i.e. SVA.
  */
 #define IOMMU_DOMAIN_BLOCKED	(0U)
 #define IOMMU_DOMAIN_IDENTITY	(__IOMMU_DOMAIN_PT)
 #define IOMMU_DOMAIN_UNMANAGED	(__IOMMU_DOMAIN_PAGING)
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API)
+#define IOMMU_DOMAIN_KVA	(__IOMMU_DOMAIN_KVA)
 
 struct iommu_domain {
 	unsigned type;
@@ -389,6 +394,7 @@ extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
 extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
+extern struct iommu_domain *iommu_domain_alloc_kva(struct bus_type *bus);
 extern struct iommu_group *iommu_group_get_by_id(int id);
 extern void iommu_domain_free(struct iommu_domain *domain);
 extern int iommu_attach_device(struct iommu_domain *domain,
-- 
2.25.1



* [RFC 5/7] iommu/vt-d: Add support for KVA PASID mode
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
                   ` (3 preceding siblings ...)
  2021-09-21 20:29 ` [RFC 4/7] dma-iommu: Add support for DMA w/ PASID in KVA Jacob Pan
@ 2021-09-21 20:29 ` Jacob Pan
  2021-09-21 20:29 ` [RFC 6/7] iommu: Add KVA map API Jacob Pan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

To support the KVA fast mode, the VT-d driver must support allocation of
domains of the IOMMU_DOMAIN_KVA type. Since all devices in fast KVA mode
share the same kernel mapping, a single KVA domain is sufficient. This
global KVA domain carries the kernel mapping, i.e. init_mm.pgd.

The programming of the KVA domain follows the existing flow of auxiliary
domain attachment.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 59 ++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cbcfd178c16f..0dabd5f75acf 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -293,6 +293,9 @@ static inline void context_clear_entry(struct context_entry *context)
  *	3. Each iommu mapps to this domain if successful.
  */
 static struct dmar_domain *si_domain;
+
+/* This domain is used for shared virtual addressing with CPU kernel mapping */
+static struct dmar_domain *kva_domain;
 static int hw_pass_through = 1;
 
 #define for_each_domain_iommu(idx, domain)			\
@@ -1989,6 +1992,10 @@ static void domain_exit(struct dmar_domain *domain)
 	/* Remove associated devices and clear attached or cached domains */
 	domain_remove_dev_info(domain);
 
+	/* There is no IOMMU page table for KVA */
+	if (domain->pgd == (struct dma_pte *)init_mm.pgd)
+		return;
+
 	/* destroy iovas */
 	if (domain->domain.type == IOMMU_DOMAIN_DMA)
 		iommu_put_dma_cookie(&domain->domain);
@@ -2533,6 +2540,10 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 	int agaw, level;
 	int flags = 0;
 
+	if (domain->domain.type == IOMMU_DOMAIN_KVA) {
+		flags |= PASID_FLAG_SUPERVISOR_MODE;
+		goto do_setup;
+	}
 	/*
 	 * Skip top levels of page tables for iommu which has
 	 * less agaw than default. Unnecessary for PT mode.
@@ -2554,7 +2565,7 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 
 	if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
 		flags |= PASID_FLAG_PAGE_SNOOP;
-
+do_setup:
 	return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
 					     domain->iommu_did[iommu->seq_id],
 					     flags);
@@ -2713,7 +2724,28 @@ static int iommu_domain_identity_map(struct dmar_domain *domain,
 }
 
 static int md_domain_init(struct dmar_domain *domain, int guest_width);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int __init kva_domain_init(void)
+{
+	struct iommu_domain *domain;
 
+	kva_domain = alloc_domain(0);
+	if (!kva_domain) {
+		pr_err("Can't allocate KVA domain\n");
+		return -ENOMEM;
+	}
+	kva_domain->pgd = (struct dma_pte *)init_mm.pgd;
+	domain = &kva_domain->domain;
+	domain->type = IOMMU_DOMAIN_KVA;
+	/* REVISIT: may not need this other than sanity check */
+	domain->geometry.aperture_start = 0;
+	domain->geometry.aperture_end   =
+		__DOMAIN_MAX_ADDR(kva_domain->gaw);
+	domain->geometry.force_aperture = true;
+	return 0;
+}
+#endif
 static int __init si_domain_init(int hw)
 {
 	struct dmar_rmrr_unit *rmrr;
@@ -3363,6 +3395,11 @@ static int __init init_dmars(void)
 			down_write(&dmar_global_lock);
 			if (ret)
 				goto free_iommu;
+			/* For in-kernel DMA with PASID in SVA */
+			ret = kva_domain_init();
+			if (ret)
+				goto free_iommu;
+
 		}
 #endif
 		ret = dmar_set_interrupt(iommu);
@@ -4558,6 +4595,9 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 		domain->geometry.force_aperture = true;
 
 		return domain;
+	case IOMMU_DOMAIN_KVA:
+		/* Use a global domain for shared KVA mapping */
+		return &kva_domain->domain;
 	case IOMMU_DOMAIN_IDENTITY:
 		return &si_domain->domain;
 	default:
@@ -4583,7 +4623,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
 	struct device_domain_info *info = get_domain_info(dev);
 
 	return info && info->auxd_enabled &&
-			domain->type == IOMMU_DOMAIN_UNMANAGED;
+		(domain->type == IOMMU_DOMAIN_UNMANAGED ||
+			domain->type == IOMMU_DOMAIN_KVA);
 }
 
 static inline struct subdev_domain_info *
@@ -4693,8 +4734,8 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 	if (ret)
 		goto attach_failed;
 
-	/* Setup the PASID entry for mediated devices: */
-	if (domain_use_first_level(domain))
+	/* Setup the PASID entry for devices doing DMA with the default PASID */
+	if (domain_use_first_level(domain) || domain->domain.type == IOMMU_DOMAIN_KVA)
 		ret = domain_setup_first_level(iommu, domain, dev,
 					       domain->default_pasid);
 	else
@@ -4761,6 +4802,10 @@ static int prepare_domain_attach_device(struct iommu_domain *domain,
 	if (!iommu)
 		return -ENODEV;
 
+	if (domain->type == IOMMU_DOMAIN_KVA) {
+		pr_info("TODO: KVA domain: check if device can do full 64-bit DMA\n");
+		return 0;
+	}
 	/* check if this iommu agaw is sufficient for max mapped address */
 	addr_width = agaw_to_width(iommu->agaw);
 	if (addr_width > cap_mgaw(iommu->cap))
@@ -5588,6 +5633,12 @@ static int intel_enable_pasid_dma(struct device *dev, u32 pasid, int mode)
 		ret = domain_setup_first_level(info->iommu, info->domain, dev,
 						pasid);
 		break;
+	case IOMMU_DMA_PASID_KVA:
+		/*
+		 * KVA mode should be handled in the aux domain attach where the default
+		 * PASID of the aux domain is used for setting up PASID FL.
+		 */
+		fallthrough;
 	default:
+		dev_err(dev, "Invalid PASID DMA mode %d\n", mode);
 		ret = -EINVAL;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC 6/7] iommu: Add KVA map API
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
                   ` (4 preceding siblings ...)
  2021-09-21 20:29 ` [RFC 5/7] iommu/vt-d: Add support for KVA PASID mode Jacob Pan
@ 2021-09-21 20:29 ` Jacob Pan
  2021-09-21 20:29 ` [RFC 7/7] dma/idxd: Use dma-iommu PASID API instead of SVA lib Jacob Pan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

Add a KVA map API. It enforces KVA address range checking and other
potential sanity checks; currently, only the direct-map range is
checked.
For trusted devices, this API returns immediately after the above sanity
check. For untrusted devices, this API serves as a thin wrapper around
the IOMMU map/unmap APIs.
OPEN: Alignment at the minimum page size is required; this is not as
rich and flexible as the DMA APIs.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/iommu.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h |  5 ++++
 2 files changed, 62 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index acfdcd7ebd6a..45ba55941209 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2490,6 +2490,63 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
+/*
+ * REVISIT: This might not be sufficient. Could also check permission match,
+ * exclude kernel text, etc.
+ */
+static inline bool is_kernel_direct_map(unsigned long start, phys_addr_t size)
+{
+	return (start >= PAGE_OFFSET) && ((start + size) <= VMALLOC_START);
+}
+
+/**
+ * iommu_map_kva - Map a kernel virtual address for DMA remapping
+ * @domain:	Domain that contains the PASID
+ * @page:	Page backing the kernel virtual address
+ * @size:	Size to map
+ * @prot:	Permissions
+ *
+ * DMA requests tagged with the domain's default PASID will target the
+ * kernel virtual address space.
+ *
+ * Return: 0 on success or a negative error code.
+ */
+int iommu_map_kva(struct iommu_domain *domain, struct page *page,
+		  size_t size, int prot)
+{
+	phys_addr_t phys = page_to_phys(page);
+	void *kva = phys_to_virt(phys);
+
+	/*
+	 * TODO: Limit DMA to kernel direct mapping only, avoid dynamic range
+	 * until we have mmu_notifier for making IOTLB coherent with CPU.
+	 */
+	if (!is_kernel_direct_map((unsigned long)kva, size))
+		return -EINVAL;
+	/*
+	 * The KVA domain type indicates a shared CPU page table, so skip
+	 * building IOMMU page tables. This is the fast mode where only the
+	 * sanity check is performed.
+	 */
+	if (domain->type == IOMMU_DOMAIN_KVA)
+		return 0;
+
+	return iommu_map(domain, (unsigned long)kva, phys, size, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_map_kva);
+
+int iommu_unmap_kva(struct iommu_domain *domain, void *kva,
+		    size_t size)
+{
+	if (!is_kernel_direct_map((unsigned long)kva, size))
+		return -EINVAL;
+
+	if (domain->type == IOMMU_DOMAIN_KVA) {
+		pr_debug_ratelimited("unmap kva skipped %llx\n", (u64)kva);
+		return 0;
+	}
+	/* REVISIT: do we need a fast version? */
+	return iommu_unmap(domain, (unsigned long)kva, size);
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_kva);
+
 int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index cd8225f6bc23..c0fac050ca57 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -427,6 +427,11 @@ extern size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 extern size_t iommu_map_sg_atomic(struct iommu_domain *domain,
 				  unsigned long iova, struct scatterlist *sg,
 				  unsigned int nents, int prot);
+extern int iommu_map_kva(struct iommu_domain *domain,
+			 struct page *page, size_t size, int prot);
+extern int iommu_unmap_kva(struct iommu_domain *domain,
+			   void *kva, size_t size);
+
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC 7/7] dma/idxd: Use dma-iommu PASID API instead of SVA lib
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
                   ` (5 preceding siblings ...)
  2021-09-21 20:29 ` [RFC 6/7] iommu: Add KVA map API Jacob Pan
@ 2021-09-21 20:29 ` Jacob Pan
  2021-09-22 17:04 ` [RFC 0/7] Support in-kernel DMA with PASID and SVA Jason Gunthorpe
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-21 20:29 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

Showcase partial usage of the new PASID DMA API: it replaces the SVA
lib API for obtaining the system PASID and setting up the addressing
mode. Actual work submission is not included.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/dma/idxd/idxd.h |  4 +++-
 drivers/dma/idxd/init.c | 36 ++++++++++++++++--------------------
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 507f5d5119f9..eaedaf3c3e3b 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -12,6 +12,7 @@
 #include <linux/pci.h>
 #include <linux/perf_event.h>
 #include <linux/idxd.h>
+#include <linux/dma-iommu.h>
 #include "registers.h"
 
 #define IDXD_DRIVER_VERSION	"1.00"
@@ -253,7 +254,8 @@ struct idxd_device {
 
 	struct iommu_sva *sva;
 	unsigned int pasid;
-
+	enum iommu_dma_pasid_mode spasid_mode;
+	struct iommu_domain *domain; /* For KVA mapping */
 	int num_groups;
 
 	u32 msix_perm_offset;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index c404a1320536..8f952a4c8909 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include <linux/idr.h>
 #include <linux/intel-svm.h>
 #include <linux/iommu.h>
+#include <linux/dma-iommu.h>
 #include <uapi/linux/idxd.h>
 #include <linux/dmaengine.h>
 #include "../dmaengine.h"
@@ -32,6 +33,11 @@ static bool sva = true;
 module_param(sva, bool, 0644);
 MODULE_PARM_DESC(sva, "Toggle SVA support on/off");
 
+static int spasid_mode = IOMMU_DMA_PASID_IOVA;
+module_param(spasid_mode, int, 0644);
+MODULE_PARM_DESC(spasid_mode,
		 "Supervisor PASID mode (1: pass-through, 2: DMA API)");
+
 #define DRV_NAME "idxd"
 
 bool support_enqcmd;
@@ -519,35 +525,25 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-	int flags;
-	unsigned int pasid;
-	struct iommu_sva *sva;
-
-	flags = SVM_FLAG_SUPERVISOR_MODE;
-
-	sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
-	if (IS_ERR(sva)) {
-		dev_warn(&idxd->pdev->dev,
-			 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
-		return PTR_ERR(sva);
-	}
+	int pasid;
+	struct iommu_domain *domain = NULL;
 
-	pasid = iommu_sva_get_pasid(sva);
-	if (pasid == IOMMU_PASID_INVALID) {
-		iommu_sva_unbind_device(sva);
+	pasid = iommu_dma_pasid_enable(&idxd->pdev->dev, &domain, spasid_mode);
+	if (pasid == INVALID_IOASID) {
+		dev_err(&idxd->pdev->dev, "No DMA PASID in mode %d\n", spasid_mode);
 		return -ENODEV;
 	}
-
-	idxd->sva = sva;
 	idxd->pasid = pasid;
-	dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
+	idxd->spasid_mode = spasid_mode;
+	idxd->domain = domain;
+
 	return 0;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-	iommu_sva_unbind_device(idxd->sva);
+	/* TODO: remove SVA binding, restore the non-PASID page table and PASID enable state */
+	iommu_dma_pasid_disable(&idxd->pdev->dev, idxd->domain);
 	idxd->sva = NULL;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
                   ` (6 preceding siblings ...)
  2021-09-21 20:29 ` [RFC 7/7] dma/idxd: Use dma-iommu PASID API instead of SVA lib Jacob Pan
@ 2021-09-22 17:04 ` Jason Gunthorpe
  2021-09-29 19:37 ` Jacob Pan
  2021-10-01 12:24 ` Barry Song
  9 siblings, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-09-22 17:04 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, Christoph Hellwig, Lu Baolu,
	Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck, mike.campin,
	Yi Liu, Tian, Kevin

On Tue, Sep 21, 2021 at 01:29:34PM -0700, Jacob Pan wrote:
> Hi Joerg/Jason/Christoph et all,
> 
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate the binding of the device and a page table should be performed
> on init_mm instead of the mm of the current process.Page requests and other
> differences between user and kernel SVA are handled as special cases.
> 
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible with
> DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table updates.
> 
> This patchset is trying to address these concerns by having an explicit DMA
> API compatible model while continuing to support in-kernel use of DMA requests
> with PASID. Specifically, the following DMA-IOMMU APIs are introduced:
> 
> int iommu_dma_pasid_enable/disable(struct device *dev,
> 				   struct iommu_domain **domain,
> 				   enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
> 			void *cpu_addr,size_t size, int prot);

I'm not convinced this is going in the right direction..

You should create/find a 'struct device' for the PASID and use the
normal DMA API, not try to create a parallel DMA API under the iommu
framework.

Again, there should no driver in Linux doing DMA without going through
the normal DMA API.

> The following three addressing modes are supported with example API usages
> by device drivers.
> 
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted devices
> can DMA pass through IOMMU on a per PASID basis.
> Example:
> 	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
> 	/* Use the returning PASID and PA for work submission */

And why should this even be a choice given to drivers?

Drivers do not get to self declare their "trustiness" - this is only
set by the admin.

PASID tagged DMA is no different than any other DMA and needs to
follow the global admin set IOMMU modes - without any driver knob to
change behaviors.

The API design should look more like this:

   u32 hw_pasid;
   struct device *pasid_dev = iommu_get_pasid_device_handle(pci_device, &hw_pasid);
   dma_addr_t addr = dma_map_XX(pasid_dev, buf, size)

   'tell HW to do DMA'(hw_pasid, addr, size)

   dma_unmap_XX(pasid_dev, addr, size);

If there is any performance tunable around how the IO page table is
constructed then the IOMMU layer will handle it transparently from
global config, just as it does for every other DMA out there.

> 1. Lack of IOTLB synchronization, kernel direct map alias can be updated as a
> result of module loading/eBPF load. Adding kernel mmu notifier?

I'm deeply skeptical we should even have "KSVA" and would want to see
a lot of performance justification to introduce something like
this.

Given that basically only vmalloc memory could truly benefit from it,
I don't expect to see much win, especially when balanced with
burdening all vmalloc users with an IO page table synchronization.

Certainly it should not be part of a patch series fixing kPASID
support for basic DMA, and still doesn't excuse skipping the DMA API -
that is still mandatory for portability to support cache flushing.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
                   ` (7 preceding siblings ...)
  2021-09-22 17:04 ` [RFC 0/7] Support in-kernel DMA with PASID and SVA Jason Gunthorpe
@ 2021-09-29 19:37 ` Jacob Pan
  2021-09-29 19:39   ` Jason Gunthorpe
  2021-10-01 12:24 ` Barry Song
  9 siblings, 1 reply; 29+ messages in thread
From: Jacob Pan @ 2021-09-29 19:37 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig
  Cc: Tian, Kevin, Tony Luck, Dave Jiang, Raj Ashok, Kumar, Sanjay K,
	mike.campin, jacob.jun.pan, Thomas Gleixner

Hi,

Just to follow up on what we discussed during LPC VFIO/IOMMU/PCI MC.
https://linuxplumbersconf.org/event/11/contributions/1021/
The key takeaways are:
1. Addressing mode selection (PA, IOVA, and KVA) should be a policy
decision *not* to be made by device drivers. This implies that it is up to
the platform code or the user (via some sysfs knobs) to decide what is best
for each device. Drivers should not be aware of which addressing mode is
returned by the DMA API.

2. DMA APIs can be extended to support DMA request with PASID. 

3. The performance benefit of using (shared) KVA should be demonstrated,
though the saving of IOTLB flushes over IOVA is conceivable.

#1 could be done in platform IOMMU code when devices are attached to their
default domains. E.g. if the device is trusted, it can operate at shared
KVA mode.

For #2, it seems we can store the kernel PASID in struct device. This will
preserve the DMA API interface while making it PASID capable. Essentially,
each PASID capable device would have two special global PASIDs: 
	- PASID 0 for DMA request w/o PASID, aka RID2PASID
	- PASID 1 (randomly selected) for in-kernel DMA request w/ PASID

Both PASID 0 and 1 will always point to the same page table. I.e. same
addressing mode, IOVA or KVA.

For devices that do not support PASID, there is no change. For devices that
can do DMA both w/ and w/o PASID, the IOTLB invalidation would include both
PASIDs.

By embedding the PASID in struct device, it also avoids changes in upper-level
APIs. The DMA engine API can continue to give out channels without knowing
whether PASID is used or not. The accelerator drivers that do work
submission can retrieve the PASID from struct device.

Thoughts?

Thanks for the review and feedback at LPC!

Jacob

On Tue, 21 Sep 2021 13:29:34 -0700, Jacob Pan
<jacob.jun.pan@linux.intel.com> wrote:

> Hi Joerg/Jason/Christoph et all,
> 
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate the binding of the device and a page table should be performed
> > on init_mm instead of the mm of the current process. Page requests and
> other differences between user and kernel SVA are handled as special
> cases.
> 
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible with
> DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table
> updates.
> 
> This patchset is trying to address these concerns by having an explicit
> DMA API compatible model while continuing to support in-kernel use of DMA
> requests with PASID. Specifically, the following DMA-IOMMU APIs are
> introduced:
> 
> int iommu_dma_pasid_enable/disable(struct device *dev,
> 				   struct iommu_domain **domain,
> 				   enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
> 			void *cpu_addr,size_t size, int prot);
> 
> The following three addressing modes are supported with example API usages
> by device drivers.
> 
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted
> devices can DMA pass through IOMMU on a per PASID basis.
> Example:
> 	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
> 	/* Use the returning PASID and PA for work submission */
> 
> 2. IOVA mode. DMA API compatible. Map a supervisor PASID the same way as
> the PCI requester ID (RID)
> Example:
> 	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
> 	/* Use the PASID and DMA API allocated IOVA for work submission */
> 
> 3. KVA mode. New kva map/unmap APIs. Support fast and strict sub-modes
> transparently based on device trustfulness.
> Example:
> 	pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
> 	iommu_map_kva(domain, &buf, size, prot);
> 	/* Use the returned PASID and KVA to submit work */
> Where:
> 	Fast mode: Shared CPU page tables for trusted devices only
> 	Strict mode: IOMMU domain returned for the untrusted device to
> 	replicate KVA-PA mapping in IOMMU page tables.
> 
> On a per device basis, DMA address and performance modes are enabled by
> the device drivers. Platform information such as trustability, user
> command line input (not included in this set) could also be taken into
> consideration (not implemented in this RFC).
> 
> This RFC is intended to communicate the API directions. Little testing is
> done outside IDXD and DMA engine tests.
> 
> For PA and IOVA modes, the implementation is straightforward and tested
> with Intel IDXD driver. But several opens remain in KVA fast mode thus
> not tested: 1. Lack of IOTLB synchronization, kernel direct map alias can
> be updated as a result of module loading/eBPF load. Adding kernel mmu
> notifier? 2. The use of the auxiliary domain for KVA map, will aux domain
> stay in the long term? Is there another way to represent sub-device granu
> isolation? 3. Is limiting the KVA sharing to the direct map range
> reasonable and practical for all architectures?
> 
> 
> Many thanks to Ashok Raj, Kevin Tian, and Baolu who provided feedback and
> many ideas in this set.
> 
> Thanks,
> 
> Jacob
> 
> Jacob Pan (7):
>   ioasid: reserve special PASID for in-kernel DMA
>   dma-iommu: Add API for DMA request with PASID
>   iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
>   dma-iommu: Add support for DMA w/ PASID in KVA
>   iommu/vt-d: Add support for KVA PASID mode
>   iommu: Add KVA map API
>   dma/idxd: Use dma-iommu PASID API instead of SVA lib
> 
>  drivers/dma/idxd/idxd.h                       |   4 +-
>  drivers/dma/idxd/init.c                       |  36 ++--
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
>  drivers/iommu/dma-iommu.c                     | 123 +++++++++++++-
>  drivers/iommu/intel/iommu.c                   | 154 +++++++++++++++++-
>  drivers/iommu/ioasid.c                        |   2 +
>  drivers/iommu/iommu-sva-lib.c                 |   1 +
>  drivers/iommu/iommu.c                         |  63 +++++++
>  include/linux/dma-iommu.h                     |  14 ++
>  include/linux/intel-iommu.h                   |   7 +-
>  include/linux/ioasid.h                        |   4 +
>  include/linux/iommu.h                         |  13 ++
>  12 files changed, 390 insertions(+), 33 deletions(-)
> 


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-29 19:37 ` Jacob Pan
@ 2021-09-29 19:39   ` Jason Gunthorpe
  2021-09-29 22:57     ` Jacob Pan
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-09-29 19:39 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, Christoph Hellwig, Tian, Kevin,
	Tony Luck, Dave Jiang, Raj Ashok, Kumar, Sanjay K, mike.campin,
	Thomas Gleixner

On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
 
> For #2, it seems we can store the kernel PASID in struct device. This will
> preserve the DMA API interface while making it PASID capable. Essentially,
> each PASID capable device would have two special global PASIDs: 
> 	- PASID 0 for DMA request w/o PASID, aka RID2PASID
> 	- PASID 1 (randomly selected) for in-kernel DMA request w/ PASID

This seems reasonable, I had the same thought. Basically just have the
driver issue some trivial call:
  pci_enable_pasid_dma(pdev, &pasid)

And then DMA tagged with the PASID will be handled equivilant to
untagged DMA. Basically PASID and no PASID point to the exact same IO
page table and the DMA API manipulates that single page table.

Having multiple RID's pointing at the same IO page table is something
we expect iommufd to require so the whole thing should ideally fall
out naturally.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-29 19:39   ` Jason Gunthorpe
@ 2021-09-29 22:57     ` Jacob Pan
  2021-09-29 23:43       ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Jacob Pan @ 2021-09-29 22:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, LKML, Joerg Roedel, Christoph Hellwig, Tian, Kevin,
	Tony Luck, Dave Jiang, Raj Ashok, Kumar, Sanjay K, mike.campin,
	Thomas Gleixner, jacob.jun.pan

Hi Jason,

On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
>  
> > For #2, it seems we can store the kernel PASID in struct device. This
> > will preserve the DMA API interface while making it PASID capable.
> > Essentially, each PASID capable device would have two special global
> > PASIDs: 
> > 	- PASID 0 for DMA request w/o PASID, aka RID2PASID
> > 	- PASID 1 (randomly selected) for in-kernel DMA request w/
> > PASID  
> 
> This seems reasonable, I had the same thought. Basically just have the
> driver issue some trivial call:
>   pci_enable_pasid_dma(pdev, &pasid)
That would work, but I guess it needs to be an iommu_ call instead of pci_?

Or, it can be done by the platform IOMMU code where system PASID is
automatically enabled for PASID capable devices during boot and stored in
struct device. Device drivers can retrieve the PASID from struct device.

I think your suggestion is more precise: in case the driver does not want
to do DMA w/ PASID, we can do fewer IOTLB flushes (PASID 0 only).

> And then DMA tagged with the PASID will be handled equivilant to
> untagged DMA. Basically PASID and no PASID point to the exact same IO
> page table and the DMA API manipulates that single page table.
> 
> Having multiple RID's pointing at the same IO page table is something
> we expect iommufd to require so the whole thing should ideally fall
> out naturally.
That would be the equivalent of attaching multiple devices to the same
IOMMU domain, right?

> Jason


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-29 22:57     ` Jacob Pan
@ 2021-09-29 23:43       ` Jason Gunthorpe
  2021-09-30 14:22         ` Campin, Mike
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-09-29 23:43 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, Christoph Hellwig, Tian, Kevin,
	Tony Luck, Dave Jiang, Raj Ashok, Kumar, Sanjay K, mike.campin,
	Thomas Gleixner

On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> >  
> > > For #2, it seems we can store the kernel PASID in struct device. This
> > > will preserve the DMA API interface while making it PASID capable.
> > > Essentially, each PASID capable device would have two special global
> > > PASIDs: 
> > > 	- PASID 0 for DMA request w/o PASID, aka RID2PASID
> > > 	- PASID 1 (randomly selected) for in-kernel DMA request w/
> > > PASID  
> > 
> > This seems reasonable, I had the same thought. Basically just have the
> > driver issue some trivial call:
> >   pci_enable_pasid_dma(pdev, &pasid)
> That would work, but I guess it needs to be an iommu_ call instead of pci_?

Whichever makes sense..  The API should take in a struct pci_dev
and return a PCI PASID - at least as a wrapper around a more generic
iommu API.

> I think your suggestion is more precise, in case the driver does not want
> to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).

Since it is odd, and it may create overhead, I would do it only when
asked to do it

> > Having multiple RID's pointing at the same IO page table is something
> > we expect iommufd to require so the whole thing should ideally fall
> > out naturally.

> That would be the equivalent of attaching multiple devices to the same
> IOMMU domain. right?

Effectively..

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-29 23:43       ` Jason Gunthorpe
@ 2021-09-30 14:22         ` Campin, Mike
  2021-09-30 15:21           ` Jacob Pan
  0 siblings, 1 reply; 29+ messages in thread
From: Campin, Mike @ 2021-09-30 14:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, Christoph Hellwig, Tian, Kevin, Luck,
	Tony, Jiang, Dave, Raj, Ashok, Kumar, Sanjay K, Thomas Gleixner

I need support for mixed user PASID, kernel PASID and non-PASID use cases in the driver.

-----Original Message-----
From: Jason Gunthorpe <jgg@nvidia.com> 
Sent: Wednesday, September 29, 2021 4:43 PM
To: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: iommu@lists.linux-foundation.org; LKML <linux-kernel@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; Christoph Hellwig <hch@infradead.org>; Tian, Kevin <kevin.tian@intel.com>; Luck, Tony <tony.luck@intel.com>; Jiang, Dave <dave.jiang@intel.com>; Raj, Ashok <ashok.raj@intel.com>; Kumar, Sanjay K <sanjay.k.kumar@intel.com>; Campin, Mike <mike.campin@intel.com>; Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> >  
> > > For #2, it seems we can store the kernel PASID in struct device. 
> > > This will preserve the DMA API interface while making it PASID capable.
> > > Essentially, each PASID capable device would have two special 
> > > global
> > > PASIDs: 
> > > 	- PASID 0 for DMA request w/o PASID, aka RID2PASID
> > > 	- PASID 1 (randomly selected) for in-kernel DMA request w/ PASID
> > 
> > This seems reasonable, I had the same thought. Basically just have 
> > the driver issue some trivial call:
> >   pci_enable_pasid_dma(pdev, &pasid)
> That would work, but I guess it needs to be an iommu_ call instead of pci_?

Whichever makes sense..  The API should take in a struct pci_dev and return a PCI PASID - at least as a wrapper around a more generic iommu API.

> I think your suggestion is more precise, in case the driver does not 
> want to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).

Since it is odd, and it may create overhead, I would do it only when asked to do it

> > Having multiple RID's pointing at the same IO page table is 
> > something we expect iommufd to require so the whole thing should 
> > ideally fall out naturally.

> That would be the equivalent of attaching multiple devices to the same 
> IOMMU domain. right?

Effectively..

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-30 14:22         ` Campin, Mike
@ 2021-09-30 15:21           ` Jacob Pan
  0 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-09-30 15:21 UTC (permalink / raw)
  To: Campin, Mike
  Cc: Jason Gunthorpe, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Tian, Kevin, Luck, Tony, Jiang, Dave, Raj, Ashok, Kumar,
	Sanjay K, Thomas Gleixner, jacob.jun.pan

Hi Mike,

On Thu, 30 Sep 2021 14:22:34 +0000, "Campin, Mike" <mike.campin@intel.com>
wrote:

> I need support for mixed user PASID, kernel PASID and non-PASID use cases
> in the driver.
> 
This specific RFC is for kernel PASID only. Native user PASID use is
supported under the SVA lib kernel API and the /dev/uacce UAPI or a
driver-specific char dev. Guest PASID is being developed under the new
/dev/iommu framework. Non-PASID kernel use should stay under the DMA API,
unchanged from the driver's POV. In fact, this proposal will map non-PASID
and PASID DMA identically.
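That last point - non-PASID and in-kernel PASID DMA being mapped identically in IOVA mode - can be sketched as both PASID table entries pointing at one IOVA page table. This is purely illustrative with made-up names, not the RFC's data structures.

```c
#include <assert.h>

/* Toy model: in IOVA mode, the RID2PASID entry (PASID 0) and the
 * kernel PASID are attached to the same IOVA page table, so DMA with
 * and without PASID resolve addresses identically. */
struct toy_pgtable {
	int id;
};

struct toy_dev {
	/* [0] = RID2PASID, [1] = kernel PASID */
	struct toy_pgtable *pasid_entry[2];
};

static void toy_attach_iova_domain(struct toy_dev *dev,
				   struct toy_pgtable *pt)
{
	dev->pasid_entry[0] = pt;	/* DMA w/o PASID */
	dev->pasid_entry[1] = pt;	/* in-kernel DMA w/ PASID */
}
```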

Thanks,

Jacob

> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com> 
> Sent: Wednesday, September 29, 2021 4:43 PM
> To: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org; LKML
> <linux-kernel@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; Christoph
> Hellwig <hch@infradead.org>; Tian, Kevin <kevin.tian@intel.com>; Luck,
> Tony <tony.luck@intel.com>; Jiang, Dave <dave.jiang@intel.com>; Raj,
> Ashok <ashok.raj@intel.com>; Kumar, Sanjay K <sanjay.k.kumar@intel.com>;
> Campin, Mike <mike.campin@intel.com>; Thomas Gleixner
> <tglx@linutronix.de> Subject: Re: [RFC 0/7] Support in-kernel DMA with
> PASID and SVA
> 
> On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe <jgg@nvidia.com>
> > wrote: 
> > > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> > >    
> > > > For #2, it seems we can store the kernel PASID in struct device. 
> > > > This will preserve the DMA API interface while making it PASID
> > > > capable. Essentially, each PASID capable device would have two
> > > > special global
> > > > PASIDs: 
> > > > 	- PASID 0 for DMA request w/o PASID, aka RID2PASID
> > > > 	- PASID 1 (randomly selected) for in-kernel DMA request w/
> > > > PASID  
> > > 
> > > This seems reasonable, I had the same thought. Basically just have 
> > > the driver issue some trivial call:
> > >   pci_enable_pasid_dma(pdev, &pasid)  
> > That would work, but I guess it needs to be an iommu_ call instead of
> > pci_?  
> 
> Whichever makes sense..  The API should take in a struct pci_dev and
> return a PCI PASID - at least as a wrapper around a more generic iommu API.
> 
> > I think your suggestion is more precise, in case the driver does not 
> > want to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).  
> 
> Since it is odd, and it may create overhead, I would do it only when
> asked to do it
> 
> > > Having multiple RID's pointing at the same IO page table is 
> > > something we expect iommufd to require so the whole thing should 
> > > ideally fall out naturally.  
> 
> > That would be the equivalent of attaching multiple devices to the same 
> > IOMMU domain. right?  
> 
> Effectively..
> 
> Jason


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
                   ` (8 preceding siblings ...)
  2021-09-29 19:37 ` Jacob Pan
@ 2021-10-01 12:24 ` Barry Song
  2021-10-01 12:36   ` Jason Gunthorpe
  9 siblings, 1 reply; 29+ messages in thread
From: Barry Song @ 2021-10-01 12:24 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, Jason Gunthorpe, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Wed, Sep 22, 2021 at 5:14 PM Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
>
> Hi Joerg/Jason/Christoph et all,
>
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate the binding of the device and a page table should be performed
> on init_mm instead of the mm of the current process. Page requests and other
> differences between user and kernel SVA are handled as special cases.
>
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible with
> DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table updates.
>
> This patchset is trying to address these concerns by having an explicit DMA
> API compatible model while continuing to support in-kernel use of DMA requests
> with PASID. Specifically, the following DMA-IOMMU APIs are introduced:
>
> int iommu_dma_pasid_enable/disable(struct device *dev,
>                                    struct iommu_domain **domain,
>                                    enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
>                         void *cpu_addr,size_t size, int prot);
>
> The following three addressing modes are supported with example API usages
> by device drivers.
>
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted devices
> can DMA pass through IOMMU on a per PASID basis.
> Example:
>         pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
>         /* Use the returning PASID and PA for work submission */
>
> 2. IOVA mode. DMA API compatible. Map a supervisor PASID the same way as the
> PCI requester ID (RID)
> Example:
>         pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
>         /* Use the PASID and DMA API allocated IOVA for work submission */

Hi Jacob,
might be a stupid question, but what is the performance benefit of this
IOVA mode compared with the current dma_map/unmap_single/sg APIs, which
already go through the IOMMU in drivers like
drivers/iommu/arm/arm-smmu-v3? Do we still need to flush the IOTLB by
sending commands to the IOMMU each time we do a dma_unmap?

>
> 3. KVA mode. New kva map/unmap APIs. Support fast and strict sub-modes
> transparently based on device trustfulness.
> Example:
>         pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
>         iommu_map_kva(domain, &buf, size, prot);
>         /* Use the returned PASID and KVA to submit work */
> Where:
>         Fast mode: Shared CPU page tables for trusted devices only
>         Strict mode: IOMMU domain returned for the untrusted device to
>         replicate KVA-PA mapping in IOMMU page tables.

A huge IOMMU bottleneck we have seen before is that dma_unmap requires an
IOTLB flush. For example, in arm_smmu_cmdq_issue_cmdlist() we see serious
contention on acquiring the lock, and delays waiting for IOTLB flush
completion in arm_smmu_cmdq_poll_until_sync(), when multiple threads run.

I assume KVA mode can avoid this IOTLB flush since the device uses the
kernel's page table and shares the whole kernel address space. But will
users be glad to accept this mode?
It seems users endure the performance cost of IOVA mapping and unmapping
because it has better security: DMA can only target the specific buffers
that have been mapped by the current dma_map/unmap path with an IOMMU
backend.
Some drivers use a bounce buffer to overcome the performance loss of
dma_map/unmap, as copying is faster than unmapping:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f

BTW, we have been profiling dma_map/unmap performance with this
benchmark:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/map_benchmark.c
you might be able to use it for your benchmarking as well :-)

>
> On a per device basis, DMA address and performance modes are enabled by the
> device drivers. Platform information such as trustability, user command line
> input (not included in this set) could also be taken into consideration (not
> implemented in this RFC).
>
> This RFC is intended to communicate the API directions. Little testing is done
> outside IDXD and DMA engine tests.
>
> For PA and IOVA modes, the implementation is straightforward and tested with
> Intel IDXD driver. But several opens remain in KVA fast mode thus not tested:
> 1. Lack of IOTLB synchronization, kernel direct map alias can be updated as a
> result of module loading/eBPF load. Adding kernel mmu notifier?
> 2. The use of the auxiliary domain for KVA map, will aux domain stay in the
> long term? Is there another way to represent sub-device granu isolation?
> 3. Is limiting the KVA sharing to the direct map range reasonable and
> practical for all architectures?
>
>
> Many thanks to Ashok Raj, Kevin Tian, and Baolu who provided feedback and
> many ideas in this set.
>
> Thanks,
>
> Jacob
>
> Jacob Pan (7):
>   ioasid: reserve special PASID for in-kernel DMA
>   dma-iommu: Add API for DMA request with PASID
>   iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
>   dma-iommu: Add support for DMA w/ PASID in KVA
>   iommu/vt-d: Add support for KVA PASID mode
>   iommu: Add KVA map API
>   dma/idxd: Use dma-iommu PASID API instead of SVA lib
>
>  drivers/dma/idxd/idxd.h                       |   4 +-
>  drivers/dma/idxd/init.c                       |  36 ++--
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
>  drivers/iommu/dma-iommu.c                     | 123 +++++++++++++-
>  drivers/iommu/intel/iommu.c                   | 154 +++++++++++++++++-
>  drivers/iommu/ioasid.c                        |   2 +
>  drivers/iommu/iommu-sva-lib.c                 |   1 +
>  drivers/iommu/iommu.c                         |  63 +++++++
>  include/linux/dma-iommu.h                     |  14 ++
>  include/linux/intel-iommu.h                   |   7 +-
>  include/linux/ioasid.h                        |   4 +
>  include/linux/iommu.h                         |  13 ++
>  12 files changed, 390 insertions(+), 33 deletions(-)
>
> --
> 2.25.1
>

Thanks
barry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-01 12:24 ` Barry Song
@ 2021-10-01 12:36   ` Jason Gunthorpe
  2021-10-01 12:45     ` Barry Song
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 12:36 UTC (permalink / raw)
  To: Barry Song
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Sat, Oct 02, 2021 at 01:24:54AM +1300, Barry Song wrote:

> I assume KVA mode can avoid this iotlb flush as the device is using
> the page table of the kernel and sharing the whole kernel space. But
> will users be glad to accept this mode?

You can avoid the lock by identity mapping the physical address space
of the kernel and making map/unmap a NOP.

KVA is just a different way to achieve this identity map with slightly
different security properties than the normal way, but it doesn't
reach the same security level as proper map/unmap.

I'm not sure anyone who cares about DMA security would see value in
the slight difference between KVA and a normal identity map.

> which have been mapped in the current dma-map/unmap with IOMMU backend.
> some drivers are using bouncing buffer to overcome the performance loss of
> dma_map/unmap as copying is faster than unmapping:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f

It is pretty unfortunate that drivers are hard-coding behaviors based
on assumptions of what the portable API is doing under the covers.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-01 12:36   ` Jason Gunthorpe
@ 2021-10-01 12:45     ` Barry Song
  2021-10-04 16:40       ` Jacob Pan
  0 siblings, 1 reply; 29+ messages in thread
From: Barry Song @ 2021-10-01 12:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Sat, Oct 2, 2021 at 1:36 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Sat, Oct 02, 2021 at 01:24:54AM +1300, Barry Song wrote:
>
> > I assume KVA mode can avoid this iotlb flush as the device is using
> > the page table of the kernel and sharing the whole kernel space. But
> > will users be glad to accept this mode?
>
> You can avoid the lock by identity mapping the physical address space
> of the kernel and making map/unmap a NOP.
>
> KVA is just a different way to achieve this identity map with slightly
> different security properties than the normal way, but it doesn't
> reach the same security level as proper map/unmap.
>
> I'm not sure anyone who cares about DMA security would see value in
> the slight difference between KVA and a normal identity map.

Yes, this is an important question. If users want a high security level,
KVA might not be their choice; if users don't want the security, they are
using IOMMU passthrough. So when will users choose KVA?

>
> > which have been mapped in the current dma-map/unmap with IOMMU backend.
> > some drivers are using bouncing buffer to overcome the performance loss of
> > dma_map/unmap as copying is faster than unmapping:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f
>
> It is pretty unfortunate that drivers are hard-coding behaviors based
> on assumptions of what the portable API is doing under the covers.

Not really, when it has a tx_copybreak which can be set by ethtool or
similar userspace tools. If users are using IOMMU passthrough, copying
won't happen with the default tx_copybreak. If users are using strict
IOMMU mode, socket buffers are copied into buffers allocated and mapped
in the driver, so this doesn't require mapping and unmapping socket
buffers frequently.

>
> Jason

Thanks
barry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-01 12:45     ` Barry Song
@ 2021-10-04 16:40       ` Jacob Pan
  2021-10-04 18:21         ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Jacob Pan @ 2021-10-04 16:40 UTC (permalink / raw)
  To: Barry Song
  Cc: Jason Gunthorpe, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin, jacob.jun.pan

Hi Barry,

On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cnbao@gmail.com> wrote:

> >  
> > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > the page table of the kernel and sharing the whole kernel space. But
> > > will users be glad to accept this mode?  
> >
> > You can avoid the lock by identity mapping the physical address space
> > of the kernel and making map/unmap a NOP.
> >
> > KVA is just a different way to achieve this identity map with slightly
> > different security properties than the normal way, but it doesn't
> > reach the same security level as proper map/unmap.
> >
> > I'm not sure anyone who cares about DMA security would see value in
> > the slight difference between KVA and a normal identity map.  
> 
> Yes, this is an important question. If users want a high security level,
> KVA might not be their choice; if users don't want the security, they are
> using IOMMU passthrough. So when will users choose KVA?
Right, KVA sits in the middle in terms of performance and security.
Performance is better than IOVA since it avoids the IOTLB flushes you
mentioned, and is not too far behind pass-through.

Security-wise, KVA respects kernel mappings, so permissions are better
enforced than with pass-through and identity mapping.
To balance performance and security, we are proposing that KVA only be
supported on trusted devices. On an Intel platform, that would be based on
the ACPI SATC (SoC Integrated Address Translation Cache) reporting
structure, VT-d spec 8.2. I am also adding a kernel iommu parameter to
allow user override.
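The trust-based policy described here could be sketched roughly as follows. The `satc_trusted` flag and the override parameter are stand-ins for the real platform checks (ACPI SATC lookup, kernel command line), not the RFC's implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_pasid_mode { TOY_PASID_IOVA, TOY_PASID_KVA };

/* Toy policy: KVA sharing is granted only to devices the platform
 * reports as trusted (e.g. via the ACPI SATC structure on Intel
 * VT-d), unless a kernel iommu parameter overrides the platform.
 * Untrusted devices fall back to DMA-API-strict IOVA mode. */
static enum toy_pasid_mode toy_pick_mode(bool satc_trusted,
					 bool user_override)
{
	if (satc_trusted || user_override)
		return TOY_PASID_KVA;
	return TOY_PASID_IOVA;
}
```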


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-04 16:40       ` Jacob Pan
@ 2021-10-04 18:21         ` Jason Gunthorpe
  2021-10-07  5:43           ` Barry Song
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 18:21 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Barry Song, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Mon, Oct 04, 2021 at 09:40:03AM -0700, Jacob Pan wrote:
> Hi Barry,
> 
> On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cnbao@gmail.com> wrote:
> 
> > >  
> > > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > > the page table of the kernel and sharing the whole kernel space. But
> > > > will users be glad to accept this mode?  
> > >
> > > You can avoid the lock by identity mapping the physical address space
> > > of the kernel and making map/unmap a NOP.
> > >
> > > KVA is just a different way to achieve this identity map with slightly
> > > different security properties than the normal way, but it doesn't
> > > reach the same security level as proper map/unmap.
> > >
> > > I'm not sure anyone who cares about DMA security would see value in
> > > the slight difference between KVA and a normal identity map.  
> > 
> > Yes, this is an important question. If users want a high security level,
> > KVA might not be their choice; if users don't want the security, they are
> > using IOMMU passthrough. So when will users choose KVA?
> Right, KVA sits in the middle in terms of performance and security.
> Performance is better than IOVA since it avoids the IOTLB flushes you
> mentioned, and is not too far behind pass-through.

The IOTLB flush is not on a DMA path but on a vmap path, so it is very
hard to compare the two things.. Maybe vmap can be made to do lazy
IOTLB flush or something and it could be closer
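The strict-versus-lazy flush trade-off can be put in toy numbers. This is illustrative accounting only, with no relation to the real vmap or SMMU code: N unmaps cost N flushes in strict mode but only ceil(N / batch) in a lazy, batched scheme, at the price of a window where stale translations remain live.

```c
#include <assert.h>

/* Toy accounting for strict vs lazy IOTLB invalidation: strict mode
 * flushes once per unmap; a lazy scheme batches invalidations and
 * flushes once per batch. */
static int toy_flush_count(int unmaps, int batch)
{
	if (batch <= 1)
		return unmaps;			/* strict: flush per unmap */
	return (unmaps + batch - 1) / batch;	/* lazy: flush per batch */
}
```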

> Security-wise, KVA respects kernel mapping. So permissions are better
> enforced than pass-through and identity mapping.

Is this meaningful? Isn't the entire physical map still in the KVA and
isn't it entirely RW ?

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-04 18:21         ` Jason Gunthorpe
@ 2021-10-07  5:43           ` Barry Song
  2021-10-07 11:32             ` Jason Gunthorpe
  2021-10-07 19:11             ` Jacob Pan
  0 siblings, 2 replies; 29+ messages in thread
From: Barry Song @ 2021-10-07  5:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Tue, Oct 5, 2021 at 7:21 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Oct 04, 2021 at 09:40:03AM -0700, Jacob Pan wrote:
> > Hi Barry,
> >
> > On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cnbao@gmail.com> wrote:
> >
> > > >
> > > > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > > > the page table of the kernel and sharing the whole kernel space. But
> > > > > will users be glad to accept this mode?
> > > >
> > > > You can avoid the lock by identity mapping the physical address space
> > > > of the kernel and making map/unmap a NOP.
> > > >
> > > > KVA is just a different way to achieve this identity map with slightly
> > > > different security properties than the normal way, but it doesn't
> > > > reach the same security level as proper map/unmap.
> > > >
> > > > I'm not sure anyone who cares about DMA security would see value in
> > > > the slight difference between KVA and a normal identity map.
> > >
> > > Yes, this is an important question. If users want a high security level,
> > > KVA might not be their choice; if users don't want the security, they are
> > > using IOMMU passthrough. So when will users choose KVA?
> > Right, KVA sits in the middle in terms of performance and security.
> > Performance is better than IOVA since it avoids the IOTLB flushes you
> > mentioned, and is not too far behind pass-through.
>
> The IOTLB flush is not on a DMA path but on a vmap path, so it is very
> hard to compare the two things.. Maybe vmap can be made to do lazy
> IOTLB flush or something and it could be closer
>
> > Security-wise, KVA respects kernel mapping. So permissions are better
> > enforced than pass-through and identity mapping.
>
> Is this meaningful? Isn't the entire physical map still in the KVA and
> isn't it entirely RW ?

Some areas are RX; for example, arm64 supports KERNEL_TEXT_RDONLY.
But the difference is really minor.

So do we have a case where devices can directly access the kernel's data
structures, such as a list/graph/tree with pointers to kernel virtual
addresses? Then devices don't need to translate the addresses of pointers
in a structure.
I assume this is one of the most useful features userspace SVA can provide.

But do we have a case where accelerators/GPU want to use the complex data
structures of kernel drivers?
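The pointer-chasing case above can be made concrete with a toy list. Nothing here is device-specific; it only shows why shared virtual addresses matter for this workload.

```c
#include <assert.h>
#include <stddef.h>

/* With shared virtual addresses (SVA/KVA), a device could walk this
 * list by following the embedded 'next' pointers as-is; with IOVA,
 * every pointer would first have to be translated into the device's
 * address space before it could be dereferenced. */
struct toy_node {
	int value;
	struct toy_node *next;
};

static int toy_sum(const struct toy_node *head)
{
	int sum = 0;

	for (; head; head = head->next)
		sum += head->value;	/* raw kernel-VA pointer chase */
	return sum;
}
```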

>
> Jason

Thanks
barry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07  5:43           ` Barry Song
@ 2021-10-07 11:32             ` Jason Gunthorpe
  2021-10-07 11:54               ` Barry Song
  2021-10-07 19:11             ` Jacob Pan
  1 sibling, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-07 11:32 UTC (permalink / raw)
  To: Barry Song
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:

> So do we have a case where devices can directly access the kernel's data
> structure such as a list/graph/tree with pointers to a kernel virtual address?
> then devices don't need to translate the address of pointers in a structure.
> I assume this is one of the most useful features userspace SVA can provide.

AFAICT that is the only good case for KVA, but it is also completely
against the endianness, word size and DMA portability design of the
kernel.

Going there requires some new set of portable APIs for globally
coherent KVA DMA.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07 11:32             ` Jason Gunthorpe
@ 2021-10-07 11:54               ` Barry Song
  2021-10-07 11:59                 ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Barry Song @ 2021-10-07 11:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
>
> > So do we have a case where devices can directly access the kernel's data
> > structure such as a list/graph/tree with pointers to a kernel virtual address?
> > then devices don't need to translate the address of pointers in a structure.
> > I assume this is one of the most useful features userspace SVA can provide.
>
> > AFAICT that is the only good case for KVA, but it is also completely
> > against the endianness, word size and DMA portability design of the
> > kernel.
> >
> > Going there requires some new set of portable APIs for globally
> > coherent KVA DMA.

Yep, I agree. It would be very weird if accelerators/GPUs shared the
kernel's data structs but, for each "DMA" operation - reading or writing
the data struct - we had to call dma_map_single/sg or
dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
share virtual addresses (SVA), code doesn't need to do explicit
map/sync each time.

>
> Jason

Thanks
barry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07 11:54               ` Barry Song
@ 2021-10-07 11:59                 ` Jason Gunthorpe
  2021-10-07 17:50                   ` Jacob Pan
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-07 11:59 UTC (permalink / raw)
  To: Barry Song
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> >
> > > So do we have a case where devices can directly access the kernel's data
> > > structure such as a list/graph/tree with pointers to a kernel virtual address?
> > > then devices don't need to translate the address of pointers in a structure.
> > > I assume this is one of the most useful features userspace SVA can provide.
> >
> > AFAICT that is the only good case for KVA, but it is also completely
> > against the endianness, word size and DMA portability design of the
> > kernel.
> >
> > Going there requires some new set of portable APIs for globally
> > coherent KVA DMA.
> 
> Yep, I agree. It would be very weird if accelerators/GPUs shared the
> kernel's data structs but, for each "DMA" operation - reading or writing
> the data struct - we had to call dma_map_single/sg or
> dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
> share virtual addresses (SVA), code doesn't need to do explicit
> map/sync each time.

No, it still needs to do something to manage visibility from the
current CPU to the DMA - it might not be flushing a cache, but it is
probably an arch-specific CPU barrier instruction.
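A user-space analogue of the publication pattern under discussion is a release/acquire pair between CPUs. Note this is NOT a substitute for dma_wmb()/dma_rmb() toward a device - which is exactly the point being made above - but it shows the ordering shape the driver would still need.

```c
#include <assert.h>
#include <stdatomic.h>

/* CPU-to-CPU publication with a release/acquire pair. Even with SVA,
 * publishing a descriptor to a device may still need
 * dma_wmb()/dma_rmb()-style barriers: a plain CPU store release is
 * not guaranteed to be a DMA-visible event on all platforms. */
static int payload;
static atomic_int ready;

static void toy_publish(int value)
{
	payload = value;	/* write the data first */
	atomic_store_explicit(&ready, 1, memory_order_release);
}

static int toy_consume(void)
{
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;		/* wait for publication */
	return payload;		/* ordered after the acquire load */
}
```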

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07 17:50                   ` Jacob Pan
@ 2021-10-07 17:48                     ` Jason Gunthorpe
  2021-10-07 18:08                       ` Jacob Pan
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-07 17:48 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Barry Song, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:

> On platforms that are DMA snooped, this barrier is not needed. But I think
> your point is that once we convert to DMA API, the sync/barrier is covered
> by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.

No.. my point is that a CPU store release is not necessarily a DMA
visible event on all platforms and things like dma_wmb/rmb() may
still be necessary. This all needs to be architected before anyone
starts writing drivers that assume a coherent DMA model without using
a coherent DMA allocation.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07 11:59                 ` Jason Gunthorpe
@ 2021-10-07 17:50                   ` Jacob Pan
  2021-10-07 17:48                     ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Jacob Pan @ 2021-10-07 17:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Barry Song, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin, jacob.jun.pan

Hi Jason,

On Thu, 7 Oct 2021 08:59:18 -0300, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> > On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >  
> > >
> > > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> > >  
> > > > So do we have a case where devices can directly access the kernel's
> > > > data structure such as a list/graph/tree with pointers to a kernel
> > > > virtual address? then devices don't need to translate the address
> > > > of pointers in a structure. I assume this is one of the most useful
> > > > features userspace SVA can provide.  
> > >
> > > AFAICT that is the only good case for KVA, but it is also completely
> > > against the endianness, word size and DMA portability design of the
> > > kernel.
> > >
> > > Going there requires some new set of portable APIs for globally
> > > coherent KVA DMA.
> > 
> > Yep, I agree. It would be very weird if accelerators/GPUs shared the
> > kernel's data structs but, for each "DMA" operation - reading or writing
> > the data struct - we had to call dma_map_single/sg or
> > dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
> > share virtual addresses (SVA), code doesn't need to do explicit
> > map/sync each time.
> 
That is what we have today with sva_bind_device.
> No, it still needs to do something to manage visibility from the
> current CPU to the DMA - it might not be flushing a cache, but it is
> probably an arch-specific CPU barrier instruction.
> 
Are you talking about iommu_dma_sync_single_for_cpu()? That is not SVA
specific, right?

On platforms that are DMA snooped, this barrier is not needed. But I think
your point is that once we convert to DMA API, the sync/barrier is covered
by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.

We could also add a check for dev_is_dma_coherent(dev) before using SVA.

> Jason


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07 17:48                     ` Jason Gunthorpe
@ 2021-10-07 18:08                       ` Jacob Pan
  0 siblings, 0 replies; 29+ messages in thread
From: Jacob Pan @ 2021-10-07 18:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Barry Song, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin, jacob.jun.pan

Hi Jason,

On Thu, 7 Oct 2021 14:48:22 -0300, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:
> 
> > On platforms that are DMA snooped, this barrier is not needed. But I
> > think your point is that once we convert to DMA API, the sync/barrier
> > is covered by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are
> > good.  
> 
> No.. my point is that a CPU store release is not necessarily a DMA
> visible event on all platforms and things like dma_wmb/rmb() may
> still be necessary. This all needs to be architected before anyone
> starts writing drivers that assume a coherent DMA model without using
> a coherent DMA allocation.
> 
Why is that specific to SVA? Or are you talking about things in general?

Can we ensure coherency at the API level where the SVA bind device call
happens, i.e. fail the bind if the coherency check does not pass?

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07 19:11             ` Jacob Pan
@ 2021-10-07 19:10               ` Jason Gunthorpe
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-07 19:10 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Barry Song, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin

On Thu, Oct 07, 2021 at 12:11:27PM -0700, Jacob Pan wrote:
> Hi Barry,
> 
> On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cnbao@gmail.com> wrote:
> 
> > > > Security-wise, KVA respects kernel mapping. So permissions are better
> > > > enforced than pass-through and identity mapping.  
> > >
> > > Is this meaningful? Isn't the entire physical map still in the KVA and
> > > isn't it entirely RW ?  
> > 
> > Some areas are RX; for example, arm64 supports KERNEL_TEXT_RDONLY.
> > But the difference is really minor.
> That brings up a good point: if we were to use the DMA API to give out
> KVAs as dma_addrs for trusted devices, we cannot satisfy DMA direction
> requirements since we can't change the kernel mapping. It would be
> similar to DMA direct, where dir is ignored AFAICT.

Right.

Using the DMA API to DMA into read-only kernel memory is a bug in the
first place.

> Or are we saying that if the device is trusted, pass-through, i.e.
> physical addressing, is allowed?

I don't see trusted being relevant here beyond the usual decision to
use the trusted map or not.

Jason


* Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
  2021-10-07  5:43           ` Barry Song
  2021-10-07 11:32             ` Jason Gunthorpe
@ 2021-10-07 19:11             ` Jacob Pan
  2021-10-07 19:10               ` Jason Gunthorpe
  1 sibling, 1 reply; 29+ messages in thread
From: Jacob Pan @ 2021-10-07 19:11 UTC (permalink / raw)
  To: Barry Song
  Cc: Jason Gunthorpe, iommu, LKML, Joerg Roedel, Christoph Hellwig,
	Lu Baolu, Raj Ashok, Kumar, Sanjay K, Dave Jiang, Tony Luck,
	mike.campin, Yi Liu, Tian, Kevin, jacob.jun.pan

Hi Barry,

On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cnbao@gmail.com> wrote:

> > > Security-wise, KVA respects kernel mapping. So permissions are
> > > better enforced than with pass-through or identity mapping.  
> >
> > Is this meaningful? Isn't the entire physical map still in the KVA and
> > isn't it entirely RW ?  
> 
> Some areas are RX; for example, ARM64 supports KERNEL_TEXT_RDONLY.
> But the difference is really minor.
That brings up a good point: if we were to use the DMA API to hand out
KVAs as dma_addr for trusted devices, we cannot satisfy the DMA
direction requirements since we can't change the kernel mapping. It
would be similar to DMA direct, where the direction is ignored AFAICT.

Or are we saying that if the device is trusted, pass-through, i.e.
physical addressing, is allowed?

Thoughts?

Thanks,

Jacob


end of thread, other threads:[~2021-10-07 19:10 UTC | newest]

Thread overview: 29+ messages
-- links below jump to the message on this page --
2021-09-21 20:29 [RFC 0/7] Support in-kernel DMA with PASID and SVA Jacob Pan
2021-09-21 20:29 ` [RFC 1/7] ioasid: reserve special PASID for in-kernel DMA Jacob Pan
2021-09-21 20:29 ` [RFC 2/7] dma-iommu: Add API for DMA request with PASID Jacob Pan
2021-09-21 20:29 ` [RFC 3/7] iommu/vt-d: Add DMA w/ PASID support for PA and IOVA Jacob Pan
2021-09-21 20:29 ` [RFC 4/7] dma-iommu: Add support for DMA w/ PASID in KVA Jacob Pan
2021-09-21 20:29 ` [RFC 5/7] iommu/vt-d: Add support for KVA PASID mode Jacob Pan
2021-09-21 20:29 ` [RFC 6/7] iommu: Add KVA map API Jacob Pan
2021-09-21 20:29 ` [RFC 7/7] dma/idxd: Use dma-iommu PASID API instead of SVA lib Jacob Pan
2021-09-22 17:04 ` [RFC 0/7] Support in-kernel DMA with PASID and SVA Jason Gunthorpe
2021-09-29 19:37 ` Jacob Pan
2021-09-29 19:39   ` Jason Gunthorpe
2021-09-29 22:57     ` Jacob Pan
2021-09-29 23:43       ` Jason Gunthorpe
2021-09-30 14:22         ` Campin, Mike
2021-09-30 15:21           ` Jacob Pan
2021-10-01 12:24 ` Barry Song
2021-10-01 12:36   ` Jason Gunthorpe
2021-10-01 12:45     ` Barry Song
2021-10-04 16:40       ` Jacob Pan
2021-10-04 18:21         ` Jason Gunthorpe
2021-10-07  5:43           ` Barry Song
2021-10-07 11:32             ` Jason Gunthorpe
2021-10-07 11:54               ` Barry Song
2021-10-07 11:59                 ` Jason Gunthorpe
2021-10-07 17:50                   ` Jacob Pan
2021-10-07 17:48                     ` Jason Gunthorpe
2021-10-07 18:08                       ` Jacob Pan
2021-10-07 19:11             ` Jacob Pan
2021-10-07 19:10               ` Jason Gunthorpe
