LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support
@ 2019-05-03 22:32 Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 01/16] iommu: Introduce attach/detach_pasid_table API Jacob Pan
                   ` (16 more replies)
  0 siblings, 17 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
platforms allow address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to enable SVA virtualization, i.e. shared guest
application address space and physical device DMA address. Only IOMMU portion
of the changes are included in this series. Additional support is needed in
VFIO and QEMU (will be submitted separately) to complete this functionality.

To make incremental changes and reduce the size of each patchset. This series
does not inlcude support for page request services.

In VT-d implementation, PASID table is per device and maintained in the host.
Guest PASID table is shadowed in VMM where virtual IOMMU is emulated.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables


This work is based on collaboration with other developers on the IOMMU
mailing list. Notably,

[1] [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
https://lkml.org/lkml/2019/3/17/124

[2] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by Jean-Philippe
Brucker
https://www.spinics.net/lists/iommu/msg30639.html

[3] [RFC PATCH 0/5] iommu: APIs for paravirtual PASID allocation by Lu Baolu
https://lkml.org/lkml/2018/11/12/1921

[4] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual
    Address (SVA)
    https://lwn.net/Articles/754331/

There are roughly three parts:
1. Generic PASID allocator [1] with extension to support custom allocator
2. IOMMU cache invalidation passdown from guest to host
3. Guest PASID bind for nested translation

All generic IOMMU APIs are reused from [1], which has a v7 just published with
no real impact to the patches used here. It is worth noting that unlike sMMU
nested stage setup, where PASID table is owned by the guest, VT-d PASID table is
owned by the host, individual PASIDs are bound instead of the PASID table.

This series is based on the new VT-d 3.0 Specification (https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf).
This is different than the older series in [4] which was based on the older
specification that does not have scalable mode.


ChangeLog:
	- V3
	  - Addressed thorough review comments from Eric Auger (Thank you!)
	  - Moved IOASID allocator from driver core to IOMMU code per
	    suggestion by Christoph Hellwig
	    (https://lkml.org/lkml/2019/4/26/462)
	  - Rebased on top of Jean's SVA API branch and Eric's v7[1]
	    (git://linux-arm.org/linux-jpb.git sva/api)
	  - All IOMMU APIs are unmodified (except the new bind guest PASID
	    call in patch 9/16)

	- V2
	  - Rebased on Joerg's IOMMU x86/vt-d branch v5.1-rc4
	  - Integrated with Eric Auger's new v7 series for common APIs
	  (https://github.com/eauger/linux/tree/v5.1-rc3-2stage-v7)
	  - Addressed review comments from Andy Shevchenko and Alex Williamson on
	    IOASID custom allocator.
	  - Support multiple custom IOASID allocators (vIOMMUs) and dynamic
	    registration.


Jacob Pan (13):
  iommu: Introduce attach/detach_pasid_table API
  ioasid: Add custom IOASID allocator
  iommu/vt-d: Add custom allocator for IOASID
  iommu/vtd: Optimize tlb invalidation for vIOMMU
  iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  iommu: Introduce guest PASID bind function
  iommu/vt-d: Move domain helper to header
  iommu/vt-d: Avoid duplicated code for PASID setup
  iommu/vt-d: Add nested translation helper function
  iommu/vt-d: Clean up for SVM device list
  iommu/vt-d: Add bind guest PASID support
  iommu/vt-d: Support flushing more translation cache types
  iommu/vt-d: Add svm/sva invalidate function

Jean-Philippe Brucker (1):
  iommu: Add I/O ASID allocator

Liu, Yi L (1):
  iommu: Introduce cache_invalidate API

Lu Baolu (1):
  iommu/vt-d: Enlightened PASID allocation

 drivers/iommu/Kconfig       |   7 ++
 drivers/iommu/Makefile      |   1 +
 drivers/iommu/dmar.c        |  50 ++++++++
 drivers/iommu/intel-iommu.c | 241 ++++++++++++++++++++++++++++++++++--
 drivers/iommu/intel-pasid.c | 223 +++++++++++++++++++++++++--------
 drivers/iommu/intel-pasid.h |  24 +++-
 drivers/iommu/intel-svm.c   | 293 +++++++++++++++++++++++++++++++++++---------
 drivers/iommu/ioasid.c      | 265 +++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c       |  53 ++++++++
 include/linux/intel-iommu.h |  41 ++++++-
 include/linux/intel-svm.h   |   7 ++
 include/linux/ioasid.h      |  67 ++++++++++
 include/linux/iommu.h       |  43 ++++++-
 include/uapi/linux/iommu.h  | 140 +++++++++++++++++++++
 14 files changed, 1328 insertions(+), 127 deletions(-)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 01/16] iommu: Introduce attach/detach_pasid_table API
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 02/16] iommu: Introduce cache_invalidate API Jacob Pan
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

In virtualization use case, when a guest is assigned
a PCI host device, protected by a virtual IOMMU on the guest,
the physical IOMMU must be programmed to be consistent with
the guest mappings. If the physical IOMMU supports two
translation stages it makes sense to program guest mappings
onto the first stage/level (ARM/Intel terminology) while the host
owns the stage/level 2.

In that case, it is mandated to trap on guest configuration
settings and pass those to the physical iommu driver.

This patch adds a new API to the iommu subsystem that allows
to set/unset the pasid table information.

A generic iommu_pasid_table_config struct is introduced in
a new iommu.h uapi header. This is going to be used by the VFIO
user API.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

---

This patch generalizes the API introduced by Jacob & co-authors in
https://lwn.net/Articles/754331/

v4 -> v5:
- no returned valued for dummy definition of iommu_detach_pasid_table
- fix order in comment
- added Jean's R-b

v3 -> v4:
- s/set_pasid_table/attach_pasid_table
- restore detach_pasid_table. Detach can be used on unwind path.
- add padding
- remove @abort
- signature used for config and format
- add comments for fields in the SMMU struct

v2 -> v3:
- replace unbind/bind by set_pasid_table
- move table pointer and pasid bits in the generic part of the struct

v1 -> v2:
- restore the original pasid table name
- remove the struct device * parameter in the API
- reworked iommu_pasid_smmuv3
---
 drivers/iommu/iommu.c      | 19 +++++++++++++++++++
 include/linux/iommu.h      | 18 ++++++++++++++++++
 include/uapi/linux/iommu.h | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7718568..8df9d34 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1626,6 +1626,25 @@ int iommu_page_response(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_page_response);
 
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	if (unlikely(!domain->ops->attach_pasid_table))
+		return -ENODEV;
+
+	return domain->ops->attach_pasid_table(domain, cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
+
+void iommu_detach_pasid_table(struct iommu_domain *domain)
+{
+	if (unlikely(!domain->ops->detach_pasid_table))
+		return;
+
+	domain->ops->detach_pasid_table(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c56ce85..ab4d922 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -264,6 +264,8 @@ struct page_response_msg {
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
+ * @attach_pasid_table: attach a pasid table
+ * @detach_pasid_table: detach the pasid table
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -323,6 +325,9 @@ struct iommu_ops {
 				      void *drvdata);
 	void (*sva_unbind)(struct iommu_sva *handle);
 	int (*sva_get_pasid)(struct iommu_sva *handle);
+	int (*attach_pasid_table)(struct iommu_domain *domain,
+				  struct iommu_pasid_table_config *cfg);
+	void (*detach_pasid_table)(struct iommu_domain *domain);
 
 	int (*page_response)(struct device *dev, struct page_response_msg *msg);
 
@@ -434,6 +439,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_attach_pasid_table(struct iommu_domain *domain,
+				    struct iommu_pasid_table_config *cfg);
+extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -943,6 +951,13 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 	return -ENODEV;
 }
 
+static inline
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	return -ENODEV;
+}
+
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
@@ -964,6 +979,9 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
 	return IOMMU_PASID_INVALID;
 }
 
+static inline
+void iommu_detach_pasid_table(struct iommu_domain *domain) {}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 564e02a..8848514 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -115,4 +115,51 @@ struct iommu_fault {
 		struct iommu_fault_page_request prm;
 	};
 };
+
+/**
+ * SMMUv3 Stream Table Entry stage 1 related information
+ * The PASID table is referred to as the context descriptor (CD) table.
+ *
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+   or 2-level table)
+ * @s1dss: STE s1dss (specifies the behavior when pasid_bits != 0
+   and no pasid is passed along with the incoming transaction)
+ * Please refer to the smmu 3.x spec (ARM IHI 0070A) for full details
+ */
+struct iommu_pasid_smmuv3 {
+#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
+	__u32	version;
+	__u8 s1fmt;
+	__u8 s1dss;
+	__u8 padding[2];
+};
+
+/**
+ * PASID table data used to bind guest PASID table to the host IOMMU
+ * Note PASID table corresponds to the Context Table on ARM SMMUv3.
+ *
+ * @version: API version to prepare for future extensions
+ * @format: format of the PASID table
+ * @base_ptr: guest physical address of the PASID table
+ * @pasid_bits: number of PASID bits used in the PASID table
+ * @config: indicates whether the guest translation stage must
+ * be translated, bypassed or aborted.
+ */
+struct iommu_pasid_table_config {
+#define PASID_TABLE_CFG_VERSION_1 1
+	__u32	version;
+#define IOMMU_PASID_FORMAT_SMMUV3	1
+	__u32	format;
+	__u64	base_ptr;
+	__u8	pasid_bits;
+#define IOMMU_PASID_CONFIG_TRANSLATE	1
+#define IOMMU_PASID_CONFIG_BYPASS	2
+#define IOMMU_PASID_CONFIG_ABORT	3
+	__u8	config;
+	__u8    padding[6];
+	union {
+		struct iommu_pasid_smmuv3 smmuv3;
+	};
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 01/16] iommu: Introduce attach/detach_pasid_table API Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-13  9:14   ` Auger Eric
  2019-05-03 22:32 ` [PATCH v3 03/16] iommu: Add I/O ASID allocator Jacob Pan
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Liu, Yi L, Liu, Jacob Pan

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>

In any virtualization use case, when the first translation stage
is "owned" by the guest OS, the host IOMMU driver has no knowledge
of caching structure updates unless the guest invalidation activities
are trapped by the virtualizer and passed down to the host.

Since the invalidation data are obtained from user space and will be
written into physical IOMMU, we must allow security check at various
layers. Therefore, generic invalidation data format are proposed here,
model specific IOMMU drivers need to convert them into their own format.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v6 -> v7:
- detail which fields are used for each invalidation type
- add a comment about multiple cache invalidation

v5 -> v6:
- fix merge issue

v3 -> v4:
- full reshape of the API following Alex' comments

v1 -> v2:
- add arch_id field
- renamed tlb_invalidate into cache_invalidate as this API allows
  to invalidate context caches on top of IOTLBs

v1:
renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
header. Commit message reworded.
---
 drivers/iommu/iommu.c      | 14 ++++++++
 include/linux/iommu.h      | 15 ++++++++-
 include/uapi/linux/iommu.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8df9d34..a2f6f3e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1645,6 +1645,20 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
 
+int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+			   struct iommu_cache_invalidate_info *inv_info)
+{
+	int ret = 0;
+
+	if (unlikely(!domain->ops->cache_invalidate))
+		return -ENODEV;
+
+	ret = domain->ops->cache_invalidate(domain, dev, inv_info);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ab4d922..d182525 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -266,6 +266,7 @@ struct page_response_msg {
  * @page_response: handle page request response
  * @attach_pasid_table: attach a pasid table
  * @detach_pasid_table: detach the pasid table
+ * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -328,8 +329,9 @@ struct iommu_ops {
 	int (*attach_pasid_table)(struct iommu_domain *domain,
 				  struct iommu_pasid_table_config *cfg);
 	void (*detach_pasid_table)(struct iommu_domain *domain);
-
 	int (*page_response)(struct device *dev, struct page_response_msg *msg);
+	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
+				struct iommu_cache_invalidate_info *inv_info);
 
 	unsigned long pgsize_bitmap;
 };
@@ -442,6 +444,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 extern int iommu_attach_pasid_table(struct iommu_domain *domain,
 				    struct iommu_pasid_table_config *cfg);
 extern void iommu_detach_pasid_table(struct iommu_domain *domain);
+extern int iommu_cache_invalidate(struct iommu_domain *domain,
+				  struct device *dev,
+				  struct iommu_cache_invalidate_info *inv_info);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -982,6 +987,14 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
 static inline
 void iommu_detach_pasid_table(struct iommu_domain *domain) {}
 
+static inline int
+iommu_cache_invalidate(struct iommu_domain *domain,
+		       struct device *dev,
+		       struct iommu_cache_invalidate_info *inv_info)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 8848514..fa96ecb 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -162,4 +162,84 @@ struct iommu_pasid_table_config {
 	};
 };
 
+/* defines the granularity of the invalidation */
+enum iommu_inv_granularity {
+	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
+	IOMMU_INV_GRANU_PASID,	/* pasid-selective invalidation */
+	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
+	IOMMU_INVAL_GRANU_NR,   /* number of invalidation granularities */
+};
+
+/**
+ * Address Selective Invalidation Structure
+ *
+ * @flags indicates the granularity of the address-selective invalidation
+ * - if PASID bit is set, @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the
+ *   address range.
+ * - if ARCHID bit is set, @archid is populated and the invalidation relates
+ *   to cache entries tagged with this architecture specific id and matching
+ *   the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - if neither PASID or ARCHID is set, global addr invalidation applies
+ * - LEAF flag indicates whether only the leaf PTE caching needs to be
+ *   invalidated and other paging structure caches can be preserved.
+ * @pasid: process address space id
+ * @archid: architecture-specific id
+ * @addr: first stage/level input address
+ * @granule_size: page/block size of the mapping in bytes
+ * @nb_granules: number of contiguous granules to be invalidated
+ */
+struct iommu_inv_addr_info {
+#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
+#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
+#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
+	__u32	flags;
+	__u32	archid;
+	__u64	pasid;
+	__u64	addr;
+	__u64	granule_size;
+	__u64	nb_granules;
+};
+
+/**
+ * First level/stage invalidation information
+ * @cache: bitfield that allows to select which caches to invalidate
+ * @granularity: defines the lowest granularity used for the invalidation:
+ *     domain > pasid > addr
+ *
+ * Not all the combinations of cache/granularity make sense:
+ *
+ *         type |   DEV_IOTLB   |     IOTLB     |      PASID    |
+ * granularity	|		|		|      cache	|
+ * -------------+---------------+---------------+---------------+
+ * DOMAIN	|	N/A	|       Y	|	Y	|
+ * PASID	|	Y	|       Y	|	Y	|
+ * ADDR		|       Y	|       Y	|	N/A	|
+ *
+ * Invalidations by %IOMMU_INV_GRANU_ADDR use field @addr_info.
+ * Invalidations by %IOMMU_INV_GRANU_PASID use field @pasid.
+ * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument.
+ *
+ * If multiple cache types are invalidated simultaneously, they all
+ * must support the used granularity.
+ */
+struct iommu_cache_invalidate_info {
+#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
+	__u32	version;
+/* IOMMU paging structure cache */
+#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
+#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
+#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
+#define IOMMU_CACHE_TYPE_NR		(3)
+	__u8	cache;
+	__u8	granularity;
+	__u8	padding[2];
+	union {
+		__u64	pasid;
+		struct iommu_inv_addr_info addr_info;
+	};
+};
+
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 03/16] iommu: Add I/O ASID allocator
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 01/16] iommu: Introduce attach/detach_pasid_table API Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 02/16] iommu: Introduce cache_invalidate API Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-21  8:21   ` Auger Eric
  2019-05-21  9:41   ` Auger Eric
  2019-05-03 22:32 ` [PATCH v3 04/16] ioasid: Add custom IOASID allocator Jacob Pan
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

Some devices might support multiple DMA address spaces, in particular
those that have the PCI PASID feature. PASID (Process Address Space ID)
allows to share process address spaces with devices (SVA), partition a
device into VM-assignable entities (VFIO mdev) or simply provide
multiple DMA address space to kernel drivers. Add a global PASID
allocator usable by different drivers at the same time. Name it I/O ASID
to avoid confusion with ASIDs allocated by arch code, which are usually
a separate ID space.

The IOASID space is global. Each device can have its own PASID space,
but by convention the IOMMU ended up having a global PASID space, so
that with SVA, each mm_struct is associated to a single PASID.

The allocator is primarily used by IOMMU subsystem but in rare occasions
drivers would like to allocate PASIDs for devices that aren't managed by
an IOMMU, using the same ID space as IOMMU.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lkml.org/lkml/2019/4/26/462
---
 drivers/iommu/Kconfig  |   6 +++
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 140 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ioasid.h |  67 +++++++++++++++++++++++
 4 files changed, 214 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6f07f3b..75e7f97 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -2,6 +2,12 @@
 config IOMMU_IOVA
 	tristate
 
+config IOASID
+	bool
+	help
+	  Enable the I/O Address Space ID allocator. A single ID space shared
+	  between different users.
+
 # IOMMU_API always gets selected by whoever wants it.
 config IOMMU_API
 	bool
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8c71a15..0efac6f 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
new file mode 100644
index 0000000..99f5e0a
--- /dev/null
+++ b/drivers/iommu/ioasid.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O Address Space ID allocator. There is one global IOASID space, split into
+ * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
+ * free IOASIDs with ioasid_alloc and ioasid_free.
+ */
+#include <linux/xarray.h>
+#include <linux/ioasid.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+struct ioasid_data {
+	ioasid_t id;
+	struct ioasid_set *set;
+	void *private;
+	struct rcu_head rcu;
+};
+
+static DEFINE_XARRAY_ALLOC(ioasid_xa);
+
+/**
+ * ioasid_set_data - Set private data for an allocated ioasid
+ * @ioasid: the ID to set data
+ * @data:   the private data
+ *
+ * For IOASID that is already allocated, private data can be set
+ * via this API. Future lookup can be done via ioasid_find.
+ */
+int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	struct ioasid_data *ioasid_data;
+	int ret = 0;
+
+	ioasid_data = xa_load(&ioasid_xa, ioasid);
+	if (ioasid_data)
+		ioasid_data->private = data;
+	else
+		ret = -ENOENT;
+
+	/* getter may use the private data */
+	synchronize_rcu();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_set_data);
+
+/**
+ * ioasid_alloc - Allocate an IOASID
+ * @set: the IOASID set
+ * @min: the minimum ID (inclusive)
+ * @max: the maximum ID (inclusive)
+ * @private: data private to the caller
+ *
+ * Allocate an ID between @min and @max (or %0 and %INT_MAX). Return the
+ * allocated ID on success, or INVALID_IOASID on failure. The @private pointer
+ * is stored internally and can be retrieved with ioasid_find().
+ */
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private)
+{
+	int id = INVALID_IOASID;
+	struct ioasid_data *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return INVALID_IOASID;
+
+	data->set = set;
+	data->private = private;
+
+	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
+		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+		goto exit_free;
+	}
+	data->id = id;
+
+exit_free:
+	if (id < 0 || id == INVALID_IOASID) {
+		kfree(data);
+		return INVALID_IOASID;
+	}
+	return id;
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc);
+
+/**
+ * ioasid_free - Free an IOASID
+ * @ioasid: the ID to remove
+ */
+void ioasid_free(ioasid_t ioasid)
+{
+	struct ioasid_data *ioasid_data;
+
+	ioasid_data = xa_erase(&ioasid_xa, ioasid);
+
+	kfree_rcu(ioasid_data, rcu);
+}
+EXPORT_SYMBOL_GPL(ioasid_free);
+
+/**
+ * ioasid_find - Find IOASID data
+ * @set: the IOASID set
+ * @ioasid: the IOASID to find
+ * @getter: function to call on the found object
+ *
+ * The optional getter function allows to take a reference to the found object
+ * under the rcu lock. The function can also check if the object is still valid:
+ * if @getter returns false, then the object is invalid and NULL is returned.
+ *
+ * If the IOASID has been allocated for this set, return the private pointer
+ * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
+ * if the IOASID is not found or not belong to the set.
+ */
+void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+		  bool (*getter)(void *))
+{
+	void *priv = NULL;
+	struct ioasid_data *ioasid_data;
+
+	rcu_read_lock();
+	ioasid_data = xa_load(&ioasid_xa, ioasid);
+	if (!ioasid_data) {
+		priv = ERR_PTR(-ENOENT);
+		goto unlock;
+	}
+	if (set && ioasid_data->set != set) {
+		/* data found but does not belong to the set */
+		priv = ERR_PTR(-EACCES);
+		goto unlock;
+	}
+	/* Now IOASID and its set is verified, we can return the private data */
+	priv = ioasid_data->private;
+	if (getter && !getter(priv))
+		priv = NULL;
+unlock:
+	rcu_read_unlock();
+
+	return priv;
+}
+EXPORT_SYMBOL_GPL(ioasid_find);
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
new file mode 100644
index 0000000..41de5e4
--- /dev/null
+++ b/include/linux/ioasid.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_IOASID_H
+#define __LINUX_IOASID_H
+
+#define INVALID_IOASID ((ioasid_t)-1)
+typedef unsigned int ioasid_t;
+typedef int (*ioasid_iter_t)(ioasid_t ioasid, void *private, void *data);
+typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
+typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
+
+struct ioasid_set {
+	int dummy;
+};
+
+struct ioasid_allocator {
+	ioasid_alloc_fn_t alloc;
+	ioasid_free_fn_t free;
+	void *pdata;
+	struct list_head list;
+};
+
+#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
+
+#ifdef CONFIG_IOASID
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private);
+void ioasid_free(ioasid_t ioasid);
+
+void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+		  bool (*getter)(void *));
+int ioasid_register_allocator(struct ioasid_allocator *allocator);
+void ioasid_unregister_allocator(struct ioasid_allocator *allocator);
+
+int ioasid_set_data(ioasid_t ioasid, void *data);
+
+#else /* !CONFIG_IOASID */
+static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
+				    ioasid_t max, void *private)
+{
+	return INVALID_IOASID;
+}
+
+static inline void ioasid_free(ioasid_t ioasid)
+{
+}
+
+static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+				bool (*getter)(void *))
+{
+	return NULL;
+}
+static inline int ioasid_register_allocator(struct ioasid_allocator *allocator)
+{
+	return -ENODEV;
+}
+
+static inline void ioasid_unregister_allocator(struct ioasid_allocator *allocator)
+{
+}
+
+static inline int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	return -ENODEV;
+}
+
+#endif /* CONFIG_IOASID */
+#endif /* __LINUX_IOASID_H */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 04/16] ioasid: Add custom IOASID allocator
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (2 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 03/16] iommu: Add I/O ASID allocator Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-21  9:55   ` Auger Eric
  2019-05-03 22:32 ` [PATCH v3 05/16] iommu/vt-d: Enlightened PASID allocation Jacob Pan
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Sometimes, IOASID allocation must be handled by platform specific
code. The use cases are guest vIOMMU and pvIOMMU where IOASIDs need
to be allocated by the host via enlightened or paravirt interfaces.

This patch adds an extension to the IOASID allocator APIs such that
platform drivers can register a custom allocator, possibly at boot
time, to take over the allocation. Xarray is still used for tracking
and searching purposes internal to the IOASID code. Private data of
an IOASID can also be set after the allocation.

There can be multiple custom allocators registered but only one is
used at a time. In case of hot removal of devices that provides the
allocator, all IOASIDs must be freed prior to unregistering the
allocator. Default XArray based allocator cannot be mixed with
custom allocators, i.e. custom allocators will not be used if there
are outstanding IOASIDs allocated by the default XA allocator.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/ioasid.c | 125 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 99f5e0a..ed2915a 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -17,6 +17,100 @@ struct ioasid_data {
 };
 
 static DEFINE_XARRAY_ALLOC(ioasid_xa);
+static DEFINE_MUTEX(ioasid_allocator_lock);
+static struct ioasid_allocator *active_custom_allocator;
+
+static LIST_HEAD(custom_allocators);
+/*
+ * A flag to track if ioasid default allocator is in use, this will
+ * prevent custom allocator from being used. The reason is that custom allocator
+ * must have unadulterated space to track private data with xarray, there cannot
+ * be a mix been default and custom allocated IOASIDs.
+ */
+static int default_allocator_active;
+
+/**
+ * ioasid_register_allocator - register a custom allocator
+ * @allocator: the custom allocator to be registered
+ *
+ * Custom allocators take precedence over the default xarray based allocator.
+ * Private data associated with the ASID are managed by ASID common code
+ * similar to data stored in xa.
+ *
+ * There can be multiple allocators registered but only one is active. In case
+ * of runtime removal of a custom allocator, the next one is activated based
+ * on the registration ordering.
+ */
+int ioasid_register_allocator(struct ioasid_allocator *allocator)
+{
+	struct ioasid_allocator *pallocator;
+	int ret = 0;
+
+	if (!allocator)
+		return -EINVAL;
+
+	mutex_lock(&ioasid_allocator_lock);
+	/*
+	 * No particular preference since all custom allocators end up calling
+	 * the host to allocate IOASIDs. We activate the first one and keep
+	 * the later registered allocators in a list in case the first one gets
+	 * removed due to hotplug.
+	 */
+	if (list_empty(&custom_allocators))
+		active_custom_allocator = allocator;
+	else {
+		/* Check if the allocator is already registered */
+		list_for_each_entry(pallocator, &custom_allocators, list) {
+			if (pallocator == allocator) {
+				pr_err("IOASID allocator already registered\n");
+				ret = -EEXIST;
+				goto out_unlock;
+			}
+		}
+	}
+	list_add_tail(&allocator->list, &custom_allocators);
+
+out_unlock:
+	mutex_unlock(&ioasid_allocator_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_register_allocator);
+
+/**
+ * ioasid_unregister_allocator - Remove a custom IOASID allocator
+ * @allocator: the custom allocator to be removed
+ *
+ * Remove an allocator from the list, activate the next allocator in
+ * the order it was registered.
+ */
+void ioasid_unregister_allocator(struct ioasid_allocator *allocator)
+{
+	if (!allocator)
+		return;
+
+	if (list_empty(&custom_allocators)) {
+		pr_warn("No custom IOASID allocators active!\n");
+		return;
+	}
+
+	mutex_lock(&ioasid_allocator_lock);
+	list_del(&allocator->list);
+	if (list_empty(&custom_allocators)) {
+		pr_info("No custom IOASID allocators\n");
+		/*
+		 * All IOASIDs should have been freed before the last custom
+		 * allocator is unregistered. Unless default allocator is in
+		 * use.
+		 */
+		BUG_ON(!xa_empty(&ioasid_xa) && !default_allocator_active);
+		active_custom_allocator = NULL;
+	} else if (allocator == active_custom_allocator) {
+		active_custom_allocator = list_entry(&custom_allocators, struct ioasid_allocator, list);
+		pr_info("IOASID allocator changed");
+	}
+	mutex_unlock(&ioasid_allocator_lock);
+}
+EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
 
 /**
  * ioasid_set_data - Set private data for an allocated ioasid
@@ -68,6 +162,29 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
 	data->set = set;
 	data->private = private;
 
+	mutex_lock(&ioasid_allocator_lock);
+	/*
+	 * Use custom allocator if available, otherwise use default.
+	 * However, if there are active IOASIDs already been allocated by default
+	 * allocator, custom allocator cannot be used.
+	 */
+	if (!default_allocator_active && active_custom_allocator) {
+		id = active_custom_allocator->alloc(min, max, active_custom_allocator->pdata);
+		if (id == INVALID_IOASID) {
+			pr_err("Failed ASID allocation by custom allocator\n");
+			mutex_unlock(&ioasid_allocator_lock);
+			goto exit_free;
+		}
+		/*
+		 * Use XA to manage private data also sanitiy check custom
+		 * allocator for duplicates.
+		 */
+		min = id;
+		max = id + 1;
+	} else
+		default_allocator_active = 1;
+	mutex_unlock(&ioasid_allocator_lock);
+
 	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
 		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
 		goto exit_free;
@@ -91,9 +208,17 @@ void ioasid_free(ioasid_t ioasid)
 {
 	struct ioasid_data *ioasid_data;
 
+	mutex_lock(&ioasid_allocator_lock);
+	if (active_custom_allocator)
+		active_custom_allocator->free(ioasid, active_custom_allocator->pdata);
+	mutex_unlock(&ioasid_allocator_lock);
+
 	ioasid_data = xa_erase(&ioasid_xa, ioasid);
 
 	kfree_rcu(ioasid_data, rcu);
+
+	if (xa_empty(&ioasid_xa))
+		default_allocator_active = 0;
 }
 EXPORT_SYMBOL_GPL(ioasid_free);
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 05/16] iommu/vt-d: Enlightened PASID allocation
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (3 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 04/16] ioasid: Add custom IOASID allocator Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 06/16] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

If Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the
IOMMU driver should rely on the emulation software to allocate
and free PASID IDs. The Intel vt-d spec revision 3.0 defines a
register set to support this. This includes a capability register,
a virtual command register and a virtual response register. Refer
to section 10.4.42, 10.4.43, 10.4.44 for more information.

This patch adds the enlightened PASID allocation/free interfaces
via the virtual command register.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-pasid.h | 13 +++++++-
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 03b12d2..95f8f0c 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -63,6 +63,82 @@ void *intel_pasid_lookup_id(int pasid)
 	return p;
 }
 
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+{
+	u64 res;
+	u64 cap;
+	u8 status_code;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!ecap_vcs(iommu->ecap)) {
+		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
+			iommu->name);
+		return -ENODEV;
+	}
+
+	cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
+	if (!(cap & DMA_VCS_PAS)) {
+		pr_warn("IOMMU: %s: Emulation software doesn't support PASID allocation\n",
+			iommu->name);
+		return -ENODEV;
+	}
+
+	raw_spin_lock_irqsave(&iommu->register_lock, flags);
+	dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
+	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+		      !(res & VCMD_VRSP_IP), res);
+	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	status_code = VCMD_VRSP_SC(res);
+	switch (status_code) {
+	case VCMD_VRSP_SC_SUCCESS:
+		*pasid = VCMD_VRSP_RESULT(res);
+		break;
+	case VCMD_VRSP_SC_NO_PASID_AVAIL:
+		pr_info("IOMMU: %s: No PASID available\n", iommu->name);
+		ret = -ENOMEM;
+		break;
+	default:
+		ret = -ENODEV;
+		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
+			iommu->name, status_code);
+	}
+
+	return ret;
+}
+
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+{
+	u64 res;
+	u8 status_code;
+	unsigned long flags;
+
+	if (!ecap_vcs(iommu->ecap)) {
+		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
+			iommu->name);
+		return;
+	}
+
+	raw_spin_lock_irqsave(&iommu->register_lock, flags);
+	dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
+	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+		      !(res & VCMD_VRSP_IP), res);
+	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	status_code = VCMD_VRSP_SC(res);
+	switch (status_code) {
+	case VCMD_VRSP_SC_SUCCESS:
+		break;
+	case VCMD_VRSP_SC_INVALID_PASID:
+		pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
+		break;
+	default:
+		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
+			iommu->name, status_code);
+	}
+}
+
 /*
  * Per device pasid table management:
  */
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 23537b3..4b26ab5 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -19,6 +19,16 @@
 #define PASID_PDE_SHIFT			6
 #define MAX_NR_PASID_BITS		20
 
+/* Virtual command interface for enlightened pasid management. */
+#define VCMD_CMD_ALLOC			0x1
+#define VCMD_CMD_FREE			0x2
+#define VCMD_VRSP_IP			0x1
+#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
+#define VCMD_VRSP_SC_SUCCESS		0
+#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
+#define VCMD_VRSP_SC_INVALID_PASID	1
+#define VCMD_VRSP_RESULT(e)		(((e) >> 8) & 0xfffff)
+
 /*
  * Domain ID reserved for pasid entries programmed for first-level
  * only and pass-through transfer modes.
@@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct device *dev, int pasid);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, int pasid);
-
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 6925a18..bff907b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -173,6 +173,7 @@
 #define ecap_smpwc(e)		(((e) >> 48) & 0x1)
 #define ecap_flts(e)		(((e) >> 47) & 0x1)
 #define ecap_slts(e)		(((e) >> 46) & 0x1)
+#define ecap_vcs(e)		(((e) >> 44) & 0x1)
 #define ecap_smts(e)		(((e) >> 43) & 0x1)
 #define ecap_dit(e)		((e >> 41) & 0x1)
 #define ecap_pasid(e)		((e >> 40) & 0x1)
@@ -289,6 +290,7 @@
 
 /* PRS_REG */
 #define DMA_PRS_PPR	((u32)1)
+#define DMA_VCS_PAS	((u64)1)
 
 #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)			\
 do {									\
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 06/16] iommu/vt-d: Add custom allocator for IOASID
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (4 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 05/16] iommu/vt-d: Enlightened PASID allocation Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 07/16] iommu/vtd: Optimize tlb invalidation for vIOMMU Jacob Pan
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu

When VT-d driver runs in the guest, PASID allocation must be
performed via virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray based allocator. The resulting IOASID allocation will always
come from the host. This ensures that PASID namespace is system-
wide.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/Kconfig       |  1 +
 drivers/iommu/intel-iommu.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 63 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 75e7f97..d565ef7 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -210,6 +210,7 @@ config INTEL_IOMMU_SVM
 	bool "Support for Shared Virtual Memory with Intel IOMMU"
 	depends on INTEL_IOMMU && X86
 	select PCI_PASID
+	select IOASID
 	select MMU_NOTIFIER
 	help
 	  Shared Virtual Memory (SVM) provides a facility for devices
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d93c4bd..fcc694a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 		if (ecap_prs(iommu->ecap))
 			intel_svm_finish_prq(iommu);
 	}
+	ioasid_unregister_allocator(&iommu->pasid_allocator);
+
 #endif
 }
 
@@ -4811,6 +4813,46 @@ static int __init platform_optin_force_iommu(void)
 	return 1;
 }
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
+{
+	struct intel_iommu *iommu = data;
+	ioasid_t ioasid;
+
+	/*
+	 * VT-d virtual command interface always uses the full 20 bit
+	 * PASID range. Host can partition guest PASID range based on
+	 * policies but it is out of guest's control.
+	 */
+	if (min < PASID_MIN || max > PASID_MAX)
+		return -EINVAL;
+
+	if (vcmd_alloc_pasid(iommu, &ioasid))
+		return INVALID_IOASID;
+
+	return ioasid;
+}
+
+static void intel_ioasid_free(ioasid_t ioasid, void *data)
+{
+	struct iommu_pasid_alloc_info *svm;
+	struct intel_iommu *iommu = data;
+
+	if (!iommu)
+		return;
+	/*
+	 * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
+	 * We can only free the PASID when all the devices are unbond.
+	 */
+	svm = ioasid_find(NULL, ioasid, NULL);
+	if (!svm) {
+		pr_warn("Freeing unbond IOASID %d\n", ioasid);
+		return;
+	}
+	vcmd_free_pasid(iommu, ioasid);
+}
+#endif
+
 int __init intel_iommu_init(void)
 {
 	int ret = -ENODEV;
@@ -4912,6 +4954,24 @@ int __init intel_iommu_init(void)
 				       "%s", iommu->name);
 		iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
 		iommu_device_register(&iommu->iommu);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+		if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
+			/*
+			 * Register a custom ASID allocator if we are running
+			 * in a guest, the purpose is to have a system wide PASID
+			 * namespace among all PASID users.
+			 * There can be multiple vIOMMUs in each guest but only
+			 * one allocator is active. All vIOMMU allocators will
+			 * eventually be calling the same host allocator.
+			 */
+			iommu->pasid_allocator.alloc = intel_ioasid_alloc;
+			iommu->pasid_allocator.free = intel_ioasid_free;
+			iommu->pasid_allocator.pdata = (void *)iommu;
+			ret = ioasid_register_allocator(&iommu->pasid_allocator);
+			if (ret)
+				pr_warn("Custom PASID allocator registeration failed\n");
+		}
+#endif
 	}
 
 	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index bff907b..c24c8aa 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -31,6 +31,7 @@
 #include <linux/iommu.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/dmar.h>
+#include <linux/ioasid.h>
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
@@ -549,6 +550,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
+	struct ioasid_allocator pasid_allocator; /* Custom allocator for PASIDs */
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
 	u32 *iommu_state; /* Store iommu states between suspend and resume.*/
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 07/16] iommu/vtd: Optimize tlb invalidation for vIOMMU
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (5 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 06/16] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 08/16] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-svm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8f87304..f5d1e1e 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -211,7 +211,9 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	rcu_read_lock();
 	list_for_each_entry_rcu(sdev, &svm->devs, list) {
 		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
-		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
+		/* for emulated iommu, PASID cache invalidation implies IOTLB/DTLB */
+		if (!cap_caching_mode(svm->iommu->cap))
+			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
 	}
 	rcu_read_unlock();
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 08/16] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (6 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 07/16] iommu/vtd: Optimize tlb invalidation for vIOMMU Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 09/16] iommu: Introduce guest PASID bind function Jacob Pan
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Make use of generic IOASID code to manage PASID allocation,
free, and lookup. Replace Intel specific code.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 11 +++++------
 drivers/iommu/intel-pasid.c | 36 ------------------------------------
 drivers/iommu/intel-svm.c   | 37 +++++++++++++++++++++----------------
 3 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fcc694a..64af526 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5155,7 +5155,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
 	domain->auxd_refcnt--;
 
 	if (!domain->auxd_refcnt && domain->default_pasid > 0)
-		intel_pasid_free_id(domain->default_pasid);
+		ioasid_free(domain->default_pasid);
 }
 
 static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -5173,10 +5173,9 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 	if (domain->default_pasid <= 0) {
 		int pasid;
 
-		pasid = intel_pasid_alloc_id(domain, PASID_MIN,
-					     pci_max_pasids(to_pci_dev(dev)),
-					     GFP_KERNEL);
-		if (pasid <= 0) {
+		pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1,
+				domain);
+		if (pasid == INVALID_IOASID) {
 			pr_err("Can't allocate default pasid\n");
 			return -ENODEV;
 		}
@@ -5212,7 +5211,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 	spin_unlock(&iommu->lock);
 	spin_unlock_irqrestore(&device_domain_lock, flags);
 	if (!domain->auxd_refcnt && domain->default_pasid > 0)
-		intel_pasid_free_id(domain->default_pasid);
+		ioasid_free(domain->default_pasid);
 
 	return ret;
 }
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 95f8f0c..2ce6ac2 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -26,42 +26,6 @@
  */
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
-static DEFINE_IDR(pasid_idr);
-
-int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
-{
-	int ret, min, max;
-
-	min = max_t(int, start, PASID_MIN);
-	max = min_t(int, end, intel_pasid_max_id);
-
-	WARN_ON(in_interrupt());
-	idr_preload(gfp);
-	spin_lock(&pasid_lock);
-	ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
-	spin_unlock(&pasid_lock);
-	idr_preload_end();
-
-	return ret;
-}
-
-void intel_pasid_free_id(int pasid)
-{
-	spin_lock(&pasid_lock);
-	idr_remove(&pasid_idr, pasid);
-	spin_unlock(&pasid_lock);
-}
-
-void *intel_pasid_lookup_id(int pasid)
-{
-	void *p;
-
-	spin_lock(&pasid_lock);
-	p = idr_find(&pasid_idr, pasid);
-	spin_unlock(&pasid_lock);
-
-	return p;
-}
 
 int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
 {
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index f5d1e1e..8fff212 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -25,6 +25,7 @@
 #include <linux/dmar.h>
 #include <linux/interrupt.h>
 #include <linux/mm_types.h>
+#include <linux/ioasid.h>
 #include <asm/page.h>
 
 #include "intel-pasid.h"
@@ -334,16 +335,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		if (pasid_max > intel_pasid_max_id)
 			pasid_max = intel_pasid_max_id;
 
-		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
-		ret = intel_pasid_alloc_id(svm,
-					   !!cap_caching_mode(iommu->cap),
-					   pasid_max - 1, GFP_KERNEL);
-		if (ret < 0) {
+		/* Do not use PASID 0, reserved for RID to PASID */
+		svm->pasid = ioasid_alloc(NULL, PASID_MIN,
+					pasid_max - 1, svm);
+		if (svm->pasid == INVALID_IOASID) {
 			kfree(svm);
 			kfree(sdev);
+			ret = ENOSPC;
 			goto out;
 		}
-		svm->pasid = ret;
 		svm->notifier.ops = &intel_mmuops;
 		svm->mm = mm;
 		svm->flags = flags;
@@ -353,7 +353,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		if (mm) {
 			ret = mmu_notifier_register(&svm->notifier, mm);
 			if (ret) {
-				intel_pasid_free_id(svm->pasid);
+				ioasid_free(svm->pasid);
 				kfree(svm);
 				kfree(sdev);
 				goto out;
@@ -369,7 +369,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		if (ret) {
 			if (mm)
 				mmu_notifier_unregister(&svm->notifier, mm);
-			intel_pasid_free_id(svm->pasid);
+			ioasid_free(svm->pasid);
 			kfree(svm);
 			kfree(sdev);
 			goto out;
@@ -402,7 +402,12 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 	if (!iommu)
 		goto out;
 
-	svm = intel_pasid_lookup_id(pasid);
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+
 	if (!svm)
 		goto out;
 
@@ -424,7 +429,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 				kfree_rcu(sdev, rcu);
 
 				if (list_empty(&svm->devs)) {
-					intel_pasid_free_id(svm->pasid);
+					ioasid_free(svm->pasid);
 					if (svm->mm)
 						mmu_notifier_unregister(&svm->notifier, svm->mm);
 
@@ -459,10 +464,11 @@ int intel_svm_is_pasid_valid(struct device *dev, int pasid)
 	if (!iommu)
 		goto out;
 
-	svm = intel_pasid_lookup_id(pasid);
-	if (!svm)
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
 		goto out;
-
+	}
 	/* init_mm is used in this case */
 	if (!svm->mm)
 		ret = 1;
@@ -569,13 +575,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 
 		if (!svm || svm->pasid != req->pasid) {
 			rcu_read_lock();
-			svm = intel_pasid_lookup_id(req->pasid);
+			svm = ioasid_find(NULL, req->pasid, NULL);
 			/* It *can't* go away, because the driver is not permitted
 			 * to unbind the mm while any page faults are outstanding.
 			 * So we only need RCU to protect the internal idr code. */
 			rcu_read_unlock();
-
-			if (!svm) {
+			if (IS_ERR(svm) || !svm) {
 				pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
 				       iommu->name, req->pasid, ((unsigned long long *)req)[0],
 				       ((unsigned long long *)req)[1]);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (7 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 08/16] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-16 14:14   ` Jean-Philippe Brucker
  2019-05-03 22:32 ` [PATCH v3 10/16] iommu/vt-d: Move domain helper to header Jacob Pan
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Guest shared virtual address (SVA) may require host to shadow guest
PASID tables. Guest PASID can also be allocated from the host via
enlightened interfaces. In this case, guest needs to bind the guest
mm, i.e. cr3 in guest physical address to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
address can go through a two level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA
This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for usage
explaination.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process mm, FL only |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |
    '-------------'
Guest
------| Shadow |--------------------------|------------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.---------------------.
    |             |   |Set SL to GPA-HPA    |
    |             |   '---------------------'
    '-------------'

Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/iommu/iommu.c      | 20 ++++++++++++++++++++
 include/linux/iommu.h      | 10 ++++++++++
 include/uapi/linux/iommu.h | 15 ++++++++++++++-
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a2f6f3e..f8572d2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1659,6 +1659,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
+int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev, struct gpasid_bind_data *data)
+{
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_bind_gpasid(domain, dev, data);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			int pasid)
+{
+	if (unlikely(!domain->ops->sva_unbind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_unbind_gpasid(dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d182525..9a69b59 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -268,6 +268,8 @@ struct page_response_msg {
  * @detach_pasid_table: detach the pasid table
  * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @sva_bind_gpasid: bind guest pasid and mm
+ * @sva_unbind_gpasid: unbind guest pasid and mm
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -332,6 +334,10 @@ struct iommu_ops {
 	int (*page_response)(struct device *dev, struct page_response_msg *msg);
 	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
 				struct iommu_cache_invalidate_info *inv_info);
+	int (*sva_bind_gpasid)(struct iommu_domain *domain,
+			struct device *dev, struct gpasid_bind_data *data);
+
+	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
 
 	unsigned long pgsize_bitmap;
 };
@@ -447,6 +453,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
 				  struct device *dev,
 				  struct iommu_cache_invalidate_info *inv_info);
+extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct gpasid_bind_data *data);
+extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+				struct device *dev, int pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index fa96ecb..3a781df 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -240,6 +240,19 @@ struct iommu_cache_invalidate_info {
 		struct iommu_inv_addr_info addr_info;
 	};
 };
-
+/**
+ * struct gpasid_bind_data - Information about device and guest PASID binding
+ * @gcr3:	Guest CR3 value from guest mm
+ * @pasid:	Process address space ID used for the guest mm
+ * @addr_width:	Guest address width. Paging mode can also be derived.
+ */
+struct gpasid_bind_data {
+	__u64 gcr3;
+	__u32 pasid;
+	__u32 addr_width;
+	__u32 flags;
+#define	IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor request */
+	__u8 padding[4];
+};
 
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 10/16] iommu/vt-d: Move domain helper to header
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (8 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 09/16] iommu: Introduce guest PASID bind function Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 11/16] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Move domainer helper to header to be used by SVA code.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 6 ------
 include/linux/intel-iommu.h | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 64af526..1316c96 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -427,12 +427,6 @@ static void init_translation_status(struct intel_iommu *iommu)
 		iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
 }
 
-/* Convert generic 'struct iommu_domain to private struct dmar_domain */
-static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
-{
-	return container_of(dom, struct dmar_domain, domain);
-}
-
 static int __init intel_iommu_setup(char *str)
 {
 	if (!str)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index c24c8aa..48fa164 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -597,6 +597,12 @@ static inline void __iommu_flush_cache(
 		clflush_cache_range(addr, size);
 }
 
+/* Convert generic 'struct iommu_domain to private struct dmar_domain */
+static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct dmar_domain, domain);
+}
+
 /*
  * 0: readable
  * 1: writable
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 11/16] iommu/vt-d: Avoid duplicated code for PASID setup
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (9 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 10/16] iommu/vt-d: Move domain helper to header Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 12/16] iommu/vt-d: Add nested translation helper function Jacob Pan
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

After each setup for PASID entry, related translation caches must be flushed.
We can combine duplicated code into one function which is less error prone.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 2ce6ac2..dde05b5 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -520,6 +520,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
 
+static inline void pasid_flush_caches(struct intel_iommu *iommu,
+				struct pasid_entry *pte,
+				int pasid, u16 did)
+{
+	if (!ecap_coherent(iommu->ecap))
+		clflush_cache_range(pte, sizeof(*pte));
+
+	if (cap_caching_mode(iommu->cap)) {
+		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+		iotlb_invalidation_with_pasid(iommu, did, pasid);
+	} else
+		iommu_flush_write_buffer(iommu);
+
+}
+
 /*
  * Set up the scalable mode pasid table entry for first only
  * translation type.
@@ -565,16 +580,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 	/* Setup Present and PASID Granular Transfer Type: */
 	pasid_set_translation_type(pte, 1);
 	pasid_set_present(pte);
-
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(pte, sizeof(*pte));
-
-	if (cap_caching_mode(iommu->cap)) {
-		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-		iotlb_invalidation_with_pasid(iommu, did, pasid);
-	} else {
-		iommu_flush_write_buffer(iommu);
-	}
+	pasid_flush_caches(iommu, pte, pasid, did);
 
 	return 0;
 }
@@ -638,16 +644,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	 */
 	pasid_set_sre(pte);
 	pasid_set_present(pte);
-
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(pte, sizeof(*pte));
-
-	if (cap_caching_mode(iommu->cap)) {
-		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-		iotlb_invalidation_with_pasid(iommu, did, pasid);
-	} else {
-		iommu_flush_write_buffer(iommu);
-	}
+	pasid_flush_caches(iommu, pte, pasid, did);
 
 	return 0;
 }
@@ -681,16 +678,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 	 */
 	pasid_set_sre(pte);
 	pasid_set_present(pte);
-
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(pte, sizeof(*pte));
-
-	if (cap_caching_mode(iommu->cap)) {
-		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-		iotlb_invalidation_with_pasid(iommu, did, pasid);
-	} else {
-		iommu_flush_write_buffer(iommu);
-	}
+	pasid_flush_caches(iommu, pte, pasid, did);
 
 	return 0;
 }
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 12/16] iommu/vt-d: Add nested translation helper function
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (10 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 11/16] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 13/16] iommu/vt-d: Clean up for SVM device list Jacob Pan
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
With PASID granular translation type set to 0x11b, translation
result from the first level(FL) also subject to a second level(SL)
page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 93 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-pasid.h | 11 ++++++
 2 files changed, 104 insertions(+)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index dde05b5..d8421f7 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -682,3 +682,96 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
 	return 0;
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation
+ * which is used for vSVA. The first level page tables are used for
+ * GVA-GPA translation in the guest, second level page tables are used
+ * for GPA to HPA translation.
+ *
+ * @iommu:      Iommu which the device belong to
+ * @dev:        Device to be set up for translation
+ * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @flags:      Additional info such as supervisor PASID
+ * @domain:     Domain info for setting up second level page tables
+ * @addr_width: Address width of the first level (guest)
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+			struct device *dev, pgd_t *gpgd,
+			int pasid, int flags,
+			struct dmar_domain *domain,
+			int addr_width)
+{
+	struct pasid_entry *pte;
+	struct dma_pte *pgd;
+	u64 pgd_val;
+	int agaw;
+	u16 did;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err("IOMMU: %s: No nested translation support\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (WARN_ON(!pte))
+		return -EINVAL;
+
+	pasid_clear_entry(pte);
+
+	/* Sanity checking performed by caller to make sure address
+	 * width matching in two dimensions:
+	 * 1. CPU vs. IOMMU
+	 * 2. Guest vs. Host.
+	 */
+	switch (addr_width) {
+	case 57:
+		pasid_set_flpm(pte, 1);
+		break;
+	case 48:
+		pasid_set_flpm(pte, 0);
+		break;
+	default:
+		dev_err(dev, "Invalid paging mode %d\n", addr_width);
+		return -EINVAL;
+	}
+
+	/* Setup the first level page table pointer in GPA */
+	pasid_set_flptr(pte, (u64)gpgd);
+	if (flags & PASID_FLAG_SUPERVISOR_MODE) {
+		if (!ecap_srs(iommu->ecap)) {
+			pr_err("No supervisor request support on %s\n",
+			       iommu->name);
+			return -EINVAL;
+		}
+		pasid_set_sre(pte);
+	}
+
+	/* Setup the second level based on the given domain */
+	pgd = domain->pgd;
+
+	for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+		pgd = phys_to_virt(dma_pte_addr(pgd));
+		if (!dma_pte_present(pgd)) {
+			dev_err(dev, "Invalid domain page table\n");
+			return -EINVAL;
+		}
+	}
+	pgd_val = virt_to_phys(pgd);
+	pasid_set_slptr(pte, pgd_val);
+	pasid_set_fault_enable(pte);
+
+	did = domain->iommu_did[iommu->seq_id];
+	pasid_set_domain_id(pte, did);
+
+	pasid_set_address_width(pte, agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 4b26ab5..2234fd5 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -42,6 +42,7 @@
  * to vmalloc or even module mappings.
  */
 #define PASID_FLAG_SUPERVISOR_MODE	BIT(0)
+#define PASID_FLAG_NESTED		BIT(1)
 
 struct pasid_dir_entry {
 	u64 val;
@@ -51,6 +52,11 @@ struct pasid_entry {
 	u64 val[8];
 };
 
+#define PASID_ENTRY_PGTT_FL_ONLY	(1)
+#define PASID_ENTRY_PGTT_SL_ONLY	(2)
+#define PASID_ENTRY_PGTT_NESTED		(3)
+#define PASID_ENTRY_PGTT_PT		(4)
+
 /* The representative of a PASID table */
 struct pasid_table {
 	void			*table;		/* pasid table pointer */
@@ -77,6 +83,11 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, int pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+			struct device *dev, pgd_t *pgd,
+			int pasid, int flags,
+			struct dmar_domain *domain,
+			int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, int pasid);
 int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 13/16] iommu/vt-d: Clean up for SVM device list
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (11 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 12/16] iommu/vt-d: Add nested translation helper function Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 14/16] iommu/vt-d: Add bind guest PASID support Jacob Pan
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Use combined macro for_each_svm_dev() to simplify SVM device iteration.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-svm.c | 79 +++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8fff212..068dd9e 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -227,6 +227,9 @@ static const struct mmu_notifier_ops intel_mmuops = {
 
 static DEFINE_MUTEX(pasid_mutex);
 static LIST_HEAD(global_svm_list);
+#define for_each_svm_dev() \
+	list_for_each_entry(sdev, &svm->devs, list)	\
+	if (dev == sdev->dev)				\
 
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
 {
@@ -273,15 +276,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 				goto out;
 			}
 
-			list_for_each_entry(sdev, &svm->devs, list) {
-				if (dev == sdev->dev) {
-					if (sdev->ops != ops) {
-						ret = -EBUSY;
-						goto out;
-					}
-					sdev->users++;
-					goto success;
+			for_each_svm_dev() {
+				if (sdev->ops != ops) {
+					ret = -EBUSY;
+					goto out;
 				}
+				sdev->users++;
+				goto success;
 			}
 
 			break;
@@ -411,40 +412,38 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 	if (!svm)
 		goto out;
 
-	list_for_each_entry(sdev, &svm->devs, list) {
-		if (dev == sdev->dev) {
-			ret = 0;
-			sdev->users--;
-			if (!sdev->users) {
-				list_del_rcu(&sdev->list);
-				/* Flush the PASID cache and IOTLB for this device.
-				 * Note that we do depend on the hardware *not* using
-				 * the PASID any more. Just as we depend on other
-				 * devices never using PASIDs that they have no right
-				 * to use. We have a *shared* PASID table, because it's
-				 * large and has to be physically contiguous. So it's
-				 * hard to be as defensive as we might like. */
-				intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
-				intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
-				kfree_rcu(sdev, rcu);
-
-				if (list_empty(&svm->devs)) {
-					ioasid_free(svm->pasid);
-					if (svm->mm)
-						mmu_notifier_unregister(&svm->notifier, svm->mm);
-
-					list_del(&svm->list);
-
-					/* We mandate that no page faults may be outstanding
-					 * for the PASID when intel_svm_unbind_mm() is called.
-					 * If that is not obeyed, subtle errors will happen.
-					 * Let's make them less subtle... */
-					memset(svm, 0x6b, sizeof(*svm));
-					kfree(svm);
-				}
+	for_each_svm_dev() {
+		ret = 0;
+		sdev->users--;
+		if (!sdev->users) {
+			list_del_rcu(&sdev->list);
+			/* Flush the PASID cache and IOTLB for this device.
+			 * Note that we do depend on the hardware *not* using
+			 * the PASID any more. Just as we depend on other
+			 * devices never using PASIDs that they have no right
+			 * to use. We have a *shared* PASID table, because it's
+			 * large and has to be physically contiguous. So it's
+			 * hard to be as defensive as we might like. */
+			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
+			kfree_rcu(sdev, rcu);
+
+			if (list_empty(&svm->devs)) {
+				ioasid_free(svm->pasid);
+				if (svm->mm)
+					mmu_notifier_unregister(&svm->notifier, svm->mm);
+
+				list_del(&svm->list);
+
+				/* We mandate that no page faults may be outstanding
+				 * for the PASID when intel_svm_unbind_mm() is called.
+				 * If that is not obeyed, subtle errors will happen.
+				 * Let's make them less subtle... */
+				memset(svm, 0x6b, sizeof(*svm));
+				kfree(svm);
 			}
-			break;
 		}
+		break;
 	}
  out:
 	mutex_unlock(&pasid_mutex);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 14/16] iommu/vt-d: Add bind guest PASID support
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (12 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 13/16] iommu/vt-d: Clean up for SVM device list Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 15/16] iommu/vt-d: Support flushing more translation cache types Jacob Pan
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

When supporting guest SVA with emulated IOMMU, the guest PASID
table is shadowed in VMM. Updates to guest vIOMMU PASID table
will result in PASID cache flush which will be passed down to
the host as bind guest PASID calls.

For the SL page tables, it will be harvested from device's
default domain (request w/o PASID), or aux domain in case of
mediated device.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c |   4 +
 drivers/iommu/intel-svm.c   | 175 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |  10 ++-
 include/linux/intel-svm.h   |   7 ++
 4 files changed, 194 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1316c96..a10cb70 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5770,6 +5770,10 @@ const struct iommu_ops intel_iommu_ops = {
 	.dev_enable_feat	= intel_iommu_dev_enable_feat,
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+	.sva_bind_gpasid	= intel_svm_bind_gpasid,
+	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
+#endif
 };
 
 static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 068dd9e..0815615 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -231,6 +231,181 @@ static LIST_HEAD(global_svm_list);
 	list_for_each_entry(sdev, &svm->devs, list)	\
 	if (dev == sdev->dev)				\
 
+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev,
+			struct gpasid_bind_data *data)
+{
+	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+	struct intel_svm_dev *sdev;
+	struct intel_svm *svm = NULL;
+	struct dmar_domain *ddomain;
+	int ret = 0;
+
+	if (WARN_ON(!iommu) || !data)
+		return -EINVAL;
+
+	if (dev_is_pci(dev)) {
+		/* VT-d supports devices with full 20 bit PASIDs only */
+		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+			return -EINVAL;
+	}
+
+	if (data->pasid <= 0 || data->pasid >= PASID_MAX)
+		return -EINVAL;
+
+	ddomain = to_dmar_domain(domain);
+	/* REVISIT:
+	 * Sanity check adddress width and paging mode support
+	 * width matching in two dimensions:
+	 * 1. paging mode CPU <= IOMMU
+	 * 2. address width Guest <= Host.
+	 */
+	mutex_lock(&pasid_mutex);
+	svm = ioasid_find(NULL, data->pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+	if (svm) {
+		/*
+		 * If we found svm for the PASID, there must be at
+		 * least one device bond, otherwise svm should be freed.
+		 */
+		BUG_ON(list_empty(&svm->devs));
+
+		for_each_svm_dev() {
+			/* In case of multiple sub-devices of the same pdev assigned, we should
+			 * allow multiple bind calls with the same PASID and pdev.
+			 */
+			sdev->users++;
+			goto out;
+		}
+	} else {
+		/* We come here when PASID has never been bond to a device. */
+		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+		if (!svm) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
+		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
+		 * ownership.
+		 */
+		svm->mm = get_task_mm(current);
+		svm->pasid = data->pasid;
+		refcount_set(&svm->refs, 0);
+		ioasid_set_data(data->pasid, svm);
+		INIT_LIST_HEAD_RCU(&svm->devs);
+		INIT_LIST_HEAD(&svm->list);
+
+		mmput(svm->mm);
+	}
+	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
+	if (!sdev) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	sdev->dev = dev;
+	sdev->users = 1;
+
+	/* Set up device context entry for PASID if not enabled already */
+	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
+	if (ret) {
+		dev_err(dev, "Failed to enable PASID capability\n");
+		kfree(sdev);
+		goto out;
+	}
+
+	/*
+	 * For guest bind, we need to set up PASID table entry as follows:
+	 * - FLPM matches guest paging mode
+	 * - turn on nested mode
+	 * - SL guest address width matching
+	 */
+	ret = intel_pasid_setup_nested(iommu,
+				dev,
+				(pgd_t *)data->gcr3,
+				data->pasid,
+				data->flags,
+				ddomain,
+				data->addr_width);
+	if (ret) {
+		dev_err(dev, "Failed to set up PASID %d in nested mode, Err %d\n",
+			data->pasid, ret);
+		kfree(sdev);
+		goto out;
+	}
+	svm->flags |= SVM_FLAG_GUEST_MODE;
+
+	init_rcu_head(&sdev->rcu);
+	refcount_inc(&svm->refs);
+	list_add_rcu(&sdev->list, &svm->devs);
+ out:
+	mutex_unlock(&pasid_mutex);
+	return ret;
+}
+
+int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+{
+	struct intel_svm_dev *sdev;
+	struct intel_iommu *iommu;
+	struct intel_svm *svm;
+	int ret = -EINVAL;
+
+	mutex_lock(&pasid_mutex);
+	iommu = intel_svm_device_to_iommu(dev);
+	if (!iommu)
+		goto out;
+
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+
+	if (!svm)
+		goto out;
+
+	for_each_svm_dev() {
+		ret = 0;
+		sdev->users--;
+		if (!sdev->users) {
+			list_del_rcu(&sdev->list);
+			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			/* TODO: Drain in flight PRQ for the PASID since it
+			 * may get reused soon, we don't want to
+			 * confuse with its previous live.
+			 * intel_svm_drain_prq(dev, pasid);
+			 */
+			kfree_rcu(sdev, rcu);
+
+			if (list_empty(&svm->devs)) {
+				list_del(&svm->list);
+				kfree(svm);
+				/*
+				 * We do not free PASID here until explicit call
+				 * from VFIO to free. The PASID life cycle
+				 * management is largely tied to VFIO management
+				 * of assigned device life cycles. In case of
+				 * guest exit without a explicit free PASID call,
+				 * the responsibility lies in VFIO layer to free
+				 * the PASIDs allocated for the guest.
+				 * For security reasons, VFIO has to track the
+				 * PASID ownership per guest anyway to ensure
+				 * that PASID allocated by one guest cannot be
+				 * used by another.
+				 */
+				ioasid_set_data(pasid, NULL);
+			}
+		}
+		break;
+	}
+ out:
+	mutex_unlock(&pasid_mutex);
+
+	return ret;
+}
+
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 48fa164..774f368 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 int intel_svm_init(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-
+extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct gpasid_bind_data *data);
+extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
 struct svm_dev_ops;
 
 struct intel_svm_dev {
@@ -693,12 +695,16 @@ struct intel_svm_dev {
 
 struct intel_svm {
 	struct mmu_notifier notifier;
-	struct mm_struct *mm;
+	union {
+		struct mm_struct *mm;
+		u64 gcr3;
+	};
 	struct intel_iommu *iommu;
 	int flags;
 	int pasid;
 	struct list_head devs;
 	struct list_head list;
+	refcount_t refs; /* Number of devices sharing this PASID */
 };
 
 extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index e3f7631..34b0a3b 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -52,6 +52,13 @@ struct svm_dev_ops {
  * do such IOTLB flushes automatically.
  */
 #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
+/*
+ * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to a device.
+ * In this case the mm_struct is in the guest kernel or userspace, its life
+ * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
+ * means to bind/unbind guest CR3 with PASIDs allocated for a device.
+ */
+#define SVM_FLAG_GUEST_MODE	(1<<2)
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 15/16] iommu/vt-d: Support flushing more translation cache types
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (13 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 14/16] iommu/vt-d: Add bind guest PASID support Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-03 22:32 ` [PATCH v3 16/16] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
  2019-05-15 16:31 ` [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
IOTLB invalidation may be passed down from outside IOMMU subsystems.
This patch adds invalidation functions that can be used for additional
translation cache types.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/dmar.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h | 21 +++++++++++++++----
 2 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9c49300..46ad701 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1357,6 +1357,21 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based IOTLB Invalidate */
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+		unsigned int size_order, u64 granu, int ih)
+{
+	struct qi_desc desc;
+
+	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
+		QI_EIOTLB_AM(size_order);
+	desc.qw2 = 0;
+	desc.qw3 = 0;
+	qi_submit_sync(&desc, iommu);
+}
+
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			u16 qdep, u64 addr, unsigned mask)
 {
@@ -1380,6 +1395,41 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based device IOTLB Invalidate */
+void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+		u32 pasid,  u16 qdep, u64 addr, unsigned size, u64 granu)
+{
+	struct qi_desc desc;
+
+	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+		QI_DEV_IOTLB_PFSID(pfsid);
+	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
+
+	/* If S bit is 0, we only flush a single page. If S bit is set,
+	 * The least significant zero bit indicates the size. VT-d spec
+	 * 6.5.2.6
+	 */
+	if (!size)
+		desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+	else {
+		unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size);
+
+		desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
+	}
+	qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+	struct qi_desc desc;
+
+	desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_PASID(pasid);
+	desc.qw1 = 0;
+	desc.qw2 = 0;
+	desc.qw3 = 0;
+	qi_submit_sync(&desc, iommu);
+}
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 774f368..6b6522d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -339,7 +339,7 @@ enum {
 #define QI_IOTLB_GRAN(gran) 	(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
 #define QI_IOTLB_ADDR(addr)	(((u64)addr) & VTD_PAGE_MASK)
 #define QI_IOTLB_IH(ih)		(((u64)ih) << 6)
-#define QI_IOTLB_AM(am)		(((u8)am))
+#define QI_IOTLB_AM(am)		(((u8)am) & 0x3f)
 
 #define QI_CC_FM(fm)		(((u64)fm) << 48)
 #define QI_CC_SID(sid)		(((u64)sid) << 32)
@@ -357,17 +357,22 @@ enum {
 #define QI_PC_DID(did)		(((u64)did) << 16)
 #define QI_PC_GRAN(gran)	(((u64)gran) << 4)
 
-#define QI_PC_ALL_PASIDS	(QI_PC_TYPE | QI_PC_GRAN(0))
-#define QI_PC_PASID_SEL		(QI_PC_TYPE | QI_PC_GRAN(1))
+/* PASID cache invalidation granu */
+#define QI_PC_ALL_PASIDS	0
+#define QI_PC_PASID_SEL		1
 
 #define QI_EIOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
 #define QI_EIOTLB_GL(gl)	(((u64)gl) << 7)
 #define QI_EIOTLB_IH(ih)	(((u64)ih) << 6)
-#define QI_EIOTLB_AM(am)	(((u64)am))
+#define QI_EIOTLB_AM(am)	(((u64)am) & 0x3f)
 #define QI_EIOTLB_PASID(pasid) 	(((u64)pasid) << 32)
 #define QI_EIOTLB_DID(did)	(((u64)did) << 16)
 #define QI_EIOTLB_GRAN(gran) 	(((u64)gran) << 4)
 
+/* QI Dev-IOTLB inv granu */
+#define QI_DEV_IOTLB_GRAN_ALL		1
+#define QI_DEV_IOTLB_GRAN_PASID_SEL	0
+
 #define QI_DEV_EIOTLB_ADDR(a)	((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE	(((u64)1) << 11)
 #define QI_DEV_EIOTLB_GLOB(g)	((u64)g)
@@ -658,8 +663,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
 			     u8 fm, u64 type);
 extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 			  unsigned int size_order, u64 type);
+extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
+			u32 pasid, unsigned int size_order, u64 type, int ih);
 extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			u16 qdep, u64 addr, unsigned mask);
+
+extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+			u32 pasid, u16 qdep, u64 addr, unsigned size, u64 granu);
+
+extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
+
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
 
 extern int dmar_ir_support(void);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 16/16] iommu/vt-d: Add svm/sva invalidate function
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (14 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 15/16] iommu/vt-d: Support flushing more translation cache types Jacob Pan
@ 2019-05-03 22:32 ` Jacob Pan
  2019-05-15 16:31 ` [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-03 22:32 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

When Shared Virtual Address (SVA) is enabled for a guest OS via
vIOMMU, we need to provide invalidation support at IOMMU API and driver
level. This patch adds Intel VT-d specific function to implement
iommu passdown invalidate API for shared virtual address.

The use case is for supporting caching structure invalidation
of assigned SVM capable devices. Emulated IOMMU exposes queue
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that guest to host device ID mapping should be
resolved prior to calling IOMMU driver. Based on the device handle,
host IOMMU driver can replace certain fields before submit to the
invalidation queue.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 160 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a10cb70..94eb211 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5340,6 +5340,165 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
 	aux_domain_remove_dev(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 2D array for converting and sanitizing IOMMU generic TLB granularity to
+ * VT-d granularity. Invalidation is typically included in the unmap operation
+ * as a result of DMA or VFIO unmap. However, for assigned device where guest
+ * could own the first level page tables without being shadowed by QEMU. In
+ * this case there is no pass down unmap to the host IOMMU as a result of unmap
+ * in the guest. Only invalidations are trapped and passed down.
+ * In all cases, only first level TLB invalidation (request with PASID) can be
+ * passed down, therefore we do not include IOTLB granularity for request
+ * without PASID (second level).
+ *
+ * For an example, to find the VT-d granularity encoding for IOTLB
+ * type and page selective granularity within PASID:
+ * X: indexed by iommu cache type
+ * Y: indexed by enum iommu_inv_granularity
+ * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
+ *
+ * Granu_map array indicates validity of the table. 1: valid, 0: invalid
+ *
+ */
+const static int inv_type_granu_map[IOMMU_CACHE_TYPE_NR][IOMMU_INVAL_GRANU_NR] = {
+	/* PASID based IOTLB, support PASID selective and page selective */
+	{0, 1, 1},
+	/* PASID based dev TLBs, only support all PASIDs or single PASID */
+	{1, 1, 0},
+	/* PASID cache */
+	{1, 1, 0}
+};
+
+const static u64 inv_type_granu_table[IOMMU_CACHE_TYPE_NR][IOMMU_INVAL_GRANU_NR] = {
+	/* PASID based IOTLB */
+	{0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
+	/* PASID based dev TLBs */
+	{QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
+	/* PASID cache */
+	{QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
+};
+
+static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
+{
+	if (type >= IOMMU_CACHE_TYPE_NR || granu >= IOMMU_INVAL_GRANU_NR ||
+		!inv_type_granu_map[type][granu])
+		return -EINVAL;
+
+	*vtd_granu = inv_type_granu_table[type][granu];
+
+	return 0;
+}
+
+static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
+{
+	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
+
+	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
+	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
+	 * granu size in contiguous memory.
+	 */
+	return order_base_2(nr_pages);
+}
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	struct intel_iommu *iommu;
+	unsigned long flags;
+	int cache_type;
+	u8 bus, devfn;
+	u16 did, sid;
+	int ret = 0;
+	u64 granu;
+	u64 size;
+
+	if (!inv_info || !dmar_domain ||
+		inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	spin_lock(&iommu->lock);
+	info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
+	if (!info) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	did = dmar_domain->iommu_did[iommu->seq_id];
+	sid = PCI_DEVID(bus, devfn);
+	size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules);
+
+	for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_TYPE_NR) {
+
+		ret = to_vtd_granularity(cache_type, inv_info->granularity, &granu);
+		if (ret) {
+			pr_err("Invalid cache type and granu combination %d/%d\n", cache_type,
+				inv_info->granularity);
+			break;
+		}
+
+		switch (BIT(cache_type)) {
+		case IOMMU_CACHE_INV_TYPE_IOTLB:
+			if (size && (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+				pr_err("Address out of range, 0x%llx, size order %llu\n",
+					inv_info->addr_info.addr, size);
+				ret = -ERANGE;
+				goto out_unlock;
+			}
+
+			qi_flush_piotlb(iommu, did, mm_to_dma_pfn(inv_info->addr_info.addr),
+					inv_info->addr_info.pasid,
+					size, granu, inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+
+			/*
+			 * Always flush device IOTLB if ATS is enabled since guest
+			 * vIOMMU exposes CM = 1, no device IOTLB flush will be passed
+			 * down. REVISIT: cannot assume Linux guest
+			 */
+			if (info->ats_enabled) {
+				qi_flush_dev_piotlb(iommu, sid, info->pfsid,
+						inv_info->addr_info.pasid, info->ats_qdep,
+						inv_info->addr_info.addr, size,
+						granu);
+			}
+			break;
+		case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+			if (info->ats_enabled) {
+				qi_flush_dev_piotlb(iommu, sid, info->pfsid,
+						inv_info->addr_info.pasid, info->ats_qdep,
+						inv_info->addr_info.addr, size,
+						granu);
+			} else
+				pr_warn("Passdown device IOTLB flush w/o ATS!\n");
+
+			break;
+		case IOMMU_CACHE_INV_TYPE_PASID:
+			qi_flush_pasid_cache(iommu, did, granu, inv_info->pasid);
+
+			break;
+		default:
+			dev_err(dev, "Unsupported IOMMU invalidation type %d\n",
+				cache_type);
+			ret = -EINVAL;
+		}
+	}
+out_unlock:
+	spin_unlock(&iommu->lock);
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	return ret;
+}
+#endif
+
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot)
@@ -5771,6 +5930,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
 #ifdef CONFIG_INTEL_IOMMU_SVM
+	.cache_invalidate	= intel_iommu_sva_invalidate,
 	.sva_bind_gpasid	= intel_svm_bind_gpasid,
 	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
 #endif
-- 
2.7.4


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-03 22:32 ` [PATCH v3 02/16] iommu: Introduce cache_invalidate API Jacob Pan
@ 2019-05-13  9:14   ` Auger Eric
  2019-05-13 11:20     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-13  9:14 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob, Jean-Philippe,

On 5/4/19 12:32 AM, Jacob Pan wrote:
> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> 
> In any virtualization use case, when the first translation stage
> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> of caching structure updates unless the guest invalidation activities
> are trapped by the virtualizer and passed down to the host.
> 
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own format.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v6 -> v7:
> - detail which fields are used for each invalidation type
> - add a comment about multiple cache invalidation
> 
> v5 -> v6:
> - fix merge issue
> 
> v3 -> v4:
> - full reshape of the API following Alex' comments
> 
> v1 -> v2:
> - add arch_id field
> - renamed tlb_invalidate into cache_invalidate as this API allows
>   to invalidate context caches on top of IOTLBs
> 
> v1:
> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
> header. Commit message reworded.
> ---
>  drivers/iommu/iommu.c      | 14 ++++++++
>  include/linux/iommu.h      | 15 ++++++++-
>  include/uapi/linux/iommu.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8df9d34..a2f6f3e 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1645,6 +1645,20 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>  
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +			   struct iommu_cache_invalidate_info *inv_info)
> +{
> +	int ret = 0;
> +
> +	if (unlikely(!domain->ops->cache_invalidate))
> +		return -ENODEV;
> +
> +	ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ab4d922..d182525 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -266,6 +266,7 @@ struct page_response_msg {
>   * @page_response: handle page request response
>   * @attach_pasid_table: attach a pasid table
>   * @detach_pasid_table: detach the pasid table
> + * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -328,8 +329,9 @@ struct iommu_ops {
>  	int (*attach_pasid_table)(struct iommu_domain *domain,
>  				  struct iommu_pasid_table_config *cfg);
>  	void (*detach_pasid_table)(struct iommu_domain *domain);
> -
>  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
> +	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> +				struct iommu_cache_invalidate_info *inv_info);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -442,6 +444,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>  				    struct iommu_pasid_table_config *cfg);
>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> +				  struct device *dev,
> +				  struct iommu_cache_invalidate_info *inv_info);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -982,6 +987,14 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
>  static inline
>  void iommu_detach_pasid_table(struct iommu_domain *domain) {}
>  
> +static inline int
> +iommu_cache_invalidate(struct iommu_domain *domain,
> +		       struct device *dev,
> +		       struct iommu_cache_invalidate_info *inv_info)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_DEBUGFS
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 8848514..fa96ecb 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -162,4 +162,84 @@ struct iommu_pasid_table_config {
>  	};
>  };

I noticed my qemu integration was currently incorrectly using PASID
invalidation for ASID based invalidation (SMMUV3 Stage1 CMD_TLBI_NH_ASID
invalidation command). So I think we also need ARCHID invalidation.
Sorry for the late notice.
>  
> +/* defines the granularity of the invalidation */
> +enum iommu_inv_granularity {
> +	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
        IOMMU_INV_GRANU_ARCHID, /* archid-selective invalidation */
> +	IOMMU_INV_GRANU_PASID,	/* pasid-selective invalidation */
> +	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
> +	IOMMU_INVAL_GRANU_NR,   /* number of invalidation granularities */
> +};
> +
> +/**
> + * Address Selective Invalidation Structure
> + *
> + * @flags indicates the granularity of the address-selective invalidation
> + * - if PASID bit is set, @pasid field is populated and the invalidation
> + *   relates to cache entries tagged with this PASID and matching the
> + *   address range.
> + * - if ARCHID bit is set, @archid is populated and the invalidation relates
> + *   to cache entries tagged with this architecture specific id and matching
> + *   the address range.
> + * - Both PASID and ARCHID can be set as they may tag different caches.
> + * - if neither PASID or ARCHID is set, global addr invalidation applies
> + * - LEAF flag indicates whether only the leaf PTE caching needs to be
> + *   invalidated and other paging structure caches can be preserved.
> + * @pasid: process address space id
> + * @archid: architecture-specific id
> + * @addr: first stage/level input address
> + * @granule_size: page/block size of the mapping in bytes
> + * @nb_granules: number of contiguous granules to be invalidated
> + */
> +struct iommu_inv_addr_info {
> +#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
> +#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
> +#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
> +	__u32	flags;
> +	__u32	archid;
> +	__u64	pasid;
> +	__u64	addr;
> +	__u64	granule_size;
> +	__u64	nb_granules;
> +};
> +
> +/**
> + * First level/stage invalidation information
> + * @cache: bitfield that allows to select which caches to invalidate
> + * @granularity: defines the lowest granularity used for the invalidation:
> + *     domain > pasid > addr
> + *
> + * Not all the combinations of cache/granularity make sense:
> + *
> + *         type |   DEV_IOTLB   |     IOTLB     |      PASID    |
> + * granularity	|		|		|      cache	|
> + * -------------+---------------+---------------+---------------+
> + * DOMAIN	|	N/A	|       Y	|	Y	|
 * ARCHID       |       N/A     |       Y       |       N/A     |

> + * PASID	|	Y	|       Y	|	Y	|
> + * ADDR		|       Y	|       Y	|	N/A	|
> + *
> + * Invalidations by %IOMMU_INV_GRANU_ADDR use field @addr_info.
 * Invalidations by %IOMMU_INV_GRANU_ARCHID use field @archid.
> + * Invalidations by %IOMMU_INV_GRANU_PASID use field @pasid.
> + * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument.
> + *
> + * If multiple cache types are invalidated simultaneously, they all
> + * must support the used granularity.
> + */
> +struct iommu_cache_invalidate_info {
> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> +	__u32	version;
> +/* IOMMU paging structure cache */
> +#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
> +#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
> +#define IOMMU_CACHE_TYPE_NR		(3)
> +	__u8	cache;
> +	__u8	granularity;
> +	__u8	padding[2];
> +	union {
> +		__u64	pasid;
                __u32   archid;

Thanks

Eric
> +		struct iommu_inv_addr_info addr_info;
> +	};
> +};
> +
> +
>  #endif /* _UAPI_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-13  9:14   ` Auger Eric
@ 2019-05-13 11:20     ` Jean-Philippe Brucker
  2019-05-13 16:50       ` Auger Eric
  0 siblings, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-13 11:20 UTC (permalink / raw)
  To: Auger Eric, Jacob Pan, iommu, LKML, Joerg Roedel,
	David Woodhouse, Alex Williamson
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Eric,

On 13/05/2019 10:14, Auger Eric wrote:
> I noticed my qemu integration was currently incorrectly using PASID
> invalidation for ASID based invalidation (SMMUV3 Stage1 CMD_TLBI_NH_ASID
> invalidation command). So I think we also need ARCHID invalidation.
> Sorry for the late notice.
>>  
>> +/* defines the granularity of the invalidation */
>> +enum iommu_inv_granularity {
>> +	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
>         IOMMU_INV_GRANU_ARCHID, /* archid-selective invalidation */
>> +	IOMMU_INV_GRANU_PASID,	/* pasid-selective invalidation */

In terms of granularity, these values have the same meaning: invalidate
the whole address space of a context. Then you can communicate two
things using the same struct:
* If ATS is enables an Arm host needs to invalidate all ATC entries
using PASID.
* If BTM isn't used by the guest, the host needs to invalidate all TLB
entries using ARCHID.

Rather than introducing a new granule here, could we just add an archid
field to the struct associated with IOMMU_INV_GRANU_PASID? Something like...

>> +	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
>> +	IOMMU_INVAL_GRANU_NR,   /* number of invalidation granularities */
>> +};
>> +
>> +/**
>> + * Address Selective Invalidation Structure
>> + *
>> + * @flags indicates the granularity of the address-selective invalidation
>> + * - if PASID bit is set, @pasid field is populated and the invalidation
>> + *   relates to cache entries tagged with this PASID and matching the
>> + *   address range.
>> + * - if ARCHID bit is set, @archid is populated and the invalidation relates
>> + *   to cache entries tagged with this architecture specific id and matching
>> + *   the address range.
>> + * - Both PASID and ARCHID can be set as they may tag different caches.
>> + * - if neither PASID or ARCHID is set, global addr invalidation applies
>> + * - LEAF flag indicates whether only the leaf PTE caching needs to be
>> + *   invalidated and other paging structure caches can be preserved.
>> + * @pasid: process address space id
>> + * @archid: architecture-specific id
>> + * @addr: first stage/level input address
>> + * @granule_size: page/block size of the mapping in bytes
>> + * @nb_granules: number of contiguous granules to be invalidated
>> + */
>> +struct iommu_inv_addr_info {
>> +#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
>> +#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
>> +	__u32	flags;
>> +	__u32	archid;
>> +	__u64	pasid;
>> +	__u64	addr;
>> +	__u64	granule_size;
>> +	__u64	nb_granules;
>> +};

struct iommu_inv_pasid_info {
#define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
#define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
	__u32	flags;
	__u32	archid;
	__u64	pasid;
};

>> +
>> +/**
>> + * First level/stage invalidation information
>> + * @cache: bitfield that allows to select which caches to invalidate
>> + * @granularity: defines the lowest granularity used for the invalidation:
>> + *     domain > pasid > addr
>> + *
>> + * Not all the combinations of cache/granularity make sense:
>> + *
>> + *         type |   DEV_IOTLB   |     IOTLB     |      PASID    |
>> + * granularity	|		|		|      cache	|
>> + * -------------+---------------+---------------+---------------+
>> + * DOMAIN	|	N/A	|       Y	|	Y	|
>  * ARCHID       |       N/A     |       Y       |       N/A     |
> 
>> + * PASID	|	Y	|       Y	|	Y	|
>> + * ADDR		|       Y	|       Y	|	N/A	|
>> + *
>> + * Invalidations by %IOMMU_INV_GRANU_ADDR use field @addr_info.
>  * Invalidations by %IOMMU_INV_GRANU_ARCHID use field @archid.
>> + * Invalidations by %IOMMU_INV_GRANU_PASID use field @pasid.
>> + * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument.
>> + *
>> + * If multiple cache types are invalidated simultaneously, they all
>> + * must support the used granularity.
>> + */
>> +struct iommu_cache_invalidate_info {
>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>> +	__u32	version;
>> +/* IOMMU paging structure cache */
>> +#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
>> +#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
>> +#define IOMMU_CACHE_TYPE_NR		(3)
>> +	__u8	cache;
>> +	__u8	granularity;
>> +	__u8	padding[2];
>> +	union {
>> +		__u64	pasid;
>                 __u32   archid;

struct iommu_inv_pasid_info pasid_info;

Thanks,
Jean

> 
> Thanks
> 
> Eric
>> +		struct iommu_inv_addr_info addr_info;
>> +	};
>> +};
>> +
>> +
>>  #endif /* _UAPI_IOMMU_H */
>>


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-13 11:20     ` Jean-Philippe Brucker
@ 2019-05-13 16:50       ` Auger Eric
  2019-05-13 17:09         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-13 16:50 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan, iommu, LKML, Joerg Roedel,
	David Woodhouse, Alex Williamson
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jean-Philippe,

On 5/13/19 1:20 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On 13/05/2019 10:14, Auger Eric wrote:
>> I noticed my qemu integration was currently incorrectly using PASID
>> invalidation for ASID based invalidation (SMMUV3 Stage1 CMD_TLBI_NH_ASID
>> invalidation command). So I think we also need ARCHID invalidation.
>> Sorry for the late notice.
>>>  
>>> +/* defines the granularity of the invalidation */
>>> +enum iommu_inv_granularity {
>>> +	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
>>         IOMMU_INV_GRANU_ARCHID, /* archid-selective invalidation */
>>> +	IOMMU_INV_GRANU_PASID,	/* pasid-selective invalidation */
> 
> In terms of granularity, these values have the same meaning: invalidate
> the whole address space of a context. Then you can communicate two
> things using the same struct:
> * If ATS is enables an Arm host needs to invalidate all ATC entries
> using PASID.
> * If BTM isn't used by the guest, the host needs to invalidate all TLB
> entries using ARCHID.
> 
> Rather than introducing a new granule here, could we just add an archid
> field to the struct associated with IOMMU_INV_GRANU_PASID? Something like...
> 
>>> +	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
>>> +	IOMMU_INVAL_GRANU_NR,   /* number of invalidation granularities */
>>> +};
>>> +
>>> +/**
>>> + * Address Selective Invalidation Structure
>>> + *
>>> + * @flags indicates the granularity of the address-selective invalidation
>>> + * - if PASID bit is set, @pasid field is populated and the invalidation
>>> + *   relates to cache entries tagged with this PASID and matching the
>>> + *   address range.
>>> + * - if ARCHID bit is set, @archid is populated and the invalidation relates
>>> + *   to cache entries tagged with this architecture specific id and matching
>>> + *   the address range.
>>> + * - Both PASID and ARCHID can be set as they may tag different caches.
>>> + * - if neither PASID or ARCHID is set, global addr invalidation applies
>>> + * - LEAF flag indicates whether only the leaf PTE caching needs to be
>>> + *   invalidated and other paging structure caches can be preserved.
>>> + * @pasid: process address space id
>>> + * @archid: architecture-specific id
>>> + * @addr: first stage/level input address
>>> + * @granule_size: page/block size of the mapping in bytes
>>> + * @nb_granules: number of contiguous granules to be invalidated
>>> + */
>>> +struct iommu_inv_addr_info {
>>> +#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
>>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
>>> +#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
>>> +	__u32	flags;
>>> +	__u32	archid;
>>> +	__u64	pasid;
>>> +	__u64	addr;
>>> +	__u64	granule_size;
>>> +	__u64	nb_granules;
>>> +};
> 
> struct iommu_inv_pasid_info {
> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
> 	__u32	flags;
> 	__u32	archid;
> 	__u64	pasid;
> };
I agree it does the job now. However it looks a bit strange to do a
PASID based invalidation in my case - SMMUv3 nested stage - where I
don't have any PASID involved.

Couldn't we call it context based invalidation then? A context can be
tagged by a PASID or/and an ARCHID.

Domain invalidation would invalidate all the contexts belonging to that
domain.

Thanks

Eric
> 
>>> +
>>> +/**
>>> + * First level/stage invalidation information
>>> + * @cache: bitfield that allows to select which caches to invalidate
>>> + * @granularity: defines the lowest granularity used for the invalidation:
>>> + *     domain > pasid > addr
>>> + *
>>> + * Not all the combinations of cache/granularity make sense:
>>> + *
>>> + *         type |   DEV_IOTLB   |     IOTLB     |      PASID    |
>>> + * granularity	|		|		|      cache	|
>>> + * -------------+---------------+---------------+---------------+
>>> + * DOMAIN	|	N/A	|       Y	|	Y	|
>>  * ARCHID       |       N/A     |       Y       |       N/A     |
>>
>>> + * PASID	|	Y	|       Y	|	Y	|
>>> + * ADDR		|       Y	|       Y	|	N/A	|
>>> + *
>>> + * Invalidations by %IOMMU_INV_GRANU_ADDR use field @addr_info.
>>  * Invalidations by %IOMMU_INV_GRANU_ARCHID use field @archid.
>>> + * Invalidations by %IOMMU_INV_GRANU_PASID use field @pasid.
>>> + * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument.
>>> + *
>>> + * If multiple cache types are invalidated simultaneously, they all
>>> + * must support the used granularity.
>>> + */
>>> +struct iommu_cache_invalidate_info {
>>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>>> +	__u32	version;
>>> +/* IOMMU paging structure cache */
>>> +#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
>>> +#define IOMMU_CACHE_TYPE_NR		(3)
>>> +	__u8	cache;
>>> +	__u8	granularity;
>>> +	__u8	padding[2];
>>> +	union {
>>> +		__u64	pasid;
>>                 __u32   archid;
> 
> struct iommu_inv_pasid_info pasid_info;
> 
> Thanks,
> Jean
> 
>>
>> Thanks
>>
>> Eric
>>> +		struct iommu_inv_addr_info addr_info;
>>> +	};
>>> +};
>>> +
>>> +
>>>  #endif /* _UAPI_IOMMU_H */
>>>
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-13 16:50       ` Auger Eric
@ 2019-05-13 17:09         ` Jean-Philippe Brucker
  2019-05-13 22:16           ` Jacob Pan
  2019-05-14  7:46           ` Auger Eric
  0 siblings, 2 replies; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-13 17:09 UTC (permalink / raw)
  To: Auger Eric, Jacob Pan, iommu, LKML, Joerg Roedel,
	David Woodhouse, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Andriy Shevchenko

On 13/05/2019 17:50, Auger Eric wrote:
>> struct iommu_inv_pasid_info {
>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>> 	__u32	flags;
>> 	__u32	archid;
>> 	__u64	pasid;
>> };
> I agree it does the job now. However it looks a bit strange to do a
> PASID based invalidation in my case - SMMUv3 nested stage - where I
> don't have any PASID involved.
> 
> Couldn't we call it context based invalidation then? A context can be
> tagged by a PASID or/and an ARCHID.

I think calling it "context" would be confusing as well (I shouldn't
have used it earlier), since VT-d uses that name for device table
entries (=STE on Arm SMMU). Maybe "addr_space"?

Thanks,
Jean

> 
> Domain invalidation would invalidate all the contexts belonging to that
> domain.
> 
> Thanks
> 
> Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-13 17:09         ` Jean-Philippe Brucker
@ 2019-05-13 22:16           ` Jacob Pan
  2019-05-14  7:36             ` Auger Eric
  2019-05-14  7:46           ` Auger Eric
  1 sibling, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-13 22:16 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Auger Eric, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

On Mon, 13 May 2019 18:09:48 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 13/05/2019 17:50, Auger Eric wrote:
> >> struct iommu_inv_pasid_info {
> >> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
> >> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
> >> 	__u32	flags;
> >> 	__u32	archid;
> >> 	__u64	pasid;
> >> };  
> > I agree it does the job now. However it looks a bit strange to do a
> > PASID based invalidation in my case - SMMUv3 nested stage - where I
> > don't have any PASID involved.
> > 
> > Couldn't we call it context based invalidation then? A context can
> > be tagged by a PASID or/and an ARCHID.  
> 
> I think calling it "context" would be confusing as well (I shouldn't
> have used it earlier), since VT-d uses that name for device table
> entries (=STE on Arm SMMU). Maybe "addr_space"?
> 
I am still struggling to understand what ARCHID is after scanning
through SMMUv3.1 spec. It seems to be a constant for a given SMMU. Why
do you need to pass it down every time? Could you point to me the
document or explain a little more on ARCHID use cases.
We have three fileds called pasid under this struct
iommu_cache_invalidate_info{}
Gets confusing :)
> Thanks,
> Jean
> 
> > 
> > Domain invalidation would invalidate all the contexts belonging to
> > that domain.
> > 
> > Thanks
> > 
> > Eric  

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-13 22:16           ` Jacob Pan
@ 2019-05-14  7:36             ` Auger Eric
  2019-05-14 10:41               ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-14  7:36 UTC (permalink / raw)
  To: Jacob Pan, Jean-Philippe Brucker
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Tian, Kevin, Raj Ashok, Andriy Shevchenko

Hi Jacob,

On 5/14/19 12:16 AM, Jacob Pan wrote:
> On Mon, 13 May 2019 18:09:48 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> On 13/05/2019 17:50, Auger Eric wrote:
>>>> struct iommu_inv_pasid_info {
>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>>>> 	__u32	flags;
>>>> 	__u32	archid;
>>>> 	__u64	pasid;
>>>> };  
>>> I agree it does the job now. However it looks a bit strange to do a
>>> PASID based invalidation in my case - SMMUv3 nested stage - where I
>>> don't have any PASID involved.
>>>
>>> Couldn't we call it context based invalidation then? A context can
>>> be tagged by a PASID or/and an ARCHID.  
>>
>> I think calling it "context" would be confusing as well (I shouldn't
>> have used it earlier), since VT-d uses that name for device table
>> entries (=STE on Arm SMMU). Maybe "addr_space"?
>>
> I am still struggling to understand what ARCHID is after scanning
> through SMMUv3.1 spec. It seems to be a constant for a given SMMU. Why
> do you need to pass it down every time? Could you point to me the
> document or explain a little more on ARCHID use cases.
> We have three fileds called pasid under this struct
> iommu_cache_invalidate_info{}
> Gets confusing :)
archid is a generic term. That's why you did not find it in the spec ;-)

On ARM SMMU the archid is called the ASID (Address Space ID, up to 16
bits. The ASID is stored in the Context Descriptor Entry (your PASID
entry) and thus characterizes a given stage 1 translation
"context"/"adress space".

At the moment the ASID is allocated per iommu domain. With aux domains
we should have one ASID per aux domain, Jean-Philippe said.

ASID tags IOTLB S1 entries. As the ASID is part of the "context
descriptor" which is owned by the guest, the API must pass it somehow.

4.4.1.2 CMD_TLBI_NH_ASID(VMID, ASID) invalidation command allows to
invalidate all IOTLB S1 entries for a given VMID/ASID and this is the
functionality which is currently missing in the API. This is not an
address based invalidation or a "pure" PASID based invalidation. At the
moment we don't support PASIDs on ARM and I need this capability.

Thanks

Eric



>> Thanks,
>> Jean
>>
>>>
>>> Domain invalidation would invalidate all the contexts belonging to
>>> that domain.
>>>
>>> Thanks
>>>
>>> Eric  
> 
> [Jacob Pan]
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-13 17:09         ` Jean-Philippe Brucker
  2019-05-13 22:16           ` Jacob Pan
@ 2019-05-14  7:46           ` Auger Eric
  2019-05-14 10:42             ` Jean-Philippe Brucker
  1 sibling, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-14  7:46 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan, iommu, LKML, Joerg Roedel,
	David Woodhouse, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Andriy Shevchenko

Hi Jean,

On 5/13/19 7:09 PM, Jean-Philippe Brucker wrote:
> On 13/05/2019 17:50, Auger Eric wrote:
>>> struct iommu_inv_pasid_info {
>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>>> 	__u32	flags;
>>> 	__u32	archid;
>>> 	__u64	pasid;
>>> };
>> I agree it does the job now. However it looks a bit strange to do a
>> PASID based invalidation in my case - SMMUv3 nested stage - where I
>> don't have any PASID involved.
>>
>> Couldn't we call it context based invalidation then? A context can be
>> tagged by a PASID or/and an ARCHID.
> 
> I think calling it "context" would be confusing as well (I shouldn't
> have used it earlier), since VT-d uses that name for device table
> entries (=STE on Arm SMMU). Maybe "addr_space"?
yes you're right. Well we already pasid table table terminology so we
can use it here as well - as long as we understand what purpose it
serves ;-) - So OK for iommu_inv_pasid_info.

I think Jean understood we would keep pasid standalone field in
iommu_cache_invalidate_info's union. I understand the struct
iommu_inv_pasid_info now would replace it, correct?

Thanks

Eric
> 
> Thanks,
> Jean
> 
>>
>> Domain invalidation would invalidate all the contexts belonging to that
>> domain.
>>
>> Thanks
>>
>> Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14  7:36             ` Auger Eric
@ 2019-05-14 10:41               ` Jean-Philippe Brucker
  2019-05-14 17:44                 ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-14 10:41 UTC (permalink / raw)
  To: Auger Eric, Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Tian, Kevin, Raj Ashok, Andriy Shevchenko

On 14/05/2019 08:36, Auger Eric wrote:
> Hi Jacob,
> 
> On 5/14/19 12:16 AM, Jacob Pan wrote:
>> On Mon, 13 May 2019 18:09:48 +0100
>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
>>
>>> On 13/05/2019 17:50, Auger Eric wrote:
>>>>> struct iommu_inv_pasid_info {
>>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>>>>> 	__u32	flags;
>>>>> 	__u32	archid;
>>>>> 	__u64	pasid;
>>>>> };  
>>>> I agree it does the job now. However it looks a bit strange to do a
>>>> PASID based invalidation in my case - SMMUv3 nested stage - where I
>>>> don't have any PASID involved.
>>>>
>>>> Couldn't we call it context based invalidation then? A context can
>>>> be tagged by a PASID or/and an ARCHID.  
>>>
>>> I think calling it "context" would be confusing as well (I shouldn't
>>> have used it earlier), since VT-d uses that name for device table
>>> entries (=STE on Arm SMMU). Maybe "addr_space"?
>>>
>> I am still struggling to understand what ARCHID is after scanning
>> through SMMUv3.1 spec. It seems to be a constant for a given SMMU. Why
>> do you need to pass it down every time? Could you point to me the
>> document or explain a little more on ARCHID use cases.
>> We have three fileds called pasid under this struct
>> iommu_cache_invalidate_info{}
>> Gets confusing :)
> archid is a generic term. That's why you did not find it in the spec ;-)
> 
> On ARM SMMU the archid is called the ASID (Address Space ID, up to 16
> bits. The ASID is stored in the Context Descriptor Entry (your PASID
> entry) and thus characterizes a given stage 1 translation
> "context"/"adress space".

Yes, another way to look at it is, for a given address space:
* PASID tags device-IOTLB (ATC) entries.
* ASID (here called archid) tags IOTLB entries.

They could have the same value, but it depends on the guest's allocation
policy which isn't in our control. With my PASID patches for SMMUv3,
they have different values. So we need both fields if we intend to
invalidate both ATC and IOTLB with a single call.

Thanks,
Jean

> 
> At the moment the ASID is allocated per iommu domain. With aux domains
> we should have one ASID per aux domain, Jean-Philippe said.
> 
> ASID tags IOTLB S1 entries. As the ASID is part of the "context
> descriptor" which is owned by the guest, the API must pass it somehow.
> 
> 4.4.1.2 CMD_TLBI_NH_ASID(VMID, ASID) invalidation command allows to
> invalidate all IOTLB S1 entries for a given VMID/ASID and this is the
> functionality which is currently missing in the API. This is not an
> address based invalidation or a "pure" PASID based invalidation. At the
> moment we don't support PASIDs on ARM and I need this capability.
> 
> Thanks
> 
> Eric
> 
> 
> 
>>> Thanks,
>>> Jean
>>>
>>>>
>>>> Domain invalidation would invalidate all the contexts belonging to
>>>> that domain.
>>>>
>>>> Thanks
>>>>
>>>> Eric  
>>
>> [Jacob Pan]
>>


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14  7:46           ` Auger Eric
@ 2019-05-14 10:42             ` Jean-Philippe Brucker
  2019-05-14 11:02               ` Auger Eric
  0 siblings, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-14 10:42 UTC (permalink / raw)
  To: Auger Eric, Jacob Pan, iommu, LKML, Joerg Roedel,
	David Woodhouse, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Andriy Shevchenko

On 14/05/2019 08:46, Auger Eric wrote:
> Hi Jean,
> 
> On 5/13/19 7:09 PM, Jean-Philippe Brucker wrote:
>> On 13/05/2019 17:50, Auger Eric wrote:
>>>> struct iommu_inv_pasid_info {
>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>>>> 	__u32	flags;
>>>> 	__u32	archid;
>>>> 	__u64	pasid;
>>>> };
>>> I agree it does the job now. However it looks a bit strange to do a
>>> PASID based invalidation in my case - SMMUv3 nested stage - where I
>>> don't have any PASID involved.
>>>
>>> Couldn't we call it context based invalidation then? A context can be
>>> tagged by a PASID or/and an ARCHID.
>>
>> I think calling it "context" would be confusing as well (I shouldn't
>> have used it earlier), since VT-d uses that name for device table
>> entries (=STE on Arm SMMU). Maybe "addr_space"?
> yes you're right. Well we already pasid table table terminology so we
> can use it here as well - as long as we understand what purpose it
> serves ;-) - So OK for iommu_inv_pasid_info.
> 
> I think Jean understood we would keep pasid standalone field in
> iommu_cache_invalidate_info's union. I understand the struct
> iommu_inv_pasid_info now would replace it, correct?

Yes

Thanks,
Jean

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14 10:42             ` Jean-Philippe Brucker
@ 2019-05-14 11:02               ` Auger Eric
  2019-05-14 17:55                 ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-14 11:02 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan, iommu, LKML, Joerg Roedel,
	David Woodhouse, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Andriy Shevchenko

Hi Jean,

On 5/14/19 12:42 PM, Jean-Philippe Brucker wrote:
> On 14/05/2019 08:46, Auger Eric wrote:
>> Hi Jean,
>>
>> On 5/13/19 7:09 PM, Jean-Philippe Brucker wrote:
>>> On 13/05/2019 17:50, Auger Eric wrote:
>>>>> struct iommu_inv_pasid_info {
>>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>>>>> 	__u32	flags;
>>>>> 	__u32	archid;
>>>>> 	__u64	pasid;
>>>>> };
>>>> I agree it does the job now. However it looks a bit strange to do a
>>>> PASID based invalidation in my case - SMMUv3 nested stage - where I
>>>> don't have any PASID involved.
>>>>
>>>> Couldn't we call it context based invalidation then? A context can be
>>>> tagged by a PASID or/and an ARCHID.
>>>
>>> I think calling it "context" would be confusing as well (I shouldn't
>>> have used it earlier), since VT-d uses that name for device table
>>> entries (=STE on Arm SMMU). Maybe "addr_space"?
>> yes you're right. Well we already pasid table table terminology so we
>> can use it here as well - as long as we understand what purpose it
>> serves ;-) - So OK for iommu_inv_pasid_info.
>>
>> I think Jean understood we would keep pasid standalone field in
I meant Jacob here.
>> iommu_cache_invalidate_info's union. I understand the struct
>> iommu_inv_pasid_info now would replace it, correct?

Thank you for the confirmation.

Eric

> 
> Yes
> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14 10:41               ` Jean-Philippe Brucker
@ 2019-05-14 17:44                 ` Jacob Pan
  2019-05-14 17:57                   ` Jacob Pan
  2019-05-15 11:03                   ` Jean-Philippe Brucker
  0 siblings, 2 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-14 17:44 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Auger Eric, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

Hi Thank you both for the explanation.

On Tue, 14 May 2019 11:41:24 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 14/05/2019 08:36, Auger Eric wrote:
> > Hi Jacob,
> > 
> > On 5/14/19 12:16 AM, Jacob Pan wrote:  
> >> On Mon, 13 May 2019 18:09:48 +0100
> >> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> >>  
> >>> On 13/05/2019 17:50, Auger Eric wrote:  
> >>>>> struct iommu_inv_pasid_info {
> >>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
> >>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
> >>>>> 	__u32	flags;
> >>>>> 	__u32	archid;
> >>>>> 	__u64	pasid;
> >>>>> };    
> >>>> I agree it does the job now. However it looks a bit strange to
> >>>> do a PASID based invalidation in my case - SMMUv3 nested stage -
> >>>> where I don't have any PASID involved.
> >>>>
> >>>> Couldn't we call it context based invalidation then? A context
> >>>> can be tagged by a PASID or/and an ARCHID.    
> >>>
> >>> I think calling it "context" would be confusing as well (I
> >>> shouldn't have used it earlier), since VT-d uses that name for
> >>> device table entries (=STE on Arm SMMU). Maybe "addr_space"?
> >>>  
> >> I am still struggling to understand what ARCHID is after scanning
> >> through SMMUv3.1 spec. It seems to be a constant for a given SMMU.
> >> Why do you need to pass it down every time? Could you point to me
> >> the document or explain a little more on ARCHID use cases.
> >> We have three fileds called pasid under this struct
> >> iommu_cache_invalidate_info{}
> >> Gets confusing :)  
> > archid is a generic term. That's why you did not find it in the
> > spec ;-)
> > 
> > On ARM SMMU the archid is called the ASID (Address Space ID, up to
> > 16 bits. The ASID is stored in the Context Descriptor Entry (your
> > PASID entry) and thus characterizes a given stage 1 translation
> > "context"/"adress space".  
> 
> Yes, another way to look at it is, for a given address space:
> * PASID tags device-IOTLB (ATC) entries.
> * ASID (here called archid) tags IOTLB entries.
> 
> They could have the same value, but it depends on the guest's
> allocation policy which isn't in our control. With my PASID patches
> for SMMUv3, they have different values. So we need both fields if we
> intend to invalidate both ATC and IOTLB with a single call.
> 
For ASID invalidation, there is also page/address selective within an
ASID, right? I guess it is CMD_TLBI_NH_VA?
So the single call to invalidate both ATC & IOTLB should share the same
address information. i.e.
struct iommu_inv_addr_info {}

Just out of curiosity, what is the advantage of having guest tag its
ATC with its own PASID? I thought you were planning to use custom
ioasid allocator to get PASID from host.

Also ASID is 16 bit as Eric said and PASID (substreamID?) is 20 bit,
right?

> Thanks,
> Jean
> 
> > 
> > At the moment the ASID is allocated per iommu domain. With aux
> > domains we should have one ASID per aux domain, Jean-Philippe said.
> > 
> > ASID tags IOTLB S1 entries. As the ASID is part of the "context
> > descriptor" which is owned by the guest, the API must pass it
> > somehow.
> > 
> > 4.4.1.2 CMD_TLBI_NH_ASID(VMID, ASID) invalidation command allows to
> > invalidate all IOTLB S1 entries for a given VMID/ASID and this is
> > the functionality which is currently missing in the API. This is
> > not an address based invalidation or a "pure" PASID based
> > invalidation. At the moment we don't support PASIDs on ARM and I
> > need this capability.
> > 
Got it.
> > Thanks
> > 
> > Eric
> > 
> > 
> >   
> >>> Thanks,
> >>> Jean
> >>>  
> >>>>
> >>>> Domain invalidation would invalidate all the contexts belonging
> >>>> to that domain.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric    
> >>
> >> [Jacob Pan]
> >>  
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14 11:02               ` Auger Eric
@ 2019-05-14 17:55                 ` Jacob Pan
  2019-05-15 15:52                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-14 17:55 UTC (permalink / raw)
  To: Auger Eric
  Cc: Jean-Philippe Brucker, iommu, LKML, Joerg Roedel,
	David Woodhouse, Alex Williamson, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko, jacob.jun.pan

On Tue, 14 May 2019 13:02:47 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jean,
> 
> On 5/14/19 12:42 PM, Jean-Philippe Brucker wrote:
> > On 14/05/2019 08:46, Auger Eric wrote:  
> >> Hi Jean,
> >>
> >> On 5/13/19 7:09 PM, Jean-Philippe Brucker wrote:  
> >>> On 13/05/2019 17:50, Auger Eric wrote:  
> >>>>> struct iommu_inv_pasid_info {
> >>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
> >>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
> >>>>> 	__u32	flags;
> >>>>> 	__u32	archid;
> >>>>> 	__u64	pasid;
> >>>>> };  
> >>>> I agree it does the job now. However it looks a bit strange to
> >>>> do a PASID based invalidation in my case - SMMUv3 nested stage -
> >>>> where I don't have any PASID involved.
> >>>>
> >>>> Couldn't we call it context based invalidation then? A context
> >>>> can be tagged by a PASID or/and an ARCHID.  
> >>>
> >>> I think calling it "context" would be confusing as well (I
> >>> shouldn't have used it earlier), since VT-d uses that name for
> >>> device table entries (=STE on Arm SMMU). Maybe "addr_space"?  
> >> yes you're right. Well we already pasid table table terminology so
> >> we can use it here as well - as long as we understand what purpose
> >> it serves ;-) - So OK for iommu_inv_pasid_info.
> >>
> >> I think Jean understood we would keep pasid standalone field in  
> I meant Jacob here.
> >> iommu_cache_invalidate_info's union. I understand the struct
> >> iommu_inv_pasid_info now would replace it, correct?  
> 
> Thank you for the confirmation.
> 
Yes, I agree to replace the standalone __64 pasid with this struct.
Looks more inline with address selective info., Just to double confirm
the new struct.

Jean, will you put this in your sva/api repo?

struct iommu_cache_invalidate_info {
#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
	__u32	version;
/* IOMMU paging structure cache */
#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB
*/
#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
#define IOMMU_CACHE_TYPE_NR		(3)
	__u8	cache;
	__u8	granularity;
	__u8	padding[2];
	union {
		struct iommu_inv_pasid_info pasid_info;
		struct iommu_inv_addr_info addr_info;
	};
};




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14 17:44                 ` Jacob Pan
@ 2019-05-14 17:57                   ` Jacob Pan
  2019-05-15 11:03                   ` Jean-Philippe Brucker
  1 sibling, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-14 17:57 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Auger Eric, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

On Tue, 14 May 2019 10:44:01 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Hi Thank you both for the explanation.
> 
> On Tue, 14 May 2019 11:41:24 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
> > On 14/05/2019 08:36, Auger Eric wrote:  
> > > Hi Jacob,
> > > 
> > > On 5/14/19 12:16 AM, Jacob Pan wrote:    
> > >> On Mon, 13 May 2019 18:09:48 +0100
> > >> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> > >>    
> > >>> On 13/05/2019 17:50, Auger Eric wrote:    
> > >>>>> struct iommu_inv_pasid_info {
> > >>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
> > >>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
> > >>>>> 	__u32	flags;
> > >>>>> 	__u32	archid;
> > >>>>> 	__u64	pasid;
> > >>>>> };      
> > >>>> I agree it does the job now. However it looks a bit strange to
> > >>>> do a PASID based invalidation in my case - SMMUv3 nested stage
> > >>>> - where I don't have any PASID involved.
> > >>>>
> > >>>> Couldn't we call it context based invalidation then? A context
> > >>>> can be tagged by a PASID or/and an ARCHID.      
> > >>>
> > >>> I think calling it "context" would be confusing as well (I
> > >>> shouldn't have used it earlier), since VT-d uses that name for
> > >>> device table entries (=STE on Arm SMMU). Maybe "addr_space"?
> > >>>    
> > >> I am still struggling to understand what ARCHID is after scanning
> > >> through SMMUv3.1 spec. It seems to be a constant for a given
> > >> SMMU. Why do you need to pass it down every time? Could you
> > >> point to me the document or explain a little more on ARCHID use
> > >> cases. We have three fileds called pasid under this struct
> > >> iommu_cache_invalidate_info{}
> > >> Gets confusing :)    
> > > archid is a generic term. That's why you did not find it in the
> > > spec ;-)
> > > 
> > > On ARM SMMU the archid is called the ASID (Address Space ID, up to
> > > 16 bits. The ASID is stored in the Context Descriptor Entry (your
> > > PASID entry) and thus characterizes a given stage 1 translation
> > > "context"/"adress space".    
> > 
> > Yes, another way to look at it is, for a given address space:
> > * PASID tags device-IOTLB (ATC) entries.
> > * ASID (here called archid) tags IOTLB entries.
> > 
> > They could have the same value, but it depends on the guest's
> > allocation policy which isn't in our control. With my PASID patches
> > for SMMUv3, they have different values. So we need both fields if we
> > intend to invalidate both ATC and IOTLB with a single call.
> >   
> For ASID invalidation, there is also page/address selective within an
> ASID, right? I guess it is CMD_TLBI_NH_VA?
> So the single call to invalidate both ATC & IOTLB should share the
> same address information. i.e.
> struct iommu_inv_addr_info {}
> 
Nevermind for this question. archid field is already in the addr_info.
Sorry.
> Just out of curiosity, what is the advantage of having guest tag its
> ATC with its own PASID? I thought you were planning to use custom
> ioasid allocator to get PASID from host.
> 
> Also ASID is 16 bit as Eric said and PASID (substreamID?) is 20 bit,
> right?
> 
> > Thanks,
> > Jean
> >   
> > > 
> > > At the moment the ASID is allocated per iommu domain. With aux
> > > domains we should have one ASID per aux domain, Jean-Philippe
> > > said.
> > > 
> > > ASID tags IOTLB S1 entries. As the ASID is part of the "context
> > > descriptor" which is owned by the guest, the API must pass it
> > > somehow.
> > > 
> > > 4.4.1.2 CMD_TLBI_NH_ASID(VMID, ASID) invalidation command allows
> > > to invalidate all IOTLB S1 entries for a given VMID/ASID and this
> > > is the functionality which is currently missing in the API. This
> > > is not an address based invalidation or a "pure" PASID based
> > > invalidation. At the moment we don't support PASIDs on ARM and I
> > > need this capability.
> > >   
> Got it.
> > > Thanks
> > > 
> > > Eric
> > > 
> > > 
> > >     
> > >>> Thanks,
> > >>> Jean
> > >>>    
> > >>>>
> > >>>> Domain invalidation would invalidate all the contexts belonging
> > >>>> to that domain.
> > >>>>
> > >>>> Thanks
> > >>>>
> > >>>> Eric      
> > >>
> > >> [Jacob Pan]
> > >>    
> >   
> 
> [Jacob Pan]

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14 17:44                 ` Jacob Pan
  2019-05-14 17:57                   ` Jacob Pan
@ 2019-05-15 11:03                   ` Jean-Philippe Brucker
  2019-05-15 14:47                     ` Tian, Kevin
  1 sibling, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-15 11:03 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Auger Eric, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko

On 14/05/2019 18:44, Jacob Pan wrote:
> Hi Thank you both for the explanation.
> 
> On Tue, 14 May 2019 11:41:24 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> On 14/05/2019 08:36, Auger Eric wrote:
>>> Hi Jacob,
>>>
>>> On 5/14/19 12:16 AM, Jacob Pan wrote:  
>>>> On Mon, 13 May 2019 18:09:48 +0100
>>>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
>>>>  
>>>>> On 13/05/2019 17:50, Auger Eric wrote:  
>>>>>>> struct iommu_inv_pasid_info {
>>>>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>>>>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>>>>>>> 	__u32	flags;
>>>>>>> 	__u32	archid;
>>>>>>> 	__u64	pasid;
>>>>>>> };    
>>>>>> I agree it does the job now. However it looks a bit strange to
>>>>>> do a PASID based invalidation in my case - SMMUv3 nested stage -
>>>>>> where I don't have any PASID involved.
>>>>>>
>>>>>> Couldn't we call it context based invalidation then? A context
>>>>>> can be tagged by a PASID or/and an ARCHID.    
>>>>>
>>>>> I think calling it "context" would be confusing as well (I
>>>>> shouldn't have used it earlier), since VT-d uses that name for
>>>>> device table entries (=STE on Arm SMMU). Maybe "addr_space"?
>>>>>  
>>>> I am still struggling to understand what ARCHID is after scanning
>>>> through SMMUv3.1 spec. It seems to be a constant for a given SMMU.
>>>> Why do you need to pass it down every time? Could you point to me
>>>> the document or explain a little more on ARCHID use cases.
>>>> We have three fileds called pasid under this struct
>>>> iommu_cache_invalidate_info{}
>>>> Gets confusing :)  
>>> archid is a generic term. That's why you did not find it in the
>>> spec ;-)
>>>
>>> On ARM SMMU the archid is called the ASID (Address Space ID, up to
>>> 16 bits. The ASID is stored in the Context Descriptor Entry (your
>>> PASID entry) and thus characterizes a given stage 1 translation
>>> "context"/"adress space".  
>>
>> Yes, another way to look at it is, for a given address space:
>> * PASID tags device-IOTLB (ATC) entries.
>> * ASID (here called archid) tags IOTLB entries.
>>
>> They could have the same value, but it depends on the guest's
>> allocation policy which isn't in our control. With my PASID patches
>> for SMMUv3, they have different values. So we need both fields if we
>> intend to invalidate both ATC and IOTLB with a single call.
>>
> For ASID invalidation, there is also page/address selective within an
> ASID, right? I guess it is CMD_TLBI_NH_VA?
> So the single call to invalidate both ATC & IOTLB should share the same
> address information. i.e.
> struct iommu_inv_addr_info {}
> 
> Just out of curiosity, what is the advantage of having guest tag its
> ATC with its own PASID? I thought you were planning to use custom
> ioasid allocator to get PASID from host.

Hm, for the moment I mostly considered the custom ioasid allocator for
Intel platforms. On Arm platforms the SR-IOV model where each VM has its
own PASID space is still very much on the table. This would be the only
model supported by a vSMMU emulation for example, since the SMMU doesn't
have PASID allocation commands.

> Also ASID is 16 bit as Eric said and PASID (substreamID?) is 20 bit,
> right?

Yes. Some implementations have 8-bit ASIDs, but I think those would be
on embedded rather than server class platforms. And yes, if it wasn't
confusing enough, the Arm SMMU uses "SubstreamID" (SSID) for PASIDs :)

Thanks,
Jean

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-15 11:03                   ` Jean-Philippe Brucker
@ 2019-05-15 14:47                     ` Tian, Kevin
  2019-05-15 15:25                       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Tian, Kevin @ 2019-05-15 14:47 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jacob Pan
  Cc: Raj, Ashok, iommu, LKML, Alex Williamson, Andriy Shevchenko,
	David Woodhouse

> From: Jean-Philippe Brucker
> Sent: Wednesday, May 15, 2019 7:04 PM
> 
> On 14/05/2019 18:44, Jacob Pan wrote:
> > Hi Thank you both for the explanation.
> >
> > On Tue, 14 May 2019 11:41:24 +0100
> > Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> >
> >> On 14/05/2019 08:36, Auger Eric wrote:
> >>> Hi Jacob,
> >>>
> >>> On 5/14/19 12:16 AM, Jacob Pan wrote:
> >>>> On Mon, 13 May 2019 18:09:48 +0100
> >>>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> >>>>
> >>>>> On 13/05/2019 17:50, Auger Eric wrote:
> >>>>>>> struct iommu_inv_pasid_info {
> >>>>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
> >>>>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
> >>>>>>> 	__u32	flags;
> >>>>>>> 	__u32	archid;
> >>>>>>> 	__u64	pasid;
> >>>>>>> };
> >>>>>> I agree it does the job now. However it looks a bit strange to
> >>>>>> do a PASID based invalidation in my case - SMMUv3 nested stage -
> >>>>>> where I don't have any PASID involved.
> >>>>>>
> >>>>>> Couldn't we call it context based invalidation then? A context
> >>>>>> can be tagged by a PASID or/and an ARCHID.
> >>>>>
> >>>>> I think calling it "context" would be confusing as well (I
> >>>>> shouldn't have used it earlier), since VT-d uses that name for
> >>>>> device table entries (=STE on Arm SMMU). Maybe "addr_space"?
> >>>>>
> >>>> I am still struggling to understand what ARCHID is after scanning
> >>>> through SMMUv3.1 spec. It seems to be a constant for a given SMMU.
> >>>> Why do you need to pass it down every time? Could you point to me
> >>>> the document or explain a little more on ARCHID use cases.
> >>>> We have three fileds called pasid under this struct
> >>>> iommu_cache_invalidate_info{}
> >>>> Gets confusing :)
> >>> archid is a generic term. That's why you did not find it in the
> >>> spec ;-)
> >>>
> >>> On ARM SMMU the archid is called the ASID (Address Space ID, up to
> >>> 16 bits. The ASID is stored in the Context Descriptor Entry (your
> >>> PASID entry) and thus characterizes a given stage 1 translation
> >>> "context"/"adress space".
> >>
> >> Yes, another way to look at it is, for a given address space:
> >> * PASID tags device-IOTLB (ATC) entries.
> >> * ASID (here called archid) tags IOTLB entries.
> >>
> >> They could have the same value, but it depends on the guest's
> >> allocation policy which isn't in our control. With my PASID patches
> >> for SMMUv3, they have different values. So we need both fields if we
> >> intend to invalidate both ATC and IOTLB with a single call.
> >>
> > For ASID invalidation, there is also page/address selective within an
> > ASID, right? I guess it is CMD_TLBI_NH_VA?
> > So the single call to invalidate both ATC & IOTLB should share the same
> > address information. i.e.
> > struct iommu_inv_addr_info {}
> >
> > Just out of curiosity, what is the advantage of having guest tag its
> > ATC with its own PASID? I thought you were planning to use custom
> > ioasid allocator to get PASID from host.
> 
> Hm, for the moment I mostly considered the custom ioasid allocator for
> Intel platforms. On Arm platforms the SR-IOV model where each VM has its
> own PASID space is still very much on the table. This would be the only
> model supported by a vSMMU emulation for example, since the SMMU
> doesn't
> have PASID allocation commands.
> 

I didn't get how ATS works in such case, if device ATC PASID is different
from IOTLB ASID. Who will be responsible for translation in-between?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-15 14:47                     ` Tian, Kevin
@ 2019-05-15 15:25                       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-15 15:25 UTC (permalink / raw)
  To: Tian, Kevin, Jacob Pan
  Cc: Raj, Ashok, Alex Williamson, LKML, iommu, Andriy Shevchenko,
	David Woodhouse

On 15/05/2019 15:47, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Wednesday, May 15, 2019 7:04 PM
>>
>> On 14/05/2019 18:44, Jacob Pan wrote:
>>> Hi Thank you both for the explanation.
>>>
>>> On Tue, 14 May 2019 11:41:24 +0100
>>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
>>>
>>>> On 14/05/2019 08:36, Auger Eric wrote:
>>>>> Hi Jacob,
>>>>>
>>>>> On 5/14/19 12:16 AM, Jacob Pan wrote:
>>>>>> On Mon, 13 May 2019 18:09:48 +0100
>>>>>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
>>>>>>
>>>>>>> On 13/05/2019 17:50, Auger Eric wrote:
>>>>>>>>> struct iommu_inv_pasid_info {
>>>>>>>>> #define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
>>>>>>>>> #define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
>>>>>>>>> 	__u32	flags;
>>>>>>>>> 	__u32	archid;
>>>>>>>>> 	__u64	pasid;
>>>>>>>>> };
>>>>>>>> I agree it does the job now. However it looks a bit strange to
>>>>>>>> do a PASID based invalidation in my case - SMMUv3 nested stage -
>>>>>>>> where I don't have any PASID involved.
>>>>>>>>
>>>>>>>> Couldn't we call it context based invalidation then? A context
>>>>>>>> can be tagged by a PASID or/and an ARCHID.
>>>>>>>
>>>>>>> I think calling it "context" would be confusing as well (I
>>>>>>> shouldn't have used it earlier), since VT-d uses that name for
>>>>>>> device table entries (=STE on Arm SMMU). Maybe "addr_space"?
>>>>>>>
>>>>>> I am still struggling to understand what ARCHID is after scanning
>>>>>> through SMMUv3.1 spec. It seems to be a constant for a given SMMU.
>>>>>> Why do you need to pass it down every time? Could you point to me
>>>>>> the document or explain a little more on ARCHID use cases.
>>>>>> We have three fileds called pasid under this struct
>>>>>> iommu_cache_invalidate_info{}
>>>>>> Gets confusing :)
>>>>> archid is a generic term. That's why you did not find it in the
>>>>> spec ;-)
>>>>>
>>>>> On ARM SMMU the archid is called the ASID (Address Space ID, up to
>>>>> 16 bits. The ASID is stored in the Context Descriptor Entry (your
>>>>> PASID entry) and thus characterizes a given stage 1 translation
>>>>> "context"/"adress space".
>>>>
>>>> Yes, another way to look at it is, for a given address space:
>>>> * PASID tags device-IOTLB (ATC) entries.
>>>> * ASID (here called archid) tags IOTLB entries.
>>>>
>>>> They could have the same value, but it depends on the guest's
>>>> allocation policy which isn't in our control. With my PASID patches
>>>> for SMMUv3, they have different values. So we need both fields if we
>>>> intend to invalidate both ATC and IOTLB with a single call.
>>>>
>>> For ASID invalidation, there is also page/address selective within an
>>> ASID, right? I guess it is CMD_TLBI_NH_VA?
>>> So the single call to invalidate both ATC & IOTLB should share the same
>>> address information. i.e.
>>> struct iommu_inv_addr_info {}
>>>
>>> Just out of curiosity, what is the advantage of having guest tag its
>>> ATC with its own PASID? I thought you were planning to use custom
>>> ioasid allocator to get PASID from host.
>>
>> Hm, for the moment I mostly considered the custom ioasid allocator for
>> Intel platforms. On Arm platforms the SR-IOV model where each VM has its
>> own PASID space is still very much on the table. This would be the only
>> model supported by a vSMMU emulation for example, since the SMMU
>> doesn't
>> have PASID allocation commands.
>>
> 
> I didn't get how ATS works in such case, if device ATC PASID is different
> from IOTLB ASID. Who will be responsible for translation in-between?

ATS with the SMMU works like this:

* The PCI function sends a Translation Request with PASID.
* The SMMU walks the PASID table (which we call context descriptor
table), finds the context descriptor indexed by PASID. This context
descriptor has an ASID field, and a page directory pointer.
* After successfully walking the page tables, the SMMU may add an IOTLB
entry tagged by ASID and address, then returns a Translation Completion.
* The PCI function adds an ATC entry tagged by PASID and address.

I think the ASID on Arm CPUs is roughly equivalent to Intel PCID. One
reason we use ASIDs for IOTLBs is that with SVA, the ASID of an address
space is the same on the CPU side. And when the CPU executes a TLB
invalidation instructions, it also invalidates the corresponding IOTLB
entries. It's nice for vSVA because you don't need to context-switch to
the host to send an IOTLB invalidation. But only non-PCI devices that
implement SVA benefit from this at the moment, because ATC invalidations
still have to go through the SMMU command queue.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-14 17:55                 ` Jacob Pan
@ 2019-05-15 15:52                   ` Jean-Philippe Brucker
  2019-05-15 16:25                     ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-15 15:52 UTC (permalink / raw)
  To: Jacob Pan, Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Tian, Kevin, Raj Ashok, Andriy Shevchenko

On 14/05/2019 18:55, Jacob Pan wrote:
> Yes, I agree to replace the standalone __64 pasid with this struct.
> Looks more inline with address selective info., Just to double confirm
> the new struct.
> 
> Jean, will you put this in your sva/api repo?

Yes, I pushed it along with some documentation fixes (mainly getting rid
of scripts/kernel-doc warnings and outputting valid rst)

Thanks,
Jean

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API
  2019-05-15 15:52                   ` Jean-Philippe Brucker
@ 2019-05-15 16:25                     ` Jacob Pan
  0 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-15 16:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Auger Eric, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

On Wed, 15 May 2019 16:52:46 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 14/05/2019 18:55, Jacob Pan wrote:
> > Yes, I agree to replace the standalone __64 pasid with this struct.
> > Looks more inline with address selective info., Just to double
> > confirm the new struct.
> > 
> > Jean, will you put this in your sva/api repo?  
> 
> Yes, I pushed it along with some documentation fixes (mainly getting
> rid of scripts/kernel-doc warnings and outputting valid rst)
> 
Just pulled, I am rebasing on top of this branch. If you could also
include our api for bind guest pasid, then we have a complete set of
common APIs in one place.
https://lkml.org/lkml/2019/5/3/775

I just need to add a small tweak for supporting non-identity guest-host
PASID mapping for the next version.

> Thanks,
> Jean

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support
  2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (15 preceding siblings ...)
  2019-05-03 22:32 ` [PATCH v3 16/16] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
@ 2019-05-15 16:31 ` Jacob Pan
  16 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-15 16:31 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, jacob.jun.pan

Hi all,
Just wondering if you have any more feedbacks other than the cache
invalidate API change for archid?
I plan to do the next version on top of Jean's sva/api branch (common
iommu APIs) with minor tweak to support non-identity guest-host PASID
mapping. It would be great if I can address additional comments
together.

Thanks!

Jacob

On Fri,  3 May 2019 15:32:01 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on
> Intel platforms allow address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance
> security. This series is intended to enable SVA virtualization, i.e.
> shared guest application address space and physical device DMA
> address. Only IOMMU portion of the changes are included in this
> series. Additional support is needed in VFIO and QEMU (will be
> submitted separately) to complete this functionality.
> 
> To make incremental changes and reduce the size of each patchset.
> This series does not inlcude support for page request services.
> 
> In VT-d implementation, PASID table is per device and maintained in
> the host. Guest PASID table is shadowed in VMM where virtual IOMMU is
> emulated.
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> 
> This work is based on collaboration with other developers on the IOMMU
> mailing list. Notably,
> 
> [1] [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
> https://lkml.org/lkml/2019/3/17/124
> 
> [2] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by
> Jean-Philippe Brucker
> https://www.spinics.net/lists/iommu/msg30639.html
> 
> [3] [RFC PATCH 0/5] iommu: APIs for paravirtual PASID allocation by
> Lu Baolu https://lkml.org/lkml/2018/11/12/1921
> 
> [4] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual
>     Address (SVA)
>     https://lwn.net/Articles/754331/
> 
> There are roughly three parts:
> 1. Generic PASID allocator [1] with extension to support custom
> allocator 2. IOMMU cache invalidation passdown from guest to host
> 3. Guest PASID bind for nested translation
> 
> All generic IOMMU APIs are reused from [1], which has a v7 just
> published with no real impact to the patches used here. It is worth
> noting that unlike sMMU nested stage setup, where PASID table is
> owned by the guest, VT-d PASID table is owned by the host, individual
> PASIDs are bound instead of the PASID table.
> 
> This series is based on the new VT-d 3.0 Specification
> (https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf).
> This is different than the older series in [4] which was based on the
> older specification that does not have scalable mode.
> 
> 
> ChangeLog:
> 	- V3
> 	  - Addressed thorough review comments from Eric Auger (Thank
> you!)
> 	  - Moved IOASID allocator from driver core to IOMMU code per
> 	    suggestion by Christoph Hellwig
> 	    (https://lkml.org/lkml/2019/4/26/462)
> 	  - Rebased on top of Jean's SVA API branch and Eric's v7[1]
> 	    (git://linux-arm.org/linux-jpb.git sva/api)
> 	  - All IOMMU APIs are unmodified (except the new bind guest
> PASID call in patch 9/16)
> 
> 	- V2
> 	  - Rebased on Joerg's IOMMU x86/vt-d branch v5.1-rc4
> 	  - Integrated with Eric Auger's new v7 series for common APIs
> 	  (https://github.com/eauger/linux/tree/v5.1-rc3-2stage-v7)
> 	  - Addressed review comments from Andy Shevchenko and Alex
> Williamson on IOASID custom allocator.
> 	  - Support multiple custom IOASID allocators (vIOMMUs) and
> dynamic registration.
> 
> 
> Jacob Pan (13):
>   iommu: Introduce attach/detach_pasid_table API
>   ioasid: Add custom IOASID allocator
>   iommu/vt-d: Add custom allocator for IOASID
>   iommu/vtd: Optimize tlb invalidation for vIOMMU
>   iommu/vt-d: Replace Intel specific PASID allocator with IOASID
>   iommu: Introduce guest PASID bind function
>   iommu/vt-d: Move domain helper to header
>   iommu/vt-d: Avoid duplicated code for PASID setup
>   iommu/vt-d: Add nested translation helper function
>   iommu/vt-d: Clean up for SVM device list
>   iommu/vt-d: Add bind guest PASID support
>   iommu/vt-d: Support flushing more translation cache types
>   iommu/vt-d: Add svm/sva invalidate function
> 
> Jean-Philippe Brucker (1):
>   iommu: Add I/O ASID allocator
> 
> Liu, Yi L (1):
>   iommu: Introduce cache_invalidate API
> 
> Lu Baolu (1):
>   iommu/vt-d: Enlightened PASID allocation
> 
>  drivers/iommu/Kconfig       |   7 ++
>  drivers/iommu/Makefile      |   1 +
>  drivers/iommu/dmar.c        |  50 ++++++++
>  drivers/iommu/intel-iommu.c | 241
> ++++++++++++++++++++++++++++++++++-- drivers/iommu/intel-pasid.c |
> 223 +++++++++++++++++++++++++-------- drivers/iommu/intel-pasid.h |
> 24 +++- drivers/iommu/intel-svm.c   | 293
> +++++++++++++++++++++++++++++++++++---------
> drivers/iommu/ioasid.c      | 265
> +++++++++++++++++++++++++++++++++++++++ drivers/iommu/iommu.c
> |  53 ++++++++ include/linux/intel-iommu.h |  41 ++++++-
> include/linux/intel-svm.h   |   7 ++ include/linux/ioasid.h      |
> 67 ++++++++++ include/linux/iommu.h       |  43 ++++++-
>  include/uapi/linux/iommu.h  | 140 +++++++++++++++++++++
>  14 files changed, 1328 insertions(+), 127 deletions(-)
>  create mode 100644 drivers/iommu/ioasid.c
>  create mode 100644 include/linux/ioasid.h
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-03 22:32 ` [PATCH v3 09/16] iommu: Introduce guest PASID bind function Jacob Pan
@ 2019-05-16 14:14   ` Jean-Philippe Brucker
  2019-05-16 16:14     ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-16 14:14 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Eric Auger, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Andriy Shevchenko

Hi Jacob,

On 03/05/2019 23:32, Jacob Pan wrote:
> +/**
> + * struct gpasid_bind_data - Information about device and guest PASID binding
> + * @gcr3:	Guest CR3 value from guest mm
> + * @pasid:	Process address space ID used for the guest mm
> + * @addr_width:	Guest address width. Paging mode can also be derived.
> + */
> +struct gpasid_bind_data {
> +	__u64 gcr3;
> +	__u32 pasid;
> +	__u32 addr_width;
> +	__u32 flags;
> +#define	IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor request */
> +	__u8 padding[4];
> +};

Could you wrap this structure into a generic one like we now do for
bind_pasid_table? It would make the API easier to extend, because if we
ever add individual PASID bind on Arm (something I'd like to do for
virtio-iommu, eventually) it will have different parameters, as our
PASID table entry has a lot of fields describing the page table format.

Maybe something like the following would do?

struct gpasid_bind_data {
#define IOMMU_GPASID_BIND_VERSION_1 1
	__u32 version;
#define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD	1
	__u32 format;
	union {
		// the current gpasid_bind_data:
		struct gpasid_bind_intel_vtd vtd;
	};
};

Thanks,
Jean

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-16 14:14   ` Jean-Philippe Brucker
@ 2019-05-16 16:14     ` Jacob Pan
  2019-05-20 19:22       ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-16 16:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

On Thu, 16 May 2019 15:14:40 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Hi Jacob,
> 
> On 03/05/2019 23:32, Jacob Pan wrote:
> > +/**
> > + * struct gpasid_bind_data - Information about device and guest
> > PASID binding
> > + * @gcr3:	Guest CR3 value from guest mm
> > + * @pasid:	Process address space ID used for the guest mm
> > + * @addr_width:	Guest address width. Paging mode can also
> > be derived.
> > + */
> > +struct gpasid_bind_data {
> > +	__u64 gcr3;
> > +	__u32 pasid;
> > +	__u32 addr_width;
> > +	__u32 flags;
> > +#define	IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor
> > request */
> > +	__u8 padding[4];
> > +};  
> 
> Could you wrap this structure into a generic one like we now do for
> bind_pasid_table? It would make the API easier to extend, because if
> we ever add individual PASID bind on Arm (something I'd like to do for
> virtio-iommu, eventually) it will have different parameters, as our
> PASID table entry has a lot of fields describing the page table
> format.
> 
> Maybe something like the following would do?
> 
> struct gpasid_bind_data {
> #define IOMMU_GPASID_BIND_VERSION_1 1
> 	__u32 version;
> #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD	1
> 	__u32 format;
> 	union {
> 		// the current gpasid_bind_data:
> 		struct gpasid_bind_intel_vtd vtd;
> 	};
> };
> 
OK, sounds great.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-16 16:14     ` Jacob Pan
@ 2019-05-20 19:22       ` Jacob Pan
  2019-05-21 16:09         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-20 19:22 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

On Thu, 16 May 2019 09:14:29 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> On Thu, 16 May 2019 15:14:40 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
> > Hi Jacob,
> > 
> > On 03/05/2019 23:32, Jacob Pan wrote:  
> > > +/**
> > > + * struct gpasid_bind_data - Information about device and guest
> > > PASID binding
> > > + * @gcr3:	Guest CR3 value from guest mm
> > > + * @pasid:	Process address space ID used for the guest mm
> > > + * @addr_width:	Guest address width. Paging mode can also
> > > be derived.
> > > + */
> > > +struct gpasid_bind_data {
> > > +	__u64 gcr3;
> > > +	__u32 pasid;
> > > +	__u32 addr_width;
> > > +	__u32 flags;
> > > +#define	IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor
> > > request */
> > > +	__u8 padding[4];
> > > +};    
> > 
> > Could you wrap this structure into a generic one like we now do for
> > bind_pasid_table? It would make the API easier to extend, because if
> > we ever add individual PASID bind on Arm (something I'd like to do
> > for virtio-iommu, eventually) it will have different parameters, as
> > our PASID table entry has a lot of fields describing the page table
> > format.
> > 
> > Maybe something like the following would do?
> > 
> > struct gpasid_bind_data {
> > #define IOMMU_GPASID_BIND_VERSION_1 1
> > 	__u32 version;
> > #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD	1
> > 	__u32 format;
> > 	union {
> > 		// the current gpasid_bind_data:
> > 		struct gpasid_bind_intel_vtd vtd;
> > 	};
> > };
> >   

Could you review the struct below? I am trying to extract the
common fileds as much as possible. Didn't do exactly as you suggested
to keep vendor specific data in separate struct under the same union.

Also, can you review the v3 ioasid allocator common code patches? I am
hoping we can get the common code in v5.3 so that we can focus on the
vendor specific part. The common code should include bind_guest_pasid
and ioasid allocator.
https://lkml.org/lkml/2019/5/3/787
https://lkml.org/lkml/2019/5/3/780

Thanks,

Jacob


/**
 * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
 * SVA binding.
 *
 * @flags:	VT-d PASID table entry attributes
 * @pat:	Page attribute table data to compute effective memory type
 * @emt:	Extended memory type
 *
 * Only guest vIOMMU selectable and effective options are passed down to
 * the host IOMMU.
 */
struct gpasid_bind_data_vtd {
#define	IOMMU_SVA_VTD_GPASID_SRE	BIT(0) /* supervisor request */
#define	IOMMU_SVA_VTD_GPASID_EAFE	BIT(1) /* extended access enable */
#define	IOMMU_SVA_VTD_GPASID_PCD	BIT(2) /* page-level cache disable */
#define	IOMMU_SVA_VTD_GPASID_PWT	BIT(3) /* page-level write through */
#define	IOMMU_SVA_VTD_GPASID_EMTE	BIT(4) /* extended memory type enable */
#define	IOMMU_SVA_VTD_GPASID_CD		BIT(5) /* PASID-level cache disable */
	__u64 flags;
	__u32 pat;
	__u32 emt;
};

/**
 * struct gpasid_bind_data - Information about device and guest PASID binding
 * @version:	Version of this data structure
 * @format:	PASID table entry format
 * @flags:	Additional information on guest bind request
 * @gpgd:	Guest page directory base of the guest mm to bind
 * @hpasid:	Process address space ID used for the guest mm in host IOMMU
 * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
 * @addr_width:	Guest address width. Paging mode can also be derived.
 * @vtd:	Intel VT-d specific data
 */
struct gpasid_bind_data {
#define IOMMU_GPASID_BIND_VERSION_1	1
	__u32 version;
#define IOMMU_PASID_FORMAT_INTEL_VTD	1
	__u32 format;
#define	IOMMU_SVA_GPASID_VAL	BIT(1) /* guest PASID valid */
	__u64 flags;
	__u64 gpgd;
	__u64 hpasid;
	__u64 gpasid;
	__u32 addr_width;
	/* Vendor specific data */
	union {
		struct gpasid_bind_data_vtd vtd;
	};
};


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator
  2019-05-03 22:32 ` [PATCH v3 03/16] iommu: Add I/O ASID allocator Jacob Pan
@ 2019-05-21  8:21   ` Auger Eric
  2019-05-21 17:03     ` Jacob Pan
  2019-05-21  9:41   ` Auger Eric
  1 sibling, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-21  8:21 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi,

On 5/4/19 12:32 AM, Jacob Pan wrote:
> From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> 
> Some devices might support multiple DMA address spaces, in particular
> those that have the PCI PASID feature. PASID (Process Address Space ID)
> allows to share process address spaces with devices (SVA), partition a
> device into VM-assignable entities (VFIO mdev) or simply provide
> multiple DMA address space to kernel drivers. Add a global PASID
> allocator usable by different drivers at the same time. Name it I/O ASID
> to avoid confusion with ASIDs allocated by arch code, which are usually
> a separate ID space.
> 
> The IOASID space is global. Each device can have its own PASID space,
> but by convention the IOMMU ended up having a global PASID space, so
> that with SVA, each mm_struct is associated to a single PASID.
> 
> The allocator is primarily used by IOMMU subsystem but in rare occasions
> drivers would like to allocate PASIDs for devices that aren't managed by
> an IOMMU, using the same ID space as IOMMU.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Link: https://lkml.org/lkml/2019/4/26/462
> ---
>  drivers/iommu/Kconfig  |   6 +++
>  drivers/iommu/Makefile |   1 +
>  drivers/iommu/ioasid.c | 140 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/ioasid.h |  67 +++++++++++++++++++++++
>  4 files changed, 214 insertions(+)
>  create mode 100644 drivers/iommu/ioasid.c
>  create mode 100644 include/linux/ioasid.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 6f07f3b..75e7f97 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -2,6 +2,12 @@
>  config IOMMU_IOVA
>  	tristate
>  
> +config IOASID
> +	bool
don't we want a tristate here too?

Also refering to the past discussions we could add "# The IOASID library
may also be used by non-IOMMU_API users"
> +	help
> +	  Enable the I/O Address Space ID allocator. A single ID space shared
> +	  between different users.
> +
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>  	bool
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8c71a15..0efac6f 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOASID) += ioasid.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> new file mode 100644
> index 0000000..99f5e0a
> --- /dev/null
> +++ b/drivers/iommu/ioasid.c
> @@ -0,0 +1,140 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * I/O Address Space ID allocator. There is one global IOASID space, split into
> + * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
> + * free IOASIDs with ioasid_alloc and ioasid_free.
> + */
> +#include <linux/xarray.h>
> +#include <linux/ioasid.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +struct ioasid_data {
> +	ioasid_t id;
> +	struct ioasid_set *set;
> +	void *private;
> +	struct rcu_head rcu;
> +};
> +
> +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> +
> +/**
> + * ioasid_set_data - Set private data for an allocated ioasid
> + * @ioasid: the ID to set data
> + * @data:   the private data
> + *
> + * For IOASID that is already allocated, private data can be set
> + * via this API. Future lookup can be done via ioasid_find.
> + */
> +int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	struct ioasid_data *ioasid_data;
> +	int ret = 0;
> +
> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> +	if (ioasid_data)
> +		ioasid_data->private = data;
> +	else
> +		ret = -ENOENT;
> +
> +	/* getter may use the private data */
> +	synchronize_rcu();
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_data);
> +
> +/**
> + * ioasid_alloc - Allocate an IOASID
> + * @set: the IOASID set
> + * @min: the minimum ID (inclusive)
> + * @max: the maximum ID (inclusive)
> + * @private: data private to the caller
> + *
> + * Allocate an ID between @min and @max (or %0 and %INT_MAX).
I think we agreed to drop (or %0 and %INT_MAX)
 Return the
> + * allocated ID on success, or INVALID_IOASID on failure. The @private pointer
> + * is stored internally and can be retrieved with ioasid_find().
> + */
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +		      void *private)
> +{
> +	int id = INVALID_IOASID;
isn't it an unsigned?
> +	struct ioasid_data *data;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return INVALID_IOASID;
> +
> +	data->set = set;
> +	data->private = private;
> +
> +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
> +		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
> +		goto exit_free;
> +	}
> +	data->id = id;
> +
> +exit_free:
> +	if (id < 0 || id == INVALID_IOASID) {
< 0?
> +		kfree(data);
> +		return INVALID_IOASID;
> +	}
> +	return id;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc);
> +
> +/**
> + * ioasid_free - Free an IOASID
> + * @ioasid: the ID to remove
> + */
> +void ioasid_free(ioasid_t ioasid)
> +{
> +	struct ioasid_data *ioasid_data;
> +
> +	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> +
> +	kfree_rcu(ioasid_data, rcu);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free);
> +
> +/**
> + * ioasid_find - Find IOASID data
> + * @set: the IOASID set
> + * @ioasid: the IOASID to find
> + * @getter: function to call on the found object
> + *
> + * The optional getter function allows to take a reference to the found object
> + * under the rcu lock. The function can also check if the object is still valid:
> + * if @getter returns false, then the object is invalid and NULL is returned.
> + *
> + * If the IOASID has been allocated for this set, return the private pointer
> + * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
> + * if the IOASID is not found or not belong to the set.
do not belong
> + */
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *))
> +{
> +	void *priv = NULL;
> +	struct ioasid_data *ioasid_data;
> +
> +	rcu_read_lock();
> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> +	if (!ioasid_data) {
> +		priv = ERR_PTR(-ENOENT);
> +		goto unlock;
> +	}
> +	if (set && ioasid_data->set != set) {
> +		/* data found but does not belong to the set */
> +		priv = ERR_PTR(-EACCES);
> +		goto unlock;
> +	}
> +	/* Now IOASID and its set is verified, we can return the private data */
> +	priv = ioasid_data->private;
> +	if (getter && !getter(priv))
> +		priv = NULL;
> +unlock:
> +	rcu_read_unlock();
> +
> +	return priv;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find);
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> new file mode 100644
> index 0000000..41de5e4
> --- /dev/null
> +++ b/include/linux/ioasid.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_IOASID_H
> +#define __LINUX_IOASID_H
> +
> +#define INVALID_IOASID ((ioasid_t)-1)
> +typedef unsigned int ioasid_t;
> +typedef int (*ioasid_iter_t)(ioasid_t ioasid, void *private, void *data);
> +typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
> +typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
> +
> +struct ioasid_set {
> +	int dummy;
> +};
> +
> +struct ioasid_allocator {
> +	ioasid_alloc_fn_t alloc;
> +	ioasid_free_fn_t free;
> +	void *pdata;
> +	struct list_head list;
> +};
> +
> +#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> +
> +#ifdef CONFIG_IOASID
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +		      void *private);
> +void ioasid_free(ioasid_t ioasid);
> +
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *));
> +int ioasid_register_allocator(struct ioasid_allocator *allocator);
> +void ioasid_unregister_allocator(struct ioasid_allocator *allocator);
> +
> +int ioasid_set_data(ioasid_t ioasid, void *data);
> +
> +#else /* !CONFIG_IOASID */
> +static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> +				    ioasid_t max, void *private)
> +{
> +	return INVALID_IOASID;
> +}
> +
> +static inline void ioasid_free(ioasid_t ioasid)
> +{
> +}
> +
> +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +				bool (*getter)(void *))
> +{
> +	return NULL;
> +}
> +static inline int ioasid_register_allocator(struct ioasid_allocator *allocator)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void ioasid_unregister_allocator(struct ioasid_allocator *allocator)
> +{
> +}
> +
> +static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	return -ENODEV;
> +}
> +
> +#endif /* CONFIG_IOASID */
> +#endif /* __LINUX_IOASID_H */
> 
Thanks

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator
  2019-05-03 22:32 ` [PATCH v3 03/16] iommu: Add I/O ASID allocator Jacob Pan
  2019-05-21  8:21   ` Auger Eric
@ 2019-05-21  9:41   ` Auger Eric
  2019-05-21 17:05     ` Jacob Pan
  1 sibling, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-21  9:41 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi,

On 5/4/19 12:32 AM, Jacob Pan wrote:
> From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> 
> Some devices might support multiple DMA address spaces, in particular
> those that have the PCI PASID feature. PASID (Process Address Space ID)
> allows to share process address spaces with devices (SVA), partition a
> device into VM-assignable entities (VFIO mdev) or simply provide
> multiple DMA address space to kernel drivers. Add a global PASID
> allocator usable by different drivers at the same time. Name it I/O ASID
> to avoid confusion with ASIDs allocated by arch code, which are usually
> a separate ID space.
> 
> The IOASID space is global. Each device can have its own PASID space,
> but by convention the IOMMU ended up having a global PASID space, so
> that with SVA, each mm_struct is associated to a single PASID.
> 
> The allocator is primarily used by IOMMU subsystem but in rare occasions
> drivers would like to allocate PASIDs for devices that aren't managed by
> an IOMMU, using the same ID space as IOMMU.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Link: https://lkml.org/lkml/2019/4/26/462
> ---
>  drivers/iommu/Kconfig  |   6 +++
>  drivers/iommu/Makefile |   1 +
>  drivers/iommu/ioasid.c | 140 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/ioasid.h |  67 +++++++++++++++++++++++
>  4 files changed, 214 insertions(+)
>  create mode 100644 drivers/iommu/ioasid.c
>  create mode 100644 include/linux/ioasid.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 6f07f3b..75e7f97 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -2,6 +2,12 @@
>  config IOMMU_IOVA
>  	tristate
>  
> +config IOASID
> +	bool
> +	help
> +	  Enable the I/O Address Space ID allocator. A single ID space shared
> +	  between different users.
> +
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>  	bool
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8c71a15..0efac6f 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOASID) += ioasid.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> new file mode 100644
> index 0000000..99f5e0a
> --- /dev/null
> +++ b/drivers/iommu/ioasid.c
> @@ -0,0 +1,140 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * I/O Address Space ID allocator. There is one global IOASID space, split into
> + * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
> + * free IOASIDs with ioasid_alloc and ioasid_free.
> + */
> +#include <linux/xarray.h>
> +#include <linux/ioasid.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +struct ioasid_data {
> +	ioasid_t id;
> +	struct ioasid_set *set;
> +	void *private;
> +	struct rcu_head rcu;
> +};
> +
> +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> +
> +/**
> + * ioasid_set_data - Set private data for an allocated ioasid
> + * @ioasid: the ID to set data
> + * @data:   the private data
> + *
> + * For IOASID that is already allocated, private data can be set
> + * via this API. Future lookup can be done via ioasid_find.
> + */
> +int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	struct ioasid_data *ioasid_data;
> +	int ret = 0;
> +
> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> +	if (ioasid_data)
> +		ioasid_data->private = data;
> +	else
> +		ret = -ENOENT;
> +
> +	/* getter may use the private data */
> +	synchronize_rcu();
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_data);
> +
> +/**
> + * ioasid_alloc - Allocate an IOASID
> + * @set: the IOASID set
> + * @min: the minimum ID (inclusive)
> + * @max: the maximum ID (inclusive)
> + * @private: data private to the caller
> + *
> + * Allocate an ID between @min and @max (or %0 and %INT_MAX). Return the
> + * allocated ID on success, or INVALID_IOASID on failure. The @private pointer
> + * is stored internally and can be retrieved with ioasid_find().
> + */
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +		      void *private)
> +{
> +	int id = INVALID_IOASID;
> +	struct ioasid_data *data;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return INVALID_IOASID;
> +
> +	data->set = set;
> +	data->private = private;
> +
> +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
> +		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
> +		goto exit_free;
> +	}
> +	data->id = id;
> +
> +exit_free:
> +	if (id < 0 || id == INVALID_IOASID) {
> +		kfree(data);
> +		return INVALID_IOASID;
> +	}
> +	return id;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc);
> +
> +/**
> + * ioasid_free - Free an IOASID
> + * @ioasid: the ID to remove
> + */
> +void ioasid_free(ioasid_t ioasid)
> +{
> +	struct ioasid_data *ioasid_data;
> +
> +	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> +
> +	kfree_rcu(ioasid_data, rcu);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free);
> +
> +/**
> + * ioasid_find - Find IOASID data
> + * @set: the IOASID set
> + * @ioasid: the IOASID to find
> + * @getter: function to call on the found object
> + *
> + * The optional getter function allows to take a reference to the found object
> + * under the rcu lock. The function can also check if the object is still valid:
> + * if @getter returns false, then the object is invalid and NULL is returned.
> + *
> + * If the IOASID has been allocated for this set, return the private pointer
> + * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
> + * if the IOASID is not found or not belong to the set.
> + */
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *))
> +{
> +	void *priv = NULL;
> +	struct ioasid_data *ioasid_data;
> +
> +	rcu_read_lock();
> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> +	if (!ioasid_data) {
> +		priv = ERR_PTR(-ENOENT);
> +		goto unlock;
> +	}
> +	if (set && ioasid_data->set != set) {
> +		/* data found but does not belong to the set */
> +		priv = ERR_PTR(-EACCES);
> +		goto unlock;
> +	}
> +	/* Now IOASID and its set is verified, we can return the private data */
> +	priv = ioasid_data->private;
> +	if (getter && !getter(priv))
> +		priv = NULL;
> +unlock:
> +	rcu_read_unlock();
> +
> +	return priv;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find);
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> new file mode 100644
> index 0000000..41de5e4
> --- /dev/null
> +++ b/include/linux/ioasid.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_IOASID_H
> +#define __LINUX_IOASID_H
> +
> +#define INVALID_IOASID ((ioasid_t)-1)
> +typedef unsigned int ioasid_t;
> +typedef int (*ioasid_iter_t)(ioasid_t ioasid, void *private, void *data);
not used as reported during v2 review: https://lkml.org/lkml/2019/4/25/341

Thanks

Eric
> +typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
> +typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
> +
> +struct ioasid_set {
> +	int dummy;
> +};
> +
> +struct ioasid_allocator {
> +	ioasid_alloc_fn_t alloc;
> +	ioasid_free_fn_t free;
> +	void *pdata;
> +	struct list_head list;
> +};
> +
> +#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> +
> +#ifdef CONFIG_IOASID
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +		      void *private);
> +void ioasid_free(ioasid_t ioasid);
> +
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *));
> +int ioasid_register_allocator(struct ioasid_allocator *allocator);
> +void ioasid_unregister_allocator(struct ioasid_allocator *allocator);
> +
> +int ioasid_set_data(ioasid_t ioasid, void *data);
> +
> +#else /* !CONFIG_IOASID */
> +static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> +				    ioasid_t max, void *private)
> +{
> +	return INVALID_IOASID;
> +}
> +
> +static inline void ioasid_free(ioasid_t ioasid)
> +{
> +}
> +
> +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +				bool (*getter)(void *))
> +{
> +	return NULL;
> +}
> +static inline int ioasid_register_allocator(struct ioasid_allocator *allocator)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void ioasid_unregister_allocator(struct ioasid_allocator *allocator)
> +{
> +}
> +
> +static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	return -ENODEV;
> +}
> +
> +#endif /* CONFIG_IOASID */
> +#endif /* __LINUX_IOASID_H */
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 04/16] ioasid: Add custom IOASID allocator
  2019-05-03 22:32 ` [PATCH v3 04/16] ioasid: Add custom IOASID allocator Jacob Pan
@ 2019-05-21  9:55   ` Auger Eric
  2019-05-22 19:42     ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-21  9:55 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob,

On 5/4/19 12:32 AM, Jacob Pan wrote:
> Sometimes, IOASID allocation must be handled by platform specific
> code. The use cases are guest vIOMMU and pvIOMMU where IOASIDs need
> to be allocated by the host via enlightened or paravirt interfaces.
> 
> This patch adds an extension to the IOASID allocator APIs such that
> platform drivers can register a custom allocator, possibly at boot
> time, to take over the allocation. Xarray is still used for tracking
> and searching purposes internal to the IOASID code. Private data of
> an IOASID can also be set after the allocation.
> 
> There can be multiple custom allocators registered but only one is
> used at a time. In case of hot removal of devices that provides the
> allocator, all IOASIDs must be freed prior to unregistering the
> allocator. Default XArray based allocator cannot be mixed with
> custom allocators, i.e. custom allocators will not be used if there
> are outstanding IOASIDs allocated by the default XA allocator.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/ioasid.c | 125 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 125 insertions(+)
> 
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 99f5e0a..ed2915a 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -17,6 +17,100 @@ struct ioasid_data {
>  };
>  
>  static DEFINE_XARRAY_ALLOC(ioasid_xa);
> +static DEFINE_MUTEX(ioasid_allocator_lock);
> +static struct ioasid_allocator *active_custom_allocator;
> +
> +static LIST_HEAD(custom_allocators);
> +/*
> + * A flag to track if ioasid default allocator is in use, this will
> + * prevent custom allocator from being used. The reason is that custom allocator
> + * must have unadulterated space to track private data with xarray, there cannot
> + * be a mix been default and custom allocated IOASIDs.
> + */
> +static int default_allocator_active;
> +
> +/**
> + * ioasid_register_allocator - register a custom allocator
> + * @allocator: the custom allocator to be registered
> + *
> + * Custom allocators take precedence over the default xarray based allocator.
> + * Private data associated with the ASID are managed by ASID common code
> + * similar to data stored in xa.
> + *
> + * There can be multiple allocators registered but only one is active. In case
> + * of runtime removal of a custom allocator, the next one is activated based
> + * on the registration ordering.
> + */
> +int ioasid_register_allocator(struct ioasid_allocator *allocator)
> +{
> +	struct ioasid_allocator *pallocator;
> +	int ret = 0;
> +
> +	if (!allocator)
> +		return -EINVAL;
is it really necessary? Sin't it the caller responsibility?
> +
> +	mutex_lock(&ioasid_allocator_lock);
> +	/*
> +	 * No particular preference since all custom allocators end up calling
> +	 * the host to allocate IOASIDs. We activate the first one and keep
> +	 * the later registered allocators in a list in case the first one gets
> +	 * removed due to hotplug.
> +	 */
> +	if (list_empty(&custom_allocators))
> +		active_custom_allocator = allocator;> +	else {
> +		/* Check if the allocator is already registered */
> +		list_for_each_entry(pallocator, &custom_allocators, list) {
> +			if (pallocator == allocator) {
> +				pr_err("IOASID allocator already registered\n");
> +				ret = -EEXIST;
> +				goto out_unlock;
> +			}
> +		}
> +	}
> +	list_add_tail(&allocator->list, &custom_allocators);
> +
> +out_unlock:
> +	mutex_unlock(&ioasid_allocator_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_register_allocator);
> +
> +/**
> + * ioasid_unregister_allocator - Remove a custom IOASID allocator
> + * @allocator: the custom allocator to be removed
> + *
> + * Remove an allocator from the list, activate the next allocator in
> + * the order it was registered.
> + */
> +void ioasid_unregister_allocator(struct ioasid_allocator *allocator)
> +{
> +	if (!allocator)
> +		return;
is it really necessary?
> +
> +	if (list_empty(&custom_allocators)) {
> +		pr_warn("No custom IOASID allocators active!\n");
> +		return;
> +	}
> +
> +	mutex_lock(&ioasid_allocator_lock);
> +	list_del(&allocator->list);
> +	if (list_empty(&custom_allocators)) {
> +		pr_info("No custom IOASID allocators\n")> +		/*
> +		 * All IOASIDs should have been freed before the last custom
> +		 * allocator is unregistered. Unless default allocator is in
> +		 * use.
> +		 */
> +		BUG_ON(!xa_empty(&ioasid_xa) && !default_allocator_active);
> +		active_custom_allocator = NULL;
> +	} else if (allocator == active_custom_allocator) {
In case you are removing the active custom allocator don't you also need
to check that all ioasids were freed. Otherwise you are likely to switch
to a different allocator whereas the asid space is partially populated.
> +		active_custom_allocator = list_entry(&custom_allocators, struct ioasid_allocator, list);
> +		pr_info("IOASID allocator changed");
> +	}
> +	mutex_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
>  
>  /**
>   * ioasid_set_data - Set private data for an allocated ioasid
> @@ -68,6 +162,29 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
>  	data->set = set;
>  	data->private = private;
>  
> +	mutex_lock(&ioasid_allocator_lock);
> +	/*
> +	 * Use custom allocator if available, otherwise use default.
> +	 * However, if there are active IOASIDs already been allocated by default
> +	 * allocator, custom allocator cannot be used.
> +	 */
> +	if (!default_allocator_active && active_custom_allocator) {
> +		id = active_custom_allocator->alloc(min, max, active_custom_allocator->pdata);
> +		if (id == INVALID_IOASID) {
> +			pr_err("Failed ASID allocation by custom allocator\n");
> +			mutex_unlock(&ioasid_allocator_lock);
> +			goto exit_free;
> +		}
> +		/*
> +		 * Use XA to manage private data also sanitiy check custom
> +		 * allocator for duplicates.
> +		 */
> +		min = id;
> +		max = id + 1;
> +	} else
> +		default_allocator_active = 1;
nit: true?
> +	mutex_unlock(&ioasid_allocator_lock);
> +
>  	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
>  		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
>  		goto exit_free;> @@ -91,9 +208,17 @@ void ioasid_free(ioasid_t ioasid)
>  {
>  	struct ioasid_data *ioasid_data;
>  
> +	mutex_lock(&ioasid_allocator_lock);
> +	if (active_custom_allocator)
> +		active_custom_allocator->free(ioasid, active_custom_allocator->pdata);
> +	mutex_unlock(&ioasid_allocator_lock);
> +
>  	ioasid_data = xa_erase(&ioasid_xa, ioasid);
>  
>  	kfree_rcu(ioasid_data, rcu);
> +
> +	if (xa_empty(&ioasid_xa))
> +		default_allocator_active = 0;
Isn't it racy? what if an xa_alloc occurs inbetween?


>  }
>  EXPORT_SYMBOL_GPL(ioasid_free);
>  
> 

Thanks

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-20 19:22       ` Jacob Pan
@ 2019-05-21 16:09         ` Jean-Philippe Brucker
  2019-05-21 22:50           ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-21 16:09 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko

On 20/05/2019 20:22, Jacob Pan wrote:
> On Thu, 16 May 2019 09:14:29 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
>> On Thu, 16 May 2019 15:14:40 +0100
>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
>>
>>> Hi Jacob,
>>>
>>> On 03/05/2019 23:32, Jacob Pan wrote:  
>>>> +/**
>>>> + * struct gpasid_bind_data - Information about device and guest
>>>> PASID binding
>>>> + * @gcr3:	Guest CR3 value from guest mm
>>>> + * @pasid:	Process address space ID used for the guest mm
>>>> + * @addr_width:	Guest address width. Paging mode can also
>>>> be derived.
>>>> + */
>>>> +struct gpasid_bind_data {
>>>> +	__u64 gcr3;
>>>> +	__u32 pasid;
>>>> +	__u32 addr_width;
>>>> +	__u32 flags;
>>>> +#define	IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor
>>>> request */
>>>> +	__u8 padding[4];
>>>> +};    
>>>
>>> Could you wrap this structure into a generic one like we now do for
>>> bind_pasid_table? It would make the API easier to extend, because if
>>> we ever add individual PASID bind on Arm (something I'd like to do
>>> for virtio-iommu, eventually) it will have different parameters, as
>>> our PASID table entry has a lot of fields describing the page table
>>> format.
>>>
>>> Maybe something like the following would do?
>>>
>>> struct gpasid_bind_data {
>>> #define IOMMU_GPASID_BIND_VERSION_1 1
>>> 	__u32 version;
>>> #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD	1
>>> 	__u32 format;
>>> 	union {
>>> 		// the current gpasid_bind_data:
>>> 		struct gpasid_bind_intel_vtd vtd;
>>> 	};
>>> };
>>>   
> 
> Could you review the struct below? I am trying to extract the
> common fileds as much as possible. Didn't do exactly as you suggested
> to keep vendor specific data in separate struct under the same union.

Thanks, it looks good and I think we can reuse it for SMMUv2 and v3.
Some comments below.

> 
> Also, can you review the v3 ioasid allocator common code patches? I am
> hoping we can get the common code in v5.3 so that we can focus on the
> vendor specific part. The common code should include bind_guest_pasid
> and ioasid allocator.
> https://lkml.org/lkml/2019/5/3/787
> https://lkml.org/lkml/2019/5/3/780
> 
> Thanks,
> 
> Jacob
> 
> 
> /**
>  * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
>  * SVA binding.
>  *
>  * @flags:	VT-d PASID table entry attributes
>  * @pat:	Page attribute table data to compute effective memory type
>  * @emt:	Extended memory type
>  *
>  * Only guest vIOMMU selectable and effective options are passed down to
>  * the host IOMMU.
>  */
> struct gpasid_bind_data_vtd {
> #define	IOMMU_SVA_VTD_GPASID_SRE	BIT(0) /* supervisor request */
> #define	IOMMU_SVA_VTD_GPASID_EAFE	BIT(1) /* extended access enable */
> #define	IOMMU_SVA_VTD_GPASID_PCD	BIT(2) /* page-level cache disable */
> #define	IOMMU_SVA_VTD_GPASID_PWT	BIT(3) /* page-level write through */
> #define	IOMMU_SVA_VTD_GPASID_EMTE	BIT(4) /* extended memory type enable */
> #define	IOMMU_SVA_VTD_GPASID_CD		BIT(5) /* PASID-level cache disable */

It doesn't seem like the BIT() macro is exported to userspace, so we
can't use it here

> 	__u64 flags;
> 	__u32 pat;
> 	__u32 emt;
> };
> 
> /**
>  * struct gpasid_bind_data - Information about device and guest PASID binding
>  * @version:	Version of this data structure
>  * @format:	PASID table entry format
>  * @flags:	Additional information on guest bind request
>  * @gpgd:	Guest page directory base of the guest mm to bind
>  * @hpasid:	Process address space ID used for the guest mm in host IOMMU
>  * @gpasid:	Process address space ID used for the guest mm in guest IOMMU

Trying to understand the full flow:
* @gpasid is the one allocated by the guest using a virtual command. The
guest writes @gpgd into the virtual PASID table at index @gpasid, then
sends an invalidate command to QEMU.
* QEMU issues a gpasid_bind ioctl (on the mdev or its container?). VFIO
forwards. The IOMMU driver installs @gpgd into the PASID table using
@hpasid, which is associated with the auxiliary domain.

But why do we need the @hpasid field here? Does userspace know about it
at all, and does VFIO need to pass it to the IOMMU driver?

>  * @addr_width:	Guest address width. Paging mode can also be derived.

What does the last sentence mean? @addr_width should probably be in @vtd
if it provides implicit information.

>  * @vtd:	Intel VT-d specific data
>  */
> struct gpasid_bind_data {
> #define IOMMU_GPASID_BIND_VERSION_1	1
> 	__u32 version;
> #define IOMMU_PASID_FORMAT_INTEL_VTD	1
> 	__u32 format;
> #define	IOMMU_SVA_GPASID_VAL	BIT(1) /* guest PASID valid */

(There are tabs between define and name here, as well as in the VT-d
specific data)

> 	__u64 flags;
> 	__u64 gpgd;
> 	__u64 hpasid;
> 	__u64 gpasid;
> 	__u32 addr_width;

I think the union has to be aligned on 64-bit, otherwise a compiler
might insert padding (https://lkml.org/lkml/2019/1/11/1207)

Thanks,
Jean

> 	/* Vendor specific data */
> 	union {
> 		struct gpasid_bind_data_vtd vtd;
> 	};
> };
> 
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator
  2019-05-21  8:21   ` Auger Eric
@ 2019-05-21 17:03     ` Jacob Pan
  2019-05-22 12:19       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-21 17:03 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Tue, 21 May 2019 10:21:55 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi,
> 
> On 5/4/19 12:32 AM, Jacob Pan wrote:
> > From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > 
> > Some devices might support multiple DMA address spaces, in
> > particular those that have the PCI PASID feature. PASID (Process
> > Address Space ID) allows to share process address spaces with
> > devices (SVA), partition a device into VM-assignable entities (VFIO
> > mdev) or simply provide multiple DMA address space to kernel
> > drivers. Add a global PASID allocator usable by different drivers
> > at the same time. Name it I/O ASID to avoid confusion with ASIDs
> > allocated by arch code, which are usually a separate ID space.
> > 
> > The IOASID space is global. Each device can have its own PASID
> > space, but by convention the IOMMU ended up having a global PASID
> > space, so that with SVA, each mm_struct is associated to a single
> > PASID.
> > 
> > The allocator is primarily used by IOMMU subsystem but in rare
> > occasions drivers would like to allocate PASIDs for devices that
> > aren't managed by an IOMMU, using the same ID space as IOMMU.
> > 
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Link: https://lkml.org/lkml/2019/4/26/462
> > ---
> >  drivers/iommu/Kconfig  |   6 +++
> >  drivers/iommu/Makefile |   1 +
> >  drivers/iommu/ioasid.c | 140
> > +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/ioasid.h |  67 +++++++++++++++++++++++ 4 files
> > changed, 214 insertions(+) create mode 100644 drivers/iommu/ioasid.c
> >  create mode 100644 include/linux/ioasid.h
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 6f07f3b..75e7f97 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -2,6 +2,12 @@
> >  config IOMMU_IOVA
> >  	tristate
> >  
> > +config IOASID
> > +	bool  
> don't we want a tristate here too?
> 
> Also refering to the past discussions we could add "# The IOASID
> library may also be used by non-IOMMU_API users"
I agree. For device driver modules to use ioasid w/o iommu, this does
not have to be built-in.
Jean, would you agree?

> > +	help
> > +	  Enable the I/O Address Space ID allocator. A single ID
> > space shared
> > +	  between different users.
> > +
> >  # IOMMU_API always gets selected by whoever wants it.
> >  config IOMMU_API
> >  	bool
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index 8c71a15..0efac6f 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> > +obj-$(CONFIG_IOASID) += ioasid.o
> >  obj-$(CONFIG_IOMMU_IOVA) += iova.o
> >  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
> >  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > new file mode 100644
> > index 0000000..99f5e0a
> > --- /dev/null
> > +++ b/drivers/iommu/ioasid.c
> > @@ -0,0 +1,140 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * I/O Address Space ID allocator. There is one global IOASID
> > space, split into
> > + * subsets. Users create a subset with DECLARE_IOASID_SET, then
> > allocate and
> > + * free IOASIDs with ioasid_alloc and ioasid_free.
> > + */
> > +#include <linux/xarray.h>
> > +#include <linux/ioasid.h>
> > +#include <linux/slab.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct ioasid_data {
> > +	ioasid_t id;
> > +	struct ioasid_set *set;
> > +	void *private;
> > +	struct rcu_head rcu;
> > +};
> > +
> > +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> > +
> > +/**
> > + * ioasid_set_data - Set private data for an allocated ioasid
> > + * @ioasid: the ID to set data
> > + * @data:   the private data
> > + *
> > + * For IOASID that is already allocated, private data can be set
> > + * via this API. Future lookup can be done via ioasid_find.
> > + */
> > +int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +	struct ioasid_data *ioasid_data;
> > +	int ret = 0;
> > +
> > +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> > +	if (ioasid_data)
> > +		ioasid_data->private = data;
> > +	else
> > +		ret = -ENOENT;
> > +
> > +	/* getter may use the private data */
> > +	synchronize_rcu();
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_set_data);
> > +
> > +/**
> > + * ioasid_alloc - Allocate an IOASID
> > + * @set: the IOASID set
> > + * @min: the minimum ID (inclusive)
> > + * @max: the maximum ID (inclusive)
> > + * @private: data private to the caller
> > + *
> > + * Allocate an ID between @min and @max (or %0 and %INT_MAX).  
> I think we agreed to drop (or %0 and %INT_MAX)
>  Return the
sorry I don't recall but works for me.

> > + * allocated ID on success, or INVALID_IOASID on failure. The
> > @private pointer
> > + * is stored internally and can be retrieved with ioasid_find().
> > + */
> > +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> > ioasid_t max,
> > +		      void *private)
> > +{
> > +	int id = INVALID_IOASID;  
> isn't it an unsigned?
Right

> > +	struct ioasid_data *data;
> > +
> > +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +	if (!data)
> > +		return INVALID_IOASID;
> > +
> > +	data->set = set;
> > +	data->private = private;
> > +
> > +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max),
> > GFP_KERNEL)) {
> > +		pr_err("Failed to alloc ioasid from %d to %d\n",
> > min, max);
> > +		goto exit_free;
> > +	}
> > +	data->id = id;
> > +
> > +exit_free:
> > +	if (id < 0 || id == INVALID_IOASID) {  
> < 0?
Agreed.
> > +		kfree(data);
> > +		return INVALID_IOASID;
> > +	}
> > +	return id;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc);
> > +
> > +/**
> > + * ioasid_free - Free an IOASID
> > + * @ioasid: the ID to remove
> > + */
> > +void ioasid_free(ioasid_t ioasid)
> > +{
> > +	struct ioasid_data *ioasid_data;
> > +
> > +	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> > +
> > +	kfree_rcu(ioasid_data, rcu);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_free);
> > +
> > +/**
> > + * ioasid_find - Find IOASID data
> > + * @set: the IOASID set
> > + * @ioasid: the IOASID to find
> > + * @getter: function to call on the found object
> > + *
> > + * The optional getter function allows to take a reference to the
> > found object
> > + * under the rcu lock. The function can also check if the object
> > is still valid:
> > + * if @getter returns false, then the object is invalid and NULL
> > is returned.
> > + *
> > + * If the IOASID has been allocated for this set, return the
> > private pointer
> > + * passed to ioasid_alloc. Private data can be NULL if not set.
> > Return an error
> > + * if the IOASID is not found or not belong to the set.  
> do not belong
Will fix, "does not belong"

> > + */
> > +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> > +		  bool (*getter)(void *))
> > +{
> > +	void *priv = NULL;
> > +	struct ioasid_data *ioasid_data;
> > +
> > +	rcu_read_lock();
> > +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> > +	if (!ioasid_data) {
> > +		priv = ERR_PTR(-ENOENT);
> > +		goto unlock;
> > +	}
> > +	if (set && ioasid_data->set != set) {
> > +		/* data found but does not belong to the set */
> > +		priv = ERR_PTR(-EACCES);
> > +		goto unlock;
> > +	}
> > +	/* Now IOASID and its set is verified, we can return the
> > private data */
> > +	priv = ioasid_data->private;
> > +	if (getter && !getter(priv))
> > +		priv = NULL;
> > +unlock:
> > +	rcu_read_unlock();
> > +
> > +	return priv;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_find);
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > new file mode 100644
> > index 0000000..41de5e4
> > --- /dev/null
> > +++ b/include/linux/ioasid.h
> > @@ -0,0 +1,67 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __LINUX_IOASID_H
> > +#define __LINUX_IOASID_H
> > +
> > +#define INVALID_IOASID ((ioasid_t)-1)
> > +typedef unsigned int ioasid_t;
> > +typedef int (*ioasid_iter_t)(ioasid_t ioasid, void *private, void
> > *data); +typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min,
> > ioasid_t max, void *data); +typedef void
> > (*ioasid_free_fn_t)(ioasid_t ioasid, void *data); +
> > +struct ioasid_set {
> > +	int dummy;
> > +};
> > +
> > +struct ioasid_allocator {
> > +	ioasid_alloc_fn_t alloc;
> > +	ioasid_free_fn_t free;
> > +	void *pdata;
> > +	struct list_head list;
> > +};
> > +
> > +#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> > +
> > +#ifdef CONFIG_IOASID
> > +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> > ioasid_t max,
> > +		      void *private);
> > +void ioasid_free(ioasid_t ioasid);
> > +
> > +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> > +		  bool (*getter)(void *));
> > +int ioasid_register_allocator(struct ioasid_allocator *allocator);
> > +void ioasid_unregister_allocator(struct ioasid_allocator
> > *allocator); +
> > +int ioasid_set_data(ioasid_t ioasid, void *data);
> > +
> > +#else /* !CONFIG_IOASID */
> > +static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min,
> > +				    ioasid_t max, void *private)
> > +{
> > +	return INVALID_IOASID;
> > +}
> > +
> > +static inline void ioasid_free(ioasid_t ioasid)
> > +{
> > +}
> > +
> > +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> > ioasid,
> > +				bool (*getter)(void *))
> > +{
> > +	return NULL;
> > +}
> > +static inline int ioasid_register_allocator(struct
> > ioasid_allocator *allocator) +{
> > +	return -ENODEV;
> > +}
> > +
> > +static inline void ioasid_unregister_allocator(struct
> > ioasid_allocator *allocator) +{
> > +}
> > +
> > +static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +	return -ENODEV;
> > +}
> > +
> > +#endif /* CONFIG_IOASID */
> > +#endif /* __LINUX_IOASID_H */
> >   
> Thanks
> 
> Eric
Thank you for the review

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator
  2019-05-21  9:41   ` Auger Eric
@ 2019-05-21 17:05     ` Jacob Pan
  0 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-21 17:05 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Tue, 21 May 2019 11:41:52 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi,
> 
> On 5/4/19 12:32 AM, Jacob Pan wrote:
> > From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > 
> > Some devices might support multiple DMA address spaces, in
> > particular those that have the PCI PASID feature. PASID (Process
> > Address Space ID) allows to share process address spaces with
> > devices (SVA), partition a device into VM-assignable entities (VFIO
> > mdev) or simply provide multiple DMA address space to kernel
> > drivers. Add a global PASID allocator usable by different drivers
> > at the same time. Name it I/O ASID to avoid confusion with ASIDs
> > allocated by arch code, which are usually a separate ID space.
> > 
> > The IOASID space is global. Each device can have its own PASID
> > space, but by convention the IOMMU ended up having a global PASID
> > space, so that with SVA, each mm_struct is associated to a single
> > PASID.
> > 
> > The allocator is primarily used by IOMMU subsystem but in rare
> > occasions drivers would like to allocate PASIDs for devices that
> > aren't managed by an IOMMU, using the same ID space as IOMMU.
> > 
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Link: https://lkml.org/lkml/2019/4/26/462
> > ---
> >  drivers/iommu/Kconfig  |   6 +++
> >  drivers/iommu/Makefile |   1 +
> >  drivers/iommu/ioasid.c | 140
> > +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/ioasid.h |  67 +++++++++++++++++++++++ 4 files
> > changed, 214 insertions(+) create mode 100644 drivers/iommu/ioasid.c
> >  create mode 100644 include/linux/ioasid.h
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 6f07f3b..75e7f97 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -2,6 +2,12 @@
> >  config IOMMU_IOVA
> >  	tristate
> >  
> > +config IOASID
> > +	bool
> > +	help
> > +	  Enable the I/O Address Space ID allocator. A single ID
> > space shared
> > +	  between different users.
> > +
> >  # IOMMU_API always gets selected by whoever wants it.
> >  config IOMMU_API
> >  	bool
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index 8c71a15..0efac6f 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> > +obj-$(CONFIG_IOASID) += ioasid.o
> >  obj-$(CONFIG_IOMMU_IOVA) += iova.o
> >  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
> >  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > new file mode 100644
> > index 0000000..99f5e0a
> > --- /dev/null
> > +++ b/drivers/iommu/ioasid.c
> > @@ -0,0 +1,140 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * I/O Address Space ID allocator. There is one global IOASID
> > space, split into
> > + * subsets. Users create a subset with DECLARE_IOASID_SET, then
> > allocate and
> > + * free IOASIDs with ioasid_alloc and ioasid_free.
> > + */
> > +#include <linux/xarray.h>
> > +#include <linux/ioasid.h>
> > +#include <linux/slab.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct ioasid_data {
> > +	ioasid_t id;
> > +	struct ioasid_set *set;
> > +	void *private;
> > +	struct rcu_head rcu;
> > +};
> > +
> > +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> > +
> > +/**
> > + * ioasid_set_data - Set private data for an allocated ioasid
> > + * @ioasid: the ID to set data
> > + * @data:   the private data
> > + *
> > + * For IOASID that is already allocated, private data can be set
> > + * via this API. Future lookup can be done via ioasid_find.
> > + */
> > +int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +	struct ioasid_data *ioasid_data;
> > +	int ret = 0;
> > +
> > +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> > +	if (ioasid_data)
> > +		ioasid_data->private = data;
> > +	else
> > +		ret = -ENOENT;
> > +
> > +	/* getter may use the private data */
> > +	synchronize_rcu();
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_set_data);
> > +
> > +/**
> > + * ioasid_alloc - Allocate an IOASID
> > + * @set: the IOASID set
> > + * @min: the minimum ID (inclusive)
> > + * @max: the maximum ID (inclusive)
> > + * @private: data private to the caller
> > + *
> > + * Allocate an ID between @min and @max (or %0 and %INT_MAX).
> > Return the
> > + * allocated ID on success, or INVALID_IOASID on failure. The
> > @private pointer
> > + * is stored internally and can be retrieved with ioasid_find().
> > + */
> > +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> > ioasid_t max,
> > +		      void *private)
> > +{
> > +	int id = INVALID_IOASID;
> > +	struct ioasid_data *data;
> > +
> > +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +	if (!data)
> > +		return INVALID_IOASID;
> > +
> > +	data->set = set;
> > +	data->private = private;
> > +
> > +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max),
> > GFP_KERNEL)) {
> > +		pr_err("Failed to alloc ioasid from %d to %d\n",
> > min, max);
> > +		goto exit_free;
> > +	}
> > +	data->id = id;
> > +
> > +exit_free:
> > +	if (id < 0 || id == INVALID_IOASID) {
> > +		kfree(data);
> > +		return INVALID_IOASID;
> > +	}
> > +	return id;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc);
> > +
> > +/**
> > + * ioasid_free - Free an IOASID
> > + * @ioasid: the ID to remove
> > + */
> > +void ioasid_free(ioasid_t ioasid)
> > +{
> > +	struct ioasid_data *ioasid_data;
> > +
> > +	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> > +
> > +	kfree_rcu(ioasid_data, rcu);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_free);
> > +
> > +/**
> > + * ioasid_find - Find IOASID data
> > + * @set: the IOASID set
> > + * @ioasid: the IOASID to find
> > + * @getter: function to call on the found object
> > + *
> > + * The optional getter function allows to take a reference to the
> > found object
> > + * under the rcu lock. The function can also check if the object
> > is still valid:
> > + * if @getter returns false, then the object is invalid and NULL
> > is returned.
> > + *
> > + * If the IOASID has been allocated for this set, return the
> > private pointer
> > + * passed to ioasid_alloc. Private data can be NULL if not set.
> > Return an error
> > + * if the IOASID is not found or not belong to the set.
> > + */
> > +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> > +		  bool (*getter)(void *))
> > +{
> > +	void *priv = NULL;
> > +	struct ioasid_data *ioasid_data;
> > +
> > +	rcu_read_lock();
> > +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> > +	if (!ioasid_data) {
> > +		priv = ERR_PTR(-ENOENT);
> > +		goto unlock;
> > +	}
> > +	if (set && ioasid_data->set != set) {
> > +		/* data found but does not belong to the set */
> > +		priv = ERR_PTR(-EACCES);
> > +		goto unlock;
> > +	}
> > +	/* Now IOASID and its set is verified, we can return the
> > private data */
> > +	priv = ioasid_data->private;
> > +	if (getter && !getter(priv))
> > +		priv = NULL;
> > +unlock:
> > +	rcu_read_unlock();
> > +
> > +	return priv;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_find);
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > new file mode 100644
> > index 0000000..41de5e4
> > --- /dev/null
> > +++ b/include/linux/ioasid.h
> > @@ -0,0 +1,67 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __LINUX_IOASID_H
> > +#define __LINUX_IOASID_H
> > +
> > +#define INVALID_IOASID ((ioasid_t)-1)
> > +typedef unsigned int ioasid_t;
> > +typedef int (*ioasid_iter_t)(ioasid_t ioasid, void *private, void
> > *data);  
> not used as reported during v2 review:
> https://lkml.org/lkml/2019/4/25/341
> 
I missed it, thanks.
> Thanks
> 
> Eric
> > +typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max,
> > void *data); +typedef void (*ioasid_free_fn_t)(ioasid_t ioasid,
> > void *data); +
> > +struct ioasid_set {
> > +	int dummy;
> > +};
> > +
> > +struct ioasid_allocator {
> > +	ioasid_alloc_fn_t alloc;
> > +	ioasid_free_fn_t free;
> > +	void *pdata;
> > +	struct list_head list;
> > +};
> > +
> > +#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> > +
> > +#ifdef CONFIG_IOASID
> > +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> > ioasid_t max,
> > +		      void *private);
> > +void ioasid_free(ioasid_t ioasid);
> > +
> > +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> > +		  bool (*getter)(void *));
> > +int ioasid_register_allocator(struct ioasid_allocator *allocator);
> > +void ioasid_unregister_allocator(struct ioasid_allocator
> > *allocator); +
> > +int ioasid_set_data(ioasid_t ioasid, void *data);
> > +
> > +#else /* !CONFIG_IOASID */
> > +static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min,
> > +				    ioasid_t max, void *private)
> > +{
> > +	return INVALID_IOASID;
> > +}
> > +
> > +static inline void ioasid_free(ioasid_t ioasid)
> > +{
> > +}
> > +
> > +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> > ioasid,
> > +				bool (*getter)(void *))
> > +{
> > +	return NULL;
> > +}
> > +static inline int ioasid_register_allocator(struct
> > ioasid_allocator *allocator) +{
> > +	return -ENODEV;
> > +}
> > +
> > +static inline void ioasid_unregister_allocator(struct
> > ioasid_allocator *allocator) +{
> > +}
> > +
> > +static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +	return -ENODEV;
> > +}
> > +
> > +#endif /* CONFIG_IOASID */
> > +#endif /* __LINUX_IOASID_H */
> >   

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-21 16:09         ` Jean-Philippe Brucker
@ 2019-05-21 22:50           ` Jacob Pan
  2019-05-22 15:05             ` Jean-Philippe Brucker
  0 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-21 22:50 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

On Tue, 21 May 2019 17:09:40 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 20/05/2019 20:22, Jacob Pan wrote:
> > On Thu, 16 May 2019 09:14:29 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >   
> >> On Thu, 16 May 2019 15:14:40 +0100
> >> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> >>  
> >>> Hi Jacob,
> >>>
> >>> On 03/05/2019 23:32, Jacob Pan wrote:    
> >>>> +/**
> >>>> + * struct gpasid_bind_data - Information about device and guest
> >>>> PASID binding
> >>>> + * @gcr3:	Guest CR3 value from guest mm
> >>>> + * @pasid:	Process address space ID used for the guest mm
> >>>> + * @addr_width:	Guest address width. Paging mode can also
> >>>> be derived.
> >>>> + */
> >>>> +struct gpasid_bind_data {
> >>>> +	__u64 gcr3;
> >>>> +	__u32 pasid;
> >>>> +	__u32 addr_width;
> >>>> +	__u32 flags;
> >>>> +#define	IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor
> >>>> request */
> >>>> +	__u8 padding[4];
> >>>> +};      
> >>>
> >>> Could you wrap this structure into a generic one like we now do
> >>> for bind_pasid_table? It would make the API easier to extend,
> >>> because if we ever add individual PASID bind on Arm (something
> >>> I'd like to do for virtio-iommu, eventually) it will have
> >>> different parameters, as our PASID table entry has a lot of
> >>> fields describing the page table format.
> >>>
> >>> Maybe something like the following would do?
> >>>
> >>> struct gpasid_bind_data {
> >>> #define IOMMU_GPASID_BIND_VERSION_1 1
> >>> 	__u32 version;
> >>> #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD	1
> >>> 	__u32 format;
> >>> 	union {
> >>> 		// the current gpasid_bind_data:
> >>> 		struct gpasid_bind_intel_vtd vtd;
> >>> 	};
> >>> };
> >>>     
> > 
> > Could you review the struct below? I am trying to extract the
> > common fileds as much as possible. Didn't do exactly as you
> > suggested to keep vendor specific data in separate struct under the
> > same union.  
> 
> Thanks, it looks good and I think we can reuse it for SMMUv2 and v3.
> Some comments below.
> 
> > 
> > Also, can you review the v3 ioasid allocator common code patches? I
> > am hoping we can get the common code in v5.3 so that we can focus
> > on the vendor specific part. The common code should include
> > bind_guest_pasid and ioasid allocator.
> > https://lkml.org/lkml/2019/5/3/787
> > https://lkml.org/lkml/2019/5/3/780
> > 
> > Thanks,
> > 
> > Jacob
> > 
> > 
> > /**
> >  * struct gpasid_bind_data_vtd - Intel VT-d specific data on device
> > and guest
> >  * SVA binding.
> >  *
> >  * @flags:	VT-d PASID table entry attributes
> >  * @pat:	Page attribute table data to compute effective
> > memory type
> >  * @emt:	Extended memory type
> >  *
> >  * Only guest vIOMMU selectable and effective options are passed
> > down to
> >  * the host IOMMU.
> >  */
> > struct gpasid_bind_data_vtd {
> > #define	IOMMU_SVA_VTD_GPASID_SRE	BIT(0) /* supervisor
> > request */ #define	IOMMU_SVA_VTD_GPASID_EAFE
> > BIT(1) /* extended access enable */ #define
> > IOMMU_SVA_VTD_GPASID_PCD	BIT(2) /* page-level cache disable
> > */ #define	IOMMU_SVA_VTD_GPASID_PWT	BIT(3) /*
> > page-level write through */ #define
> > IOMMU_SVA_VTD_GPASID_EMTE	BIT(4) /* extended memory type
> > enable */ #define	IOMMU_SVA_VTD_GPASID_CD
> > BIT(5) /* PASID-level cache disable */  
> 
> It doesn't seem like the BIT() macro is exported to userspace, so we
> can't use it here
> 
good point, will avoid BIT()
> > 	__u64 flags;
> > 	__u32 pat;
> > 	__u32 emt;
> > };
> > 
> > /**
> >  * struct gpasid_bind_data - Information about device and guest
> > PASID binding
> >  * @version:	Version of this data structure
> >  * @format:	PASID table entry format
> >  * @flags:	Additional information on guest bind request
> >  * @gpgd:	Guest page directory base of the guest mm to bind
> >  * @hpasid:	Process address space ID used for the guest mm
> > in host IOMMU
> >  * @gpasid:	Process address space ID used for the guest mm
> > in guest IOMMU  
> 
> Trying to understand the full flow:
> * @gpasid is the one allocated by the guest using a virtual command.
> The guest writes @gpgd into the virtual PASID table at index @gpasid,
> then sends an invalidate command to QEMU.
yes
> * QEMU issues a gpasid_bind ioctl (on the mdev or its container?).
> VFIO forwards. The IOMMU driver installs @gpgd into the PASID table
> using @hpasid, which is associated with the auxiliary domain.
> 
> But why do we need the @hpasid field here? Does userspace know about
> it at all, and does VFIO need to pass it to the IOMMU driver?
> 
We need to support two guest-host PASID mappings through this API. Idea
comes from Kevin & Yi.
1. identity mapping between host and guest PASID
2. guest owns its own pasid space

For option 1, which will plan to support first in this series. There is
no need for gpasid field since gpasid=hpasid. Guest allocates PASID
using virtual command interface which gets a host PASID. Then PASID
cache invalidation in the guest will result in bind_gpasid(), @gpasid is
not valid in the bind data (indicated by the IOMMU_SVA_GPASID_VAL flag).

For option 2, guest still uses virtual command to allocate guest pasid,
but this time QEMU does the allocation for gpasid, at the same time
QEMU will allocate a host pasid then maintain a G->H PASID lookup.
When guest invalidate its PASID cache with GPASID, QEMU will find the
match host PASID then pass both gpasid and hpasid down to the host IOMMU
driver.
Host IOMMU driver will store the gpgd at the hpasid entry but keep
track of the gpasid->hpasid mapping. Host will never program gpasid in
the IOMMU HW. Host IOMMU driver provides G->H PASID translation for PF
device drivers that emulates mdev config space, i.e. virtual device
composition module
(https://events.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf).

These two options is a per VM choice. Hopefully the two diagrams below
can help to explain. I will put them in the next patch headers.


Option 1. Identity G-H PASID mapping diagram.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process mm, FL only |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'\                      |
    |             | \                     |
    |             |  \                    |
    '-------------'   \________________   |
                        GPASID = HPASID   |
Guest                  ^      ^           |
------| Shadow |-------| VCMD |-----------|------------
      v        v       |      |           |
QEMU                   v      v           |
------------------------------------------|------------
Host             HPASID = ioasid_alloc()  |
                    |                     v
                    |       sva_bind_gpasid(HPASID)
                    |
    .-------------. |  .----------------------.
    |   pIOMMU    | |  | Bind FL for GVA-GPA  |
    |             | | /'----------------------'
    .----------------'  |
    | PASID Entry |     V (Nested xlate)
    '----------------..---------------------.
    |             |   |Set SL to GPA-HPA    |
    |             |   '---------------------'
    '-------------'



Option 2. Non-identity G-H PASID mapping diagram.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process mm, FL only |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'\                      | .-------------.
    |             | \                     | |Guest driver |
    |             |  \                    | |writes GPASID|
    '-------------'   \________________   | '-------------'
                        GPASID            |             |
Guest                  ^      ^           |             |
------| Shadow |-------| VCMD |-----------|------------ |
      v        v       |      |           |             |
QEMU                   v      v           |             |
	    GPASID = qemu_gpasid_alloc()  |             |
            keep G->H PASID lookup        |             |
                   ^                      v             |
		   |                 lookup G->H PASID  |
-------------------|----------------------|------------ |
Host             HPASID = ioasid_alloc()  |             |
                    |                     v             |
                    |     sva_bind_gpasid(HPASID,GPASID)|
                    |     keep H-G PASID lookup         |
                    |                          \  -------------------.
    .-------------. |  .----------------------. \|    VDCM           |
    |   pIOMMU    | |  | Bind FL for GVA-GPA  |  | H = lookup(GPASID)|
    |             | | /'----------------------'  | write H to dev    |
    .----------------'  |                         '------------------'
    | PASID Entry |     V (Nested xlate)
    '----------------..---------------------.
    |             |   |Set SL to GPA-HPA    |
    |             |   '---------------------'
    '-------------'
There is also implications in G-H pasid lookup for PRQ, that would be
in the later series.

> >  * @addr_width:	Guest address width. Paging mode can also be
> > derived.  
> 
> What does the last sentence mean? @addr_width should probably be in
> @vtd if it provides implicit information.
> 
Derive 4 or 5 level paging mode from the address width. It can be in
@vtd but i thought this can be generic.

> >  * @vtd:	Intel VT-d specific data
> >  */
> > struct gpasid_bind_data {
> > #define IOMMU_GPASID_BIND_VERSION_1	1
> > 	__u32 version;
> > #define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > 	__u32 format;
> > #define	IOMMU_SVA_GPASID_VAL	BIT(1) /* guest PASID
> > valid */  
> 
> (There are tabs between define and name here, as well as in the VT-d
> specific data)
> 
> > 	__u64 flags;
> > 	__u64 gpgd;
> > 	__u64 hpasid;
> > 	__u64 gpasid;
> > 	__u32 addr_width;  
> 
> I think the union has to be aligned on 64-bit, otherwise a compiler
> might insert padding (https://lkml.org/lkml/2019/1/11/1207)
> 
good point, will fix.
> Thanks,
> Jean
> 
> > 	/* Vendor specific data */
> > 	union {
> > 		struct gpasid_bind_data_vtd vtd;
> > 	};
> > };
> > 
> >   
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator
  2019-05-21 17:03     ` Jacob Pan
@ 2019-05-22 12:19       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-22 12:19 UTC (permalink / raw)
  To: Jacob Pan, Auger Eric
  Cc: Tian, Kevin, Raj Ashok, iommu, LKML, Alex Williamson,
	Andriy Shevchenko, David Woodhouse

On 21/05/2019 18:03, Jacob Pan wrote:
> On Tue, 21 May 2019 10:21:55 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
>>> +config IOASID
>>> +	bool  
>> don't we want a tristate here too?
>>
>> Also refering to the past discussions we could add "# The IOASID
>> library may also be used by non-IOMMU_API users"
> I agree. For device driver modules to use ioasid w/o iommu, this does
> not have to be built-in.
> Jean, would you agree?

Yes we can make it tristate. There is a couple of things missing to
build it as a module:
* Add MODULE_LICENSE("GPL") to ioasid.c
* Use #if IS_ENABLED(CONFIG_IOASID) in ioasid.h rather than #ifdef
CONFIG_IOASID

>>> +	help
>>> +	  Enable the I/O Address Space ID allocator. A single ID
>>> space shared
>>> +	  between different users.
>>> +
>>>  # IOMMU_API always gets selected by whoever wants it.
>>>  config IOMMU_API
>>>  	bool
>>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>>> index 8c71a15..0efac6f 100644
>>> --- a/drivers/iommu/Makefile
>>> +++ b/drivers/iommu/Makefile
>>> @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>>>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>>>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>>>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>>> +obj-$(CONFIG_IOASID) += ioasid.o
>>>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>>>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>>>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
>>> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
>>> new file mode 100644
>>> index 0000000..99f5e0a
>>> --- /dev/null
>>> +++ b/drivers/iommu/ioasid.c
>>> @@ -0,0 +1,140 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * I/O Address Space ID allocator. There is one global IOASID
>>> space, split into
>>> + * subsets. Users create a subset with DECLARE_IOASID_SET, then
>>> allocate and
>>> + * free IOASIDs with ioasid_alloc and ioasid_free.
>>> + */
>>> +#include <linux/xarray.h>
>>> +#include <linux/ioasid.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/spinlock.h>

nit: sort alphabetically

>>> +
>>> +struct ioasid_data {
>>> +	ioasid_t id;
>>> +	struct ioasid_set *set;
>>> +	void *private;
>>> +	struct rcu_head rcu;
>>> +};
>>> +
>>> +static DEFINE_XARRAY_ALLOC(ioasid_xa);
>>> +
>>> +/**
>>> + * ioasid_set_data - Set private data for an allocated ioasid
>>> + * @ioasid: the ID to set data
>>> + * @data:   the private data
>>> + *
>>> + * For IOASID that is already allocated, private data can be set
>>> + * via this API. Future lookup can be done via ioasid_find.
>>> + */
>>> +int ioasid_set_data(ioasid_t ioasid, void *data)
>>> +{
>>> +	struct ioasid_data *ioasid_data;
>>> +	int ret = 0;
>>> +
>>> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
>>> +	if (ioasid_data)
>>> +		ioasid_data->private = data;

I think we might be in trouble if this function runs concurrently with
ioasid_free(). ioasid_data may be freed between xa_load() and this
assignment. It's probably not a valid use at the moment but we might as
well make this code robust (or describe the constraints of
ioasid_set_data() in the comment).

I'm still uneasy about this, but I think we need the following sequence:

	xa_lock();
	ioasid_data = xa_load()
	if (ioasid_data)
		rcu_assign_pointer(ioasid_data->private, data);
	else
		ret = -ENOENT;
	xa_unlock();

>>> +	else
>>> +		ret = -ENOENT;
>>> +
>>> +	/* getter may use the private data */
>>> +	synchronize_rcu();

If I understand correctly, this allows our caller to safely free the old
data, if any? Is there any other reason to have a synchronize_rcu()?
Otherwise the comment could be more precise:

/* Wait for readers to stop accessing the old private data so the caller
can free it. */

>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(ioasid_set_data);
[...]
>>> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
>>> +		  bool (*getter)(void *))
>>> +{
>>> +	void *priv = NULL;
>>> +	struct ioasid_data *ioasid_data;
>>> +
>>> +	rcu_read_lock();
>>> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
>>> +	if (!ioasid_data) {
>>> +		priv = ERR_PTR(-ENOENT);
>>> +		goto unlock;
>>> +	}
>>> +	if (set && ioasid_data->set != set) {
>>> +		/* data found but does not belong to the set */
>>> +		priv = ERR_PTR(-EACCES);
>>> +		goto unlock;
>>> +	}
>>> +	/* Now IOASID and its set is verified, we can return the
>>> private data */
>>> +	priv = ioasid_data->private;

And here, I suppose we need:

	priv = rcu_dereference(ioasid_data->private);

Thanks,
Jean

>>> +	if (getter && !getter(priv))
>>> +		priv = NULL;
>>> +unlock:
>>> +	rcu_read_unlock();
>>> +
>>> +	return priv;
>>> +}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-21 22:50           ` Jacob Pan
@ 2019-05-22 15:05             ` Jean-Philippe Brucker
  2019-05-22 17:15               ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-22 15:05 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko

On 21/05/2019 23:50, Jacob Pan wrote:
>>> /**
>>>  * struct gpasid_bind_data - Information about device and guest
>>> PASID binding
>>>  * @version:	Version of this data structure
>>>  * @format:	PASID table entry format
>>>  * @flags:	Additional information on guest bind request
>>>  * @gpgd:	Guest page directory base of the guest mm to bind
>>>  * @hpasid:	Process address space ID used for the guest mm
>>> in host IOMMU
>>>  * @gpasid:	Process address space ID used for the guest mm
>>> in guest IOMMU  
>>
>> Trying to understand the full flow:
>> * @gpasid is the one allocated by the guest using a virtual command.
>> The guest writes @gpgd into the virtual PASID table at index @gpasid,
>> then sends an invalidate command to QEMU.
> yes
>> * QEMU issues a gpasid_bind ioctl (on the mdev or its container?).
>> VFIO forwards. The IOMMU driver installs @gpgd into the PASID table
>> using @hpasid, which is associated with the auxiliary domain.
>>
>> But why do we need the @hpasid field here? Does userspace know about
>> it at all, and does VFIO need to pass it to the IOMMU driver?
>>
> We need to support two guest-host PASID mappings through this API. Idea
> comes from Kevin & Yi.
> 1. identity mapping between host and guest PASID
> 2. guest owns its own pasid space
> 
> For option 1, which will plan to support first in this series. There is
> no need for gpasid field since gpasid=hpasid. Guest allocates PASID
> using virtual command interface which gets a host PASID. Then PASID
> cache invalidation in the guest will result in bind_gpasid(), @gpasid is
> not valid in the bind data (indicated by the IOMMU_SVA_GPASID_VAL flag).
> 
> For option 2, guest still uses virtual command to allocate guest pasid,
> but this time QEMU does the allocation for gpasid, at the same time
> QEMU will allocate a host pasid then maintain a G->H PASID lookup.
> When guest invalidate its PASID cache with GPASID, QEMU will find the
> match host PASID then pass both gpasid and hpasid down to the host IOMMU
> driver.
> Host IOMMU driver will store the gpgd at the hpasid entry but keep
> track of the gpasid->hpasid mapping. Host will never program gpasid in
> the IOMMU HW. Host IOMMU driver provides G->H PASID translation for PF
> device drivers that emulates mdev config space, i.e. virtual device
> composition module
> (https://events.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf).
> 
> These two options is a per VM choice. Hopefully the two diagrams below
> can help to explain. I will put them in the next patch headers.

Thanks for the explanation, makes sense to me now. So the host kernel
needs to know G->H because the guest may write GPASID into the config
space emulated by the host device driver, and device driver then
retrieves the HPASID via an iommu_ops callback? But the device driver
keeps track of aux domains so isn't HPASID retrievable with
aux_get_pasid() already?

> 
> Option 1. Identity G-H PASID mapping diagram.
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process mm, FL only |
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'\                      |
>     |             | \                     |
>     |             |  \                    |
>     '-------------'   \________________   |
>                         GPASID = HPASID   |
> Guest                  ^      ^           |
> ------| Shadow |-------| VCMD |-----------|------------
>       v        v       |      |           |
> QEMU                   v      v           |
> ------------------------------------------|------------
> Host             HPASID = ioasid_alloc()  |
>                     |                     v
>                     |       sva_bind_gpasid(HPASID)
>                     |
>     .-------------. |  .----------------------.
>     |   pIOMMU    | |  | Bind FL for GVA-GPA  |
>     |             | | /'----------------------'
>     .----------------'  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------..---------------------.
>     |             |   |Set SL to GPA-HPA    |
>     |             |   '---------------------'
>     '-------------'
> 
> 
> 
> Option 2. Non-identity G-H PASID mapping diagram.
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process mm, FL only |
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'\                      | .-------------.
>     |             | \                     | |Guest driver |
>     |             |  \                    | |writes GPASID|
>     '-------------'   \________________   | '-------------'
>                         GPASID            |             |
> Guest                  ^      ^           |             |
> ------| Shadow |-------| VCMD |-----------|------------ |
>       v        v       |      |           |             |
> QEMU                   v      v           |             |
> 	    GPASID = qemu_gpasid_alloc()  |             |
>             keep G->H PASID lookup        |             |
>                    ^                      v             |
> 		   |                 lookup G->H PASID  |
> -------------------|----------------------|------------ |
> Host             HPASID = ioasid_alloc()  |             |
>                     |                     v             |
>                     |     sva_bind_gpasid(HPASID,GPASID)|
>                     |     keep H-G PASID lookup         |
>                     |                          \  -------------------.
>     .-------------. |  .----------------------. \|    VDCM           |
>     |   pIOMMU    | |  | Bind FL for GVA-GPA  |  | H = lookup(GPASID)|
>     |             | | /'----------------------'  | write H to dev    |
>     .----------------'  |                         '------------------'
>     | PASID Entry |     V (Nested xlate)
>     '----------------..---------------------.
>     |             |   |Set SL to GPA-HPA    |
>     |             |   '---------------------'
>     '-------------'
> There is also implications in G-H pasid lookup for PRQ, that would be
> in the later series.
> 
>>>  * @addr_width:	Guest address width. Paging mode can also be
>>> derived.  
>>
>> What does the last sentence mean? @addr_width should probably be in
>> @vtd if it provides implicit information.
>>
> Derive 4 or 5 level paging mode from the address width. It can be in
> @vtd but i thought this can be generic.

Yes I think it's generic enough. It may be worth stating that this is
the *virtual* address width, and removing or clarifying what the paging
mode is (the sentence could be confusing on Arm, as we have different
page granules which cannot be derived from the address width)

Thanks,
Jean

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
  2019-05-22 15:05             ` Jean-Philippe Brucker
@ 2019-05-22 17:15               ` Jacob Pan
  0 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-22 17:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Tian, Kevin, Raj Ashok, Andriy Shevchenko,
	jacob.jun.pan

On Wed, 22 May 2019 16:05:53 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 21/05/2019 23:50, Jacob Pan wrote:
> >>> /**
> >>>  * struct gpasid_bind_data - Information about device and guest
> >>> PASID binding
> >>>  * @version:	Version of this data structure
> >>>  * @format:	PASID table entry format
> >>>  * @flags:	Additional information on guest bind request
> >>>  * @gpgd:	Guest page directory base of the guest mm to bind
> >>>  * @hpasid:	Process address space ID used for the guest mm
> >>> in host IOMMU
> >>>  * @gpasid:	Process address space ID used for the guest mm
> >>> in guest IOMMU    
> >>
> >> Trying to understand the full flow:
> >> * @gpasid is the one allocated by the guest using a virtual
> >> command. The guest writes @gpgd into the virtual PASID table at
> >> index @gpasid, then sends an invalidate command to QEMU.  
> > yes  
> >> * QEMU issues a gpasid_bind ioctl (on the mdev or its container?).
> >> VFIO forwards. The IOMMU driver installs @gpgd into the PASID table
> >> using @hpasid, which is associated with the auxiliary domain.
> >>
> >> But why do we need the @hpasid field here? Does userspace know
> >> about it at all, and does VFIO need to pass it to the IOMMU driver?
> >>  
> > We need to support two guest-host PASID mappings through this API.
> > Idea comes from Kevin & Yi.
> > 1. identity mapping between host and guest PASID
> > 2. guest owns its own pasid space
> > 
> > For option 1, which will plan to support first in this series.
> > There is no need for gpasid field since gpasid=hpasid. Guest
> > allocates PASID using virtual command interface which gets a host
> > PASID. Then PASID cache invalidation in the guest will result in
> > bind_gpasid(), @gpasid is not valid in the bind data (indicated by
> > the IOMMU_SVA_GPASID_VAL flag).
> > 
> > For option 2, guest still uses virtual command to allocate guest
> > pasid, but this time QEMU does the allocation for gpasid, at the
> > same time QEMU will allocate a host pasid then maintain a G->H
> > PASID lookup. When guest invalidate its PASID cache with GPASID,
> > QEMU will find the match host PASID then pass both gpasid and
> > hpasid down to the host IOMMU driver.
> > Host IOMMU driver will store the gpgd at the hpasid entry but keep
> > track of the gpasid->hpasid mapping. Host will never program gpasid
> > in the IOMMU HW. Host IOMMU driver provides G->H PASID translation
> > for PF device drivers that emulates mdev config space, i.e. virtual
> > device composition module
> > (https://events.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf).
> > 
> > These two options is a per VM choice. Hopefully the two diagrams
> > below can help to explain. I will put them in the next patch
> > headers.  
> 
> Thanks for the explanation, makes sense to me now. So the host kernel
> needs to know G->H because the guest may write GPASID into the config
> space emulated by the host device driver, and device driver then
> retrieves the HPASID via an iommu_ops callback? But the device driver
> keeps track of aux domains so isn't HPASID retrievable with
> aux_get_pasid() already?
> 
aux_get_pasid() will get domain's default pasid, which is used for
non-svm traffic on mdev. Here the gpasid bind is for svm only.
> > 
> > Option 1. Identity G-H PASID mapping diagram.
> > 
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process mm, FL only |
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'\                      |
> >     |             | \                     |
> >     |             |  \                    |
> >     '-------------'   \________________   |
> >                         GPASID = HPASID   |
> > Guest                  ^      ^           |
> > ------| Shadow |-------| VCMD |-----------|------------
> >       v        v       |      |           |
> > QEMU                   v      v           |
> > ------------------------------------------|------------
> > Host             HPASID = ioasid_alloc()  |
> >                     |                     v
> >                     |       sva_bind_gpasid(HPASID)
> >                     |
> >     .-------------. |  .----------------------.
> >     |   pIOMMU    | |  | Bind FL for GVA-GPA  |
> >     |             | | /'----------------------'
> >     .----------------'  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------..---------------------.
> >     |             |   |Set SL to GPA-HPA    |
> >     |             |   '---------------------'
> >     '-------------'
> > 
> > 
> > 
> > Option 2. Non-identity G-H PASID mapping diagram.
> > 
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process mm, FL only |
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'\                      | .-------------.
> >     |             | \                     | |Guest driver |
> >     |             |  \                    | |writes GPASID|
> >     '-------------'   \________________   | '-------------'
> >                         GPASID            |             |
> > Guest                  ^      ^           |             |
> > ------| Shadow |-------| VCMD |-----------|------------ |
> >       v        v       |      |           |             |
> > QEMU                   v      v           |             |
> > 	    GPASID = qemu_gpasid_alloc()  |             |
> >             keep G->H PASID lookup        |             |
> >                    ^                      v             |
> > 		   |                 lookup G->H PASID  |
> > -------------------|----------------------|------------ |
> > Host             HPASID = ioasid_alloc()  |             |
> >                     |                     v             |
> >                     |     sva_bind_gpasid(HPASID,GPASID)|
> >                     |     keep H-G PASID lookup         |
> >                     |                          \
> > -------------------. .-------------. |  .----------------------.
> > \|    VDCM           | |   pIOMMU    | |  | Bind FL for GVA-GPA  |
> > | H = lookup(GPASID)| |             | | /'----------------------'
> > | write H to dev    | .----------------'  |
> > '------------------' | PASID Entry |     V (Nested xlate)
> >     '----------------..---------------------.
> >     |             |   |Set SL to GPA-HPA    |
> >     |             |   '---------------------'
> >     '-------------'
> > There is also implications in G-H pasid lookup for PRQ, that would
> > be in the later series.
> >   
> >>>  * @addr_width:	Guest address width. Paging mode can also
> >>> be derived.    
> >>
> >> What does the last sentence mean? @addr_width should probably be in
> >> @vtd if it provides implicit information.
> >>  
> > Derive 4 or 5 level paging mode from the address width. It can be in
> > @vtd but i thought this can be generic.  
> 
> Yes I think it's generic enough. It may be worth stating that this is
> the *virtual* address width, and removing or clarifying what the
> paging mode is (the sentence could be confusing on Arm, as we have
> different page granules which cannot be derived from the address
> width)
>
OK, will keep addr_width as a generic field, then remove the paging
mode comment.

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 04/16] ioasid: Add custom IOASID allocator
  2019-05-21  9:55   ` Auger Eric
@ 2019-05-22 19:42     ` Jacob Pan
  2019-05-23  7:14       ` Auger Eric
  0 siblings, 1 reply; 52+ messages in thread
From: Jacob Pan @ 2019-05-22 19:42 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Tue, 21 May 2019 11:55:55 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jacob,
> 
> On 5/4/19 12:32 AM, Jacob Pan wrote:
> > Sometimes, IOASID allocation must be handled by platform specific
> > code. The use cases are guest vIOMMU and pvIOMMU where IOASIDs need
> > to be allocated by the host via enlightened or paravirt interfaces.
> > 
> > This patch adds an extension to the IOASID allocator APIs such that
> > platform drivers can register a custom allocator, possibly at boot
> > time, to take over the allocation. Xarray is still used for tracking
> > and searching purposes internal to the IOASID code. Private data of
> > an IOASID can also be set after the allocation.
> > 
> > There can be multiple custom allocators registered but only one is
> > used at a time. In case of hot removal of devices that provides the
> > allocator, all IOASIDs must be freed prior to unregistering the
> > allocator. Default XArray based allocator cannot be mixed with
> > custom allocators, i.e. custom allocators will not be used if there
> > are outstanding IOASIDs allocated by the default XA allocator.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/ioasid.c | 125
> > +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed,
> > 125 insertions(+)
> > 
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > index 99f5e0a..ed2915a 100644
> > --- a/drivers/iommu/ioasid.c
> > +++ b/drivers/iommu/ioasid.c
> > @@ -17,6 +17,100 @@ struct ioasid_data {
> >  };
> >  
> >  static DEFINE_XARRAY_ALLOC(ioasid_xa);
> > +static DEFINE_MUTEX(ioasid_allocator_lock);
> > +static struct ioasid_allocator *active_custom_allocator;
> > +
> > +static LIST_HEAD(custom_allocators);
> > +/*
> > + * A flag to track if ioasid default allocator is in use, this will
> > + * prevent custom allocator from being used. The reason is that
> > custom allocator
> > + * must have unadulterated space to track private data with
> > xarray, there cannot
> > + * be a mix been default and custom allocated IOASIDs.
> > + */
> > +static int default_allocator_active;
> > +
> > +/**
> > + * ioasid_register_allocator - register a custom allocator
> > + * @allocator: the custom allocator to be registered
> > + *
> > + * Custom allocators take precedence over the default xarray based
> > allocator.
> > + * Private data associated with the ASID are managed by ASID
> > common code
> > + * similar to data stored in xa.
> > + *
> > + * There can be multiple allocators registered but only one is
> > active. In case
> > + * of runtime removal of a custom allocator, the next one is
> > activated based
> > + * on the registration ordering.
> > + */
> > +int ioasid_register_allocator(struct ioasid_allocator *allocator)
> > +{
> > +	struct ioasid_allocator *pallocator;
> > +	int ret = 0;
> > +
> > +	if (!allocator)
> > +		return -EINVAL;  
> is it really necessary? Sin't it the caller responsibility?
makes sense. will remove this one and below.
> > +
> > +	mutex_lock(&ioasid_allocator_lock);
> > +	/*
> > +	 * No particular preference since all custom allocators
> > end up calling
> > +	 * the host to allocate IOASIDs. We activate the first one
> > and keep
> > +	 * the later registered allocators in a list in case the
> > first one gets
> > +	 * removed due to hotplug.
> > +	 */
> > +	if (list_empty(&custom_allocators))
> > +		active_custom_allocator = allocator;> +
> > else {
> > +		/* Check if the allocator is already registered */
> > +		list_for_each_entry(pallocator,
> > &custom_allocators, list) {
> > +			if (pallocator == allocator) {
> > +				pr_err("IOASID allocator already
> > registered\n");
> > +				ret = -EEXIST;
> > +				goto out_unlock;
> > +			}
> > +		}
> > +	}
> > +	list_add_tail(&allocator->list, &custom_allocators);
> > +
> > +out_unlock:
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_register_allocator);
> > +
> > +/**
> > + * ioasid_unregister_allocator - Remove a custom IOASID allocator
> > + * @allocator: the custom allocator to be removed
> > + *
> > + * Remove an allocator from the list, activate the next allocator
> > in
> > + * the order it was registered.
> > + */
> > +void ioasid_unregister_allocator(struct ioasid_allocator
> > *allocator) +{
> > +	if (!allocator)
> > +		return;  
> is it really necessary?
> > +
> > +	if (list_empty(&custom_allocators)) {
> > +		pr_warn("No custom IOASID allocators active!\n");
> > +		return;
> > +	}
> > +
> > +	mutex_lock(&ioasid_allocator_lock);
> > +	list_del(&allocator->list);
> > +	if (list_empty(&custom_allocators)) {
> > +		pr_info("No custom IOASID allocators\n")>
> > +		/*
> > +		 * All IOASIDs should have been freed before the
> > last custom
> > +		 * allocator is unregistered. Unless default
> > allocator is in
> > +		 * use.
> > +		 */
> > +		BUG_ON(!xa_empty(&ioasid_xa)
> > && !default_allocator_active);
> > +		active_custom_allocator = NULL;
> > +	} else if (allocator == active_custom_allocator) {  
> In case you are removing the active custom allocator don't you also
> need to check that all ioasids were freed. Otherwise you are likely
> to switch to a different allocator whereas the asid space is
> partially populated.
The assumption is that all custom allocators on the same guest will end
up calling the same host allocator. Having multiple custom allocators in
the list is just a way to support multiple (p)vIOMMUs with hotplug.
Therefore, we cannot nor need to free all PASIDs when one custom
allocator goes away. This is a different situation then switching
between default allocator and custom allocator, where custom allocator
has to start with a clean space.

 
> > +		active_custom_allocator =
> > list_entry(&custom_allocators, struct ioasid_allocator, list);
> > +		pr_info("IOASID allocator changed");
> > +	}
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
> >  
> >  /**
> >   * ioasid_set_data - Set private data for an allocated ioasid
> > @@ -68,6 +162,29 @@ ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min, ioasid_t max, data->set = set;
> >  	data->private = private;
> >  
> > +	mutex_lock(&ioasid_allocator_lock);
> > +	/*
> > +	 * Use custom allocator if available, otherwise use
> > default.
> > +	 * However, if there are active IOASIDs already been
> > allocated by default
> > +	 * allocator, custom allocator cannot be used.
> > +	 */
> > +	if (!default_allocator_active && active_custom_allocator) {
> > +		id = active_custom_allocator->alloc(min, max,
> > active_custom_allocator->pdata);
> > +		if (id == INVALID_IOASID) {
> > +			pr_err("Failed ASID allocation by custom
> > allocator\n");
> > +			mutex_unlock(&ioasid_allocator_lock);
> > +			goto exit_free;
> > +		}
> > +		/*
> > +		 * Use XA to manage private data also sanitiy
> > check custom
> > +		 * allocator for duplicates.
> > +		 */
> > +		min = id;
> > +		max = id + 1;
> > +	} else
> > +		default_allocator_active = 1;  
> nit: true?
yes, i can turn default_allocator_active into a bool type.

> > +	mutex_unlock(&ioasid_allocator_lock);
> > +
> >  	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max),
> > GFP_KERNEL)) { pr_err("Failed to alloc ioasid from %d to %d\n",
> > min, max); goto exit_free;> @@ -91,9 +208,17 @@ void
> > ioasid_free(ioasid_t ioasid) {
> >  	struct ioasid_data *ioasid_data;
> >  
> > +	mutex_lock(&ioasid_allocator_lock);
> > +	if (active_custom_allocator)
> > +		active_custom_allocator->free(ioasid,
> > active_custom_allocator->pdata);
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +
> >  	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> >  
> >  	kfree_rcu(ioasid_data, rcu);
> > +
> > +	if (xa_empty(&ioasid_xa))
> > +		default_allocator_active = 0;  
> Isn't it racy? what if an xa_alloc occurs inbetween?
> 
> 
yes, i will move it under the mutex. Thanks.
> >  }
> >  EXPORT_SYMBOL_GPL(ioasid_free);
> >  
> >   
> 
> Thanks
> 
> Eric

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 04/16] ioasid: Add custom IOASID allocator
  2019-05-22 19:42     ` Jacob Pan
@ 2019-05-23  7:14       ` Auger Eric
  2019-05-23 15:40         ` Jacob Pan
  0 siblings, 1 reply; 52+ messages in thread
From: Auger Eric @ 2019-05-23  7:14 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko

Hi Jacob,

On 5/22/19 9:42 PM, Jacob Pan wrote:
> On Tue, 21 May 2019 11:55:55 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Jacob,
>>
>> On 5/4/19 12:32 AM, Jacob Pan wrote:
>>> Sometimes, IOASID allocation must be handled by platform specific
>>> code. The use cases are guest vIOMMU and pvIOMMU where IOASIDs need
>>> to be allocated by the host via enlightened or paravirt interfaces.
>>>
>>> This patch adds an extension to the IOASID allocator APIs such that
>>> platform drivers can register a custom allocator, possibly at boot
>>> time, to take over the allocation. Xarray is still used for tracking
>>> and searching purposes internal to the IOASID code. Private data of
>>> an IOASID can also be set after the allocation.
>>>
>>> There can be multiple custom allocators registered but only one is
>>> used at a time. In case of hot removal of devices that provides the
>>> allocator, all IOASIDs must be freed prior to unregistering the
>>> allocator. Default XArray based allocator cannot be mixed with
>>> custom allocators, i.e. custom allocators will not be used if there
>>> are outstanding IOASIDs allocated by the default XA allocator.
>>>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> ---
>>>  drivers/iommu/ioasid.c | 125
>>> +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed,
>>> 125 insertions(+)
>>>
>>> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
>>> index 99f5e0a..ed2915a 100644
>>> --- a/drivers/iommu/ioasid.c
>>> +++ b/drivers/iommu/ioasid.c
>>> @@ -17,6 +17,100 @@ struct ioasid_data {
>>>  };
>>>  
>>>  static DEFINE_XARRAY_ALLOC(ioasid_xa);
>>> +static DEFINE_MUTEX(ioasid_allocator_lock);
>>> +static struct ioasid_allocator *active_custom_allocator;
>>> +
>>> +static LIST_HEAD(custom_allocators);
>>> +/*
>>> + * A flag to track if ioasid default allocator is in use, this will
>>> + * prevent custom allocator from being used. The reason is that
>>> custom allocator
>>> + * must have unadulterated space to track private data with
>>> xarray, there cannot
>>> + * be a mix been default and custom allocated IOASIDs.
>>> + */
>>> +static int default_allocator_active;
>>> +
>>> +/**
>>> + * ioasid_register_allocator - register a custom allocator
>>> + * @allocator: the custom allocator to be registered
>>> + *
>>> + * Custom allocators take precedence over the default xarray based
>>> allocator.
>>> + * Private data associated with the ASID are managed by ASID
>>> common code
>>> + * similar to data stored in xa.
>>> + *
>>> + * There can be multiple allocators registered but only one is
>>> active. In case
>>> + * of runtime removal of a custom allocator, the next one is
>>> activated based
>>> + * on the registration ordering.
>>> + */
>>> +int ioasid_register_allocator(struct ioasid_allocator *allocator)
>>> +{
>>> +	struct ioasid_allocator *pallocator;
>>> +	int ret = 0;
>>> +
>>> +	if (!allocator)
>>> +		return -EINVAL;  
>> is it really necessary? Sin't it the caller responsibility?
> makes sense. will remove this one and below.
>>> +
>>> +	mutex_lock(&ioasid_allocator_lock);
>>> +	/*
>>> +	 * No particular preference since all custom allocators
>>> end up calling
>>> +	 * the host to allocate IOASIDs. We activate the first one
>>> and keep
>>> +	 * the later registered allocators in a list in case the
>>> first one gets
>>> +	 * removed due to hotplug.
>>> +	 */
>>> +	if (list_empty(&custom_allocators))
>>> +		active_custom_allocator = allocator;> +
>>> else {
>>> +		/* Check if the allocator is already registered */
>>> +		list_for_each_entry(pallocator,
>>> &custom_allocators, list) {
>>> +			if (pallocator == allocator) {
>>> +				pr_err("IOASID allocator already
>>> registered\n");
>>> +				ret = -EEXIST;
>>> +				goto out_unlock;
>>> +			}
>>> +		}
>>> +	}
>>> +	list_add_tail(&allocator->list, &custom_allocators);
>>> +
>>> +out_unlock:
>>> +	mutex_unlock(&ioasid_allocator_lock);
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(ioasid_register_allocator);
>>> +
>>> +/**
>>> + * ioasid_unregister_allocator - Remove a custom IOASID allocator
>>> + * @allocator: the custom allocator to be removed
>>> + *
>>> + * Remove an allocator from the list, activate the next allocator
>>> in
>>> + * the order it was registered.
>>> + */
>>> +void ioasid_unregister_allocator(struct ioasid_allocator
>>> *allocator) +{
>>> +	if (!allocator)
>>> +		return;  
>> is it really necessary?
>>> +
>>> +	if (list_empty(&custom_allocators)) {
>>> +		pr_warn("No custom IOASID allocators active!\n");
>>> +		return;
>>> +	}
>>> +
>>> +	mutex_lock(&ioasid_allocator_lock);
>>> +	list_del(&allocator->list);
>>> +	if (list_empty(&custom_allocators)) {
>>> +		pr_info("No custom IOASID allocators\n")>
>>> +		/*
>>> +		 * All IOASIDs should have been freed before the
>>> last custom
>>> +		 * allocator is unregistered. Unless default
>>> allocator is in
>>> +		 * use.
>>> +		 */
>>> +		BUG_ON(!xa_empty(&ioasid_xa)
>>> && !default_allocator_active);
>>> +		active_custom_allocator = NULL;
>>> +	} else if (allocator == active_custom_allocator) {  
>> In case you are removing the active custom allocator don't you also
>> need to check that all ioasids were freed. Otherwise you are likely
>> to switch to a different allocator whereas the asid space is
>> partially populated.
> The assumption is that all custom allocators on the same guest will end
> up calling the same host allocator. Having multiple custom allocators in
> the list is just a way to support multiple (p)vIOMMUs with hotplug.
> Therefore, we cannot nor need to free all PASIDs when one custom
> allocator goes away. This is a different situation then switching
> between default allocator and custom allocator, where custom allocator
> has to start with a clean space.
Although I understand your specific usecase, this framework may have
other users, where custom allocators behave differently.

Also the commit msg says:"
In case of hot removal of devices that provides the
allocator, all IOASIDs must be freed prior to unregistering the
allocator."

Thanks

Eric
> 
>  
>>> +		active_custom_allocator =
>>> list_entry(&custom_allocators, struct ioasid_allocator, list);
>>> +		pr_info("IOASID allocator changed");
>>> +	}
>>> +	mutex_unlock(&ioasid_allocator_lock);
>>> +}
>>> +EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
>>>  
>>>  /**
>>>   * ioasid_set_data - Set private data for an allocated ioasid
>>> @@ -68,6 +162,29 @@ ioasid_t ioasid_alloc(struct ioasid_set *set,
>>> ioasid_t min, ioasid_t max, data->set = set;
>>>  	data->private = private;
>>>  
>>> +	mutex_lock(&ioasid_allocator_lock);
>>> +	/*
>>> +	 * Use custom allocator if available, otherwise use
>>> default.
>>> +	 * However, if there are active IOASIDs already been
>>> allocated by default
>>> +	 * allocator, custom allocator cannot be used.
>>> +	 */
>>> +	if (!default_allocator_active && active_custom_allocator) {
>>> +		id = active_custom_allocator->alloc(min, max,
>>> active_custom_allocator->pdata);
>>> +		if (id == INVALID_IOASID) {
>>> +			pr_err("Failed ASID allocation by custom
>>> allocator\n");
>>> +			mutex_unlock(&ioasid_allocator_lock);
>>> +			goto exit_free;
>>> +		}
>>> +		/*
>>> +		 * Use XA to manage private data also sanitiy
>>> check custom
>>> +		 * allocator for duplicates.
>>> +		 */
>>> +		min = id;
>>> +		max = id + 1;
>>> +	} else
>>> +		default_allocator_active = 1;  
>> nit: true?
> yes, i can turn default_allocator_active into a bool type.
> 
>>> +	mutex_unlock(&ioasid_allocator_lock);
>>> +
>>>  	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max),
>>> GFP_KERNEL)) { pr_err("Failed to alloc ioasid from %d to %d\n",
>>> min, max); goto exit_free;> @@ -91,9 +208,17 @@ void
>>> ioasid_free(ioasid_t ioasid) {
>>>  	struct ioasid_data *ioasid_data;
>>>  
>>> +	mutex_lock(&ioasid_allocator_lock);
>>> +	if (active_custom_allocator)
>>> +		active_custom_allocator->free(ioasid,
>>> active_custom_allocator->pdata);
>>> +	mutex_unlock(&ioasid_allocator_lock);
>>> +
>>>  	ioasid_data = xa_erase(&ioasid_xa, ioasid);
>>>  
>>>  	kfree_rcu(ioasid_data, rcu);
>>> +
>>> +	if (xa_empty(&ioasid_xa))
>>> +		default_allocator_active = 0;  
>> Isn't it racy? what if an xa_alloc occurs inbetween?
>>
>>
> yes, i will move it under the mutex. Thanks.
>>>  }
>>>  EXPORT_SYMBOL_GPL(ioasid_free);
>>>  
>>>   
>>
>> Thanks
>>
>> Eric
> 
> [Jacob Pan]
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 04/16] ioasid: Add custom IOASID allocator
  2019-05-23  7:14       ` Auger Eric
@ 2019-05-23 15:40         ` Jacob Pan
  0 siblings, 0 replies; 52+ messages in thread
From: Jacob Pan @ 2019-05-23 15:40 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Thu, 23 May 2019 09:14:07 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jacob,
> 
> On 5/22/19 9:42 PM, Jacob Pan wrote:
> > On Tue, 21 May 2019 11:55:55 +0200
> > Auger Eric <eric.auger@redhat.com> wrote:
> >   
> >> Hi Jacob,
> >>
> >> On 5/4/19 12:32 AM, Jacob Pan wrote:  
> >>> Sometimes, IOASID allocation must be handled by platform specific
> >>> code. The use cases are guest vIOMMU and pvIOMMU where IOASIDs
> >>> need to be allocated by the host via enlightened or paravirt
> >>> interfaces.
> >>>
> >>> This patch adds an extension to the IOASID allocator APIs such
> >>> that platform drivers can register a custom allocator, possibly
> >>> at boot time, to take over the allocation. Xarray is still used
> >>> for tracking and searching purposes internal to the IOASID code.
> >>> Private data of an IOASID can also be set after the allocation.
> >>>
> >>> There can be multiple custom allocators registered but only one is
> >>> used at a time. In case of hot removal of devices that provides
> >>> the allocator, all IOASIDs must be freed prior to unregistering
> >>> the allocator. Default XArray based allocator cannot be mixed with
> >>> custom allocators, i.e. custom allocators will not be used if
> >>> there are outstanding IOASIDs allocated by the default XA
> >>> allocator.
> >>>
> >>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>> ---
> >>>  drivers/iommu/ioasid.c | 125
> >>> +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed,
> >>> 125 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> >>> index 99f5e0a..ed2915a 100644
> >>> --- a/drivers/iommu/ioasid.c
> >>> +++ b/drivers/iommu/ioasid.c
> >>> @@ -17,6 +17,100 @@ struct ioasid_data {
> >>>  };
> >>>  
> >>>  static DEFINE_XARRAY_ALLOC(ioasid_xa);
> >>> +static DEFINE_MUTEX(ioasid_allocator_lock);
> >>> +static struct ioasid_allocator *active_custom_allocator;
> >>> +
> >>> +static LIST_HEAD(custom_allocators);
> >>> +/*
> >>> + * A flag to track if ioasid default allocator is in use, this
> >>> will
> >>> + * prevent custom allocator from being used. The reason is that
> >>> custom allocator
> >>> + * must have unadulterated space to track private data with
> >>> xarray, there cannot
> >>> + * be a mix been default and custom allocated IOASIDs.
> >>> + */
> >>> +static int default_allocator_active;
> >>> +
> >>> +/**
> >>> + * ioasid_register_allocator - register a custom allocator
> >>> + * @allocator: the custom allocator to be registered
> >>> + *
> >>> + * Custom allocators take precedence over the default xarray
> >>> based allocator.
> >>> + * Private data associated with the ASID are managed by ASID
> >>> common code
> >>> + * similar to data stored in xa.
> >>> + *
> >>> + * There can be multiple allocators registered but only one is
> >>> active. In case
> >>> + * of runtime removal of a custom allocator, the next one is
> >>> activated based
> >>> + * on the registration ordering.
> >>> + */
> >>> +int ioasid_register_allocator(struct ioasid_allocator *allocator)
> >>> +{
> >>> +	struct ioasid_allocator *pallocator;
> >>> +	int ret = 0;
> >>> +
> >>> +	if (!allocator)
> >>> +		return -EINVAL;    
> >> is it really necessary? Sin't it the caller responsibility?  
> > makes sense. will remove this one and below.  
> >>> +
> >>> +	mutex_lock(&ioasid_allocator_lock);
> >>> +	/*
> >>> +	 * No particular preference since all custom allocators
> >>> end up calling
> >>> +	 * the host to allocate IOASIDs. We activate the first
> >>> one and keep
> >>> +	 * the later registered allocators in a list in case the
> >>> first one gets
> >>> +	 * removed due to hotplug.
> >>> +	 */
> >>> +	if (list_empty(&custom_allocators))
> >>> +		active_custom_allocator = allocator;> +
> >>> else {
> >>> +		/* Check if the allocator is already registered
> >>> */
> >>> +		list_for_each_entry(pallocator,
> >>> &custom_allocators, list) {
> >>> +			if (pallocator == allocator) {
> >>> +				pr_err("IOASID allocator already
> >>> registered\n");
> >>> +				ret = -EEXIST;
> >>> +				goto out_unlock;
> >>> +			}
> >>> +		}
> >>> +	}
> >>> +	list_add_tail(&allocator->list, &custom_allocators);
> >>> +
> >>> +out_unlock:
> >>> +	mutex_unlock(&ioasid_allocator_lock);
> >>> +	return ret;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(ioasid_register_allocator);
> >>> +
> >>> +/**
> >>> + * ioasid_unregister_allocator - Remove a custom IOASID allocator
> >>> + * @allocator: the custom allocator to be removed
> >>> + *
> >>> + * Remove an allocator from the list, activate the next allocator
> >>> in
> >>> + * the order it was registered.
> >>> + */
> >>> +void ioasid_unregister_allocator(struct ioasid_allocator
> >>> *allocator) +{
> >>> +	if (!allocator)
> >>> +		return;    
> >> is it really necessary?  
> >>> +
> >>> +	if (list_empty(&custom_allocators)) {
> >>> +		pr_warn("No custom IOASID allocators active!\n");
> >>> +		return;
> >>> +	}
> >>> +
> >>> +	mutex_lock(&ioasid_allocator_lock);
> >>> +	list_del(&allocator->list);
> >>> +	if (list_empty(&custom_allocators)) {
> >>> +		pr_info("No custom IOASID allocators\n")>
> >>> +		/*
> >>> +		 * All IOASIDs should have been freed before the
> >>> last custom
> >>> +		 * allocator is unregistered. Unless default
> >>> allocator is in
> >>> +		 * use.
> >>> +		 */
> >>> +		BUG_ON(!xa_empty(&ioasid_xa)
> >>> && !default_allocator_active);
> >>> +		active_custom_allocator = NULL;
> >>> +	} else if (allocator == active_custom_allocator) {    
> >> In case you are removing the active custom allocator don't you also
> >> need to check that all ioasids were freed. Otherwise you are likely
> >> to switch to a different allocator whereas the asid space is
> >> partially populated.  
> > The assumption is that all custom allocators on the same guest will
> > end up calling the same host allocator. Having multiple custom
> > allocators in the list is just a way to support multiple (p)vIOMMUs
> > with hotplug. Therefore, we cannot nor need to free all PASIDs when
> > one custom allocator goes away. This is a different situation then
> > switching between default allocator and custom allocator, where
> > custom allocator has to start with a clean space.  
> Although I understand your specific usecase, this framework may have
> other users, where custom allocators behave differently.
> 
> Also the commit msg says:"
> In case of hot removal of devices that provides the
> allocator, all IOASIDs must be freed prior to unregistering the
> allocator."
> 
Right, it is inconsistent.
Consider the following scenario on a single guest with two vIOMMUs:
1. vIOMMU1 register allocator A1 first
2. vIOMMU2 register allocator A2 stored to the allocator list
3. device belong to vIOMMU1 bind_sva(), allocate PASID1 from A1
4. device belong to vIOMMU2 bind_sva(), allocate PASID2 from A1
5. vIOMMU1 hot removed, free PASID1, then unregister A1
6. IOASID framework will try to free A1 then install A2 as the active
allocator but PASID2 is in use. It will be unnecessarily disruptive to
free PASID2

I can think of a solution:
 - Add a flag when registering ioasid custom allocator,
IOASID_ALLOC_RETAIN, which means when switching to another custom
allocator, all outstanding PASIDs will be retained. Of course it does
not include switching to default allocator which does not have this
RETAIN flag.

 - For the allocators do not have this flag, their PASIDs must be freed
upon unregistering.

Any thoughts?

Jacob

> Thanks
> 
> Eric
> > 
> >    
> >>> +		active_custom_allocator =
> >>> list_entry(&custom_allocators, struct ioasid_allocator, list);
> >>> +		pr_info("IOASID allocator changed");
> >>> +	}
> >>> +	mutex_unlock(&ioasid_allocator_lock);
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
> >>>  
> >>>  /**
> >>>   * ioasid_set_data - Set private data for an allocated ioasid
> >>> @@ -68,6 +162,29 @@ ioasid_t ioasid_alloc(struct ioasid_set *set,
> >>> ioasid_t min, ioasid_t max, data->set = set;
> >>>  	data->private = private;
> >>>  
> >>> +	mutex_lock(&ioasid_allocator_lock);
> >>> +	/*
> >>> +	 * Use custom allocator if available, otherwise use
> >>> default.
> >>> +	 * However, if there are active IOASIDs already been
> >>> allocated by default
> >>> +	 * allocator, custom allocator cannot be used.
> >>> +	 */
> >>> +	if (!default_allocator_active &&
> >>> active_custom_allocator) {
> >>> +		id = active_custom_allocator->alloc(min, max,
> >>> active_custom_allocator->pdata);
> >>> +		if (id == INVALID_IOASID) {
> >>> +			pr_err("Failed ASID allocation by custom
> >>> allocator\n");
> >>> +			mutex_unlock(&ioasid_allocator_lock);
> >>> +			goto exit_free;
> >>> +		}
> >>> +		/*
> >>> +		 * Use XA to manage private data also sanitiy
> >>> check custom
> >>> +		 * allocator for duplicates.
> >>> +		 */
> >>> +		min = id;
> >>> +		max = id + 1;
> >>> +	} else
> >>> +		default_allocator_active = 1;    
> >> nit: true?  
> > yes, i can turn default_allocator_active into a bool type.
> >   
> >>> +	mutex_unlock(&ioasid_allocator_lock);
> >>> +
> >>>  	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max),
> >>> GFP_KERNEL)) { pr_err("Failed to alloc ioasid from %d to %d\n",
> >>> min, max); goto exit_free;> @@ -91,9 +208,17 @@ void
> >>> ioasid_free(ioasid_t ioasid) {
> >>>  	struct ioasid_data *ioasid_data;
> >>>  
> >>> +	mutex_lock(&ioasid_allocator_lock);
> >>> +	if (active_custom_allocator)
> >>> +		active_custom_allocator->free(ioasid,
> >>> active_custom_allocator->pdata);
> >>> +	mutex_unlock(&ioasid_allocator_lock);
> >>> +
> >>>  	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> >>>  
> >>>  	kfree_rcu(ioasid_data, rcu);
> >>> +
> >>> +	if (xa_empty(&ioasid_xa))
> >>> +		default_allocator_active = 0;    
> >> Isn't it racy? what if an xa_alloc occurs inbetween?
> >>
> >>  
> > yes, i will move it under the mutex. Thanks.  
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(ioasid_free);
> >>>  
> >>>     
> >>
> >> Thanks
> >>
> >> Eric  
> > 
> > [Jacob Pan]
> >   

[Jacob Pan]

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, back to index

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-03 22:32 [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan
2019-05-03 22:32 ` [PATCH v3 01/16] iommu: Introduce attach/detach_pasid_table API Jacob Pan
2019-05-03 22:32 ` [PATCH v3 02/16] iommu: Introduce cache_invalidate API Jacob Pan
2019-05-13  9:14   ` Auger Eric
2019-05-13 11:20     ` Jean-Philippe Brucker
2019-05-13 16:50       ` Auger Eric
2019-05-13 17:09         ` Jean-Philippe Brucker
2019-05-13 22:16           ` Jacob Pan
2019-05-14  7:36             ` Auger Eric
2019-05-14 10:41               ` Jean-Philippe Brucker
2019-05-14 17:44                 ` Jacob Pan
2019-05-14 17:57                   ` Jacob Pan
2019-05-15 11:03                   ` Jean-Philippe Brucker
2019-05-15 14:47                     ` Tian, Kevin
2019-05-15 15:25                       ` Jean-Philippe Brucker
2019-05-14  7:46           ` Auger Eric
2019-05-14 10:42             ` Jean-Philippe Brucker
2019-05-14 11:02               ` Auger Eric
2019-05-14 17:55                 ` Jacob Pan
2019-05-15 15:52                   ` Jean-Philippe Brucker
2019-05-15 16:25                     ` Jacob Pan
2019-05-03 22:32 ` [PATCH v3 03/16] iommu: Add I/O ASID allocator Jacob Pan
2019-05-21  8:21   ` Auger Eric
2019-05-21 17:03     ` Jacob Pan
2019-05-22 12:19       ` Jean-Philippe Brucker
2019-05-21  9:41   ` Auger Eric
2019-05-21 17:05     ` Jacob Pan
2019-05-03 22:32 ` [PATCH v3 04/16] ioasid: Add custom IOASID allocator Jacob Pan
2019-05-21  9:55   ` Auger Eric
2019-05-22 19:42     ` Jacob Pan
2019-05-23  7:14       ` Auger Eric
2019-05-23 15:40         ` Jacob Pan
2019-05-03 22:32 ` [PATCH v3 05/16] iommu/vt-d: Enlightened PASID allocation Jacob Pan
2019-05-03 22:32 ` [PATCH v3 06/16] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
2019-05-03 22:32 ` [PATCH v3 07/16] iommu/vtd: Optimize tlb invalidation for vIOMMU Jacob Pan
2019-05-03 22:32 ` [PATCH v3 08/16] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
2019-05-03 22:32 ` [PATCH v3 09/16] iommu: Introduce guest PASID bind function Jacob Pan
2019-05-16 14:14   ` Jean-Philippe Brucker
2019-05-16 16:14     ` Jacob Pan
2019-05-20 19:22       ` Jacob Pan
2019-05-21 16:09         ` Jean-Philippe Brucker
2019-05-21 22:50           ` Jacob Pan
2019-05-22 15:05             ` Jean-Philippe Brucker
2019-05-22 17:15               ` Jacob Pan
2019-05-03 22:32 ` [PATCH v3 10/16] iommu/vt-d: Move domain helper to header Jacob Pan
2019-05-03 22:32 ` [PATCH v3 11/16] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
2019-05-03 22:32 ` [PATCH v3 12/16] iommu/vt-d: Add nested translation helper function Jacob Pan
2019-05-03 22:32 ` [PATCH v3 13/16] iommu/vt-d: Clean up for SVM device list Jacob Pan
2019-05-03 22:32 ` [PATCH v3 14/16] iommu/vt-d: Add bind guest PASID support Jacob Pan
2019-05-03 22:32 ` [PATCH v3 15/16] iommu/vt-d: Support flushing more translation cache types Jacob Pan
2019-05-03 22:32 ` [PATCH v3 16/16] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
2019-05-15 16:31 ` [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support Jacob Pan

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lkml.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lkml.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lkml.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lkml.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lkml.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lkml.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lkml.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lkml.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lkml.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox