LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier()
@ 2021-11-15  2:05 Lu Baolu
  2021-11-15  2:05 ` [PATCH 01/11] iommu: Add device dma ownership set/release interfaces Lu Baolu
                   ` (10 more replies)
  0 siblings, 11 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

Hi folks,

The iommu group is the minimal isolation boundary for DMA. Devices in
a group can access each other's MMIO registers via peer to peer DMA
and also need share the same I/O address space.

Once the I/O address space is assigned to user control it is no longer
available to the dma_map* API, which effectively makes the DMA API
non-working.

Second, userspace can use DMA initiated by a device that it controls
to access the MMIO spaces of other devices in the group. This allows
userspace to indirectly attack any kernel owned device and it's driver.

Therefore groups must either be entirely under kernel control or
userspace control, never a mixture. Unfortunately some systems have
problems with the granularity of groups and there are a couple of
important exceptions:

 - pci_stub allows the admin to block driver binding on a device and
   make it permanently shared with userspace. Since PCI stub does not
   do DMA it is safe, however the admin must understand that using
   pci_stub allows userspace to attack whatever device it was bound
   it.

 - PCI bridges are sometimes included in groups. Typically PCI bridges
   do not use DMA, and generally do not have MMIO regions.

Generally any device that does not have any MMIO registers is a
possible candidate for an exception.

Currently vfio adopts a workaround to detect violations of the above
restrictions by monitoring the driver core BOUND event, and hardwiring
the above exceptions. Since there is no way for vfio to reject driver
binding at this point, BUG_ON() is triggered if a violation is
captured (kernel driver BOUND event on a group which already has some
devices assigned to userspace). Aside from the bad user experience
this opens a way for root userspace to crash the kernel, even in high
integrity configurations, by manipulating the module binding and
triggering the BUG_ON.

This series solves this problem by making the user/kernel ownership a
core concept at the IOMMU layer. The driver core enforces kernel
ownership while drivers are bound and violations now result in a error
codes during probe, not BUG_ON failures.

Patch partitions:
  [PATCH 1-2]: Detect DMA ownership conflicts during driver binding;
  [PATCH 3-6]: Add security context management for assigned devices;
  [PATCH 7-11]: Various cleanups.

Ideas contributed by:
  Jason Gunthorpe <jgg@nvidia.com>
  Kevin Tian <kevin.tian@intel.com>
  Ashok Raj <ashok.raj@intel.com>
  Lu Baolu <baolu.lu@linux.intel.com>

Review contributors:
  Jason Gunthorpe <jgg@nvidia.com>
  Kevin Tian <kevin.tian@intel.com>
  Ashok Raj <ashok.raj@intel.com>
  Liu Yi L <yi.l.liu@intel.com>
  Jacob jun Pan <jacob.jun.pan@intel.com>
  Chaitanya Kulkarni <kch@nvidia.com>

This also is part one of three initial series for IOMMUFD:
 * Move IOMMU Group security into the iommu layer
 - Generic IOMMUFD implementation
 - VFIO ability to consume IOMMUFD

This is based on v5.16-rc1 and available on github:
https://github.com/LuBaolu/intel-iommu/commits/iommu-dma-ownership-v1

Best regards,
baolu

Jason Gunthorpe (1):
  vfio: Delete the unbound_list

Lu Baolu (10):
  iommu: Add device dma ownership set/release interfaces
  driver core: Set DMA ownership during driver bind/unbind
  PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  iommu: Add security context management for assigned devices
  iommu: Expose group variants of dma ownership interfaces
  vfio: Use DMA_OWNER_USER to declaim passthrough devices
  vfio: Remove use of vfio_group_viable()
  vfio: Remove iommu group notifier
  iommu: Remove iommu group changes notifier

 include/linux/device/driver.h         |   7 +-
 include/linux/iommu.h                 |  75 ++++---
 drivers/base/dd.c                     |  12 ++
 drivers/iommu/iommu.c                 | 274 ++++++++++++++++++--------
 drivers/pci/pci-stub.c                |   3 +
 drivers/pci/pcie/portdrv_pci.c        |   2 +
 drivers/vfio/fsl-mc/vfio_fsl_mc.c     |   1 +
 drivers/vfio/pci/vfio_pci.c           |   3 +
 drivers/vfio/platform/vfio_amba.c     |   1 +
 drivers/vfio/platform/vfio_platform.c |   1 +
 drivers/vfio/vfio.c                   | 247 ++---------------------
 11 files changed, 294 insertions(+), 332 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15 13:14   ` Christoph Hellwig
  2021-11-15 20:38   ` Bjorn Helgaas
  2021-11-15  2:05 ` [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind Lu Baolu
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

From the perspective of who is initiating the device to do DMA, device
DMA could be divided into the following types:

        DMA_OWNER_KERNEL: kernel device driver intiates the DMA
        DMA_OWNER_USER: userspace device driver intiates the DMA

DMA_OWNER_KERNEL and DMA_OWNER_USER are exclusive for all devices in
same iommu group as an iommu group is the smallest granularity of device
isolation and protection that the IOMMU subsystem can guarantee. This
extends the iommu core to enforce this exclusion when devices are
assigned to userspace.

Basically two new interfaces are provided:

        int iommu_device_set_dma_owner(struct device *dev,
                enum iommu_dma_owner mode, struct file *user_file);
        void iommu_device_release_dma_owner(struct device *dev,
                enum iommu_dma_owner mode);

Although above interfaces are per-device, DMA owner is tracked per group
under the hood. An iommu group cannot have both DMA_OWNER_KERNEL
and DMA_OWNER_USER set at the same time. Violation of this assumption
fails iommu_device_set_dma_owner().

Kernel driver which does DMA have DMA_OWNER_KENREL automatically
set/released in the driver binding process (see next patch).

Kernel driver which doesn't do DMA should not set the owner type (via a
new suppress flag in next patch). Device bound to such driver is considered
same as a driver-less device which is compatible to all owner types.

Userspace driver framework (e.g. vfio) should set DMA_OWNER_USER for
a device before the userspace is allowed to access it, plus a fd pointer to
mark the user identity so a single group cannot be operated by multiple
users simultaneously. Vice versa, the owner type should be released after
the user access permission is withdrawn.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h |  31 ++++++++++++
 drivers/iommu/iommu.c | 106 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..f77eb9e7788a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -162,6 +162,18 @@ enum iommu_dev_features {
 	IOMMU_DEV_FEAT_IOPF,
 };
 
+/**
+ * enum iommu_dma_owner - IOMMU DMA ownership
+ * @DMA_OWNER_NONE: No DMA ownership
+ * @DMA_OWNER_KERNEL: Device DMAs are initiated by a kernel driver
+ * @DMA_OWNER_USER: Device DMAs are initiated by a userspace driver
+ */
+enum iommu_dma_owner {
+	DMA_OWNER_NONE,
+	DMA_OWNER_KERNEL,
+	DMA_OWNER_USER,
+};
+
 #define IOMMU_PASID_INVALID	(-1U)
 
 #ifdef CONFIG_IOMMU_API
@@ -681,6 +693,10 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
+int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
+			       struct file *user_file);
+void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1081,6 +1097,21 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
 }
+
+static inline int iommu_device_set_dma_owner(struct device *dev,
+					     enum iommu_dma_owner owner,
+					     struct file *user_file)
+{
+	if (owner != DMA_OWNER_KERNEL)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline void iommu_device_release_dma_owner(struct device *dev,
+						  enum iommu_dma_owner owner)
+{
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8b86406b7162..39493b1b3edf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -26,6 +26,7 @@
 #include <linux/fsl/mc.h>
 #include <linux/module.h>
 #include <linux/cc_platform.h>
+#include <linux/file.h>
 #include <trace/events/iommu.h>
 
 static struct kset *iommu_group_kset;
@@ -48,6 +49,9 @@ struct iommu_group {
 	struct iommu_domain *default_domain;
 	struct iommu_domain *domain;
 	struct list_head entry;
+	enum iommu_dma_owner dma_owner;
+	refcount_t owner_cnt;
+	struct file *owner_user_file;
 };
 
 struct group_device {
@@ -621,6 +625,7 @@ struct iommu_group *iommu_group_alloc(void)
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
 	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
+	group->dma_owner = DMA_OWNER_NONE;
 
 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
 	if (ret < 0) {
@@ -3351,3 +3356,104 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 	return ret;
 }
+
+static int __iommu_group_set_dma_owner(struct iommu_group *group,
+				       enum iommu_dma_owner owner,
+				       struct file *user_file)
+{
+	if (group->dma_owner != DMA_OWNER_NONE && group->dma_owner != owner)
+		return -EBUSY;
+
+	if (owner == DMA_OWNER_USER) {
+		if (!user_file)
+			return -EINVAL;
+
+		if (group->owner_user_file && group->owner_user_file != user_file)
+			return -EPERM;
+	}
+
+	if (!refcount_inc_not_zero(&group->owner_cnt)) {
+		group->dma_owner = owner;
+		refcount_set(&group->owner_cnt, 1);
+
+		if (owner == DMA_OWNER_USER) {
+			get_file(user_file);
+			group->owner_user_file = user_file;
+		}
+	}
+
+	return 0;
+}
+
+static void __iommu_group_release_dma_owner(struct iommu_group *group,
+					    enum iommu_dma_owner owner)
+{
+	if (WARN_ON(group->dma_owner != owner))
+		return;
+
+	if (refcount_dec_and_test(&group->owner_cnt)) {
+		group->dma_owner = DMA_OWNER_NONE;
+
+		if (owner == DMA_OWNER_USER) {
+			fput(group->owner_user_file);
+			group->owner_user_file = NULL;
+		}
+	}
+}
+
+/**
+ * iommu_device_set_dma_owner() - Set DMA ownership of a device
+ * @dev: The device.
+ * @owner: DMA_OWNER_KERNEL or DMA_OWNER_USER.
+ * @user_file: The device fd when DMA_OWNER_USER is about to set.
+ *
+ * Set the DMA ownership of a device. The KERNEL and USER ownership are
+ * exclusive. For DMA_OWNER_USER, the caller should also specify the fd
+ * through which the I/O address spaces are managed for the target device.
+ * This interface guarantees that the USER DMA ownership is only assigned
+ * to the same fd.
+ */
+int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
+			       struct file *user_file)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+	int ret;
+
+	if (!group) {
+		if (owner == DMA_OWNER_KERNEL)
+			return 0;
+		else
+			return -ENODEV;
+	}
+
+	mutex_lock(&group->mutex);
+	ret = __iommu_group_set_dma_owner(group, owner, user_file);
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_device_set_dma_owner);
+
+/**
+ * iommu_device_release_dma_owner() - Release DMA ownership of a device
+ * @dev: The device.
+ * @owner: DMA_OWNER_KERNEL or DMA_OWNER_USER.
+ *
+ * Release the DMA ownership claimed by iommu_device_set_dma_owner().
+ */
+void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+
+	if (!group) {
+		WARN_ON(owner != DMA_OWNER_KERNEL);
+		return;
+	}
+
+	mutex_lock(&group->mutex);
+	__iommu_group_release_dma_owner(group, owner);
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+}
+EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
  2021-11-15  2:05 ` [PATCH 01/11] iommu: Add device dma ownership set/release interfaces Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15  6:59   ` Greg Kroah-Hartman
  2021-11-15 13:19   ` Christoph Hellwig
  2021-11-15  2:05 ` [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

This extends really_probe() to allow checking for dma ownership conflict
during the driver binding process. By default, the DMA_OWNER_KERNEL is
claimed for the bound driver before calling its .probe() callback. If this
operation fails (e.g. the iommu group of the target device already has the
DMA_OWNER_USER set), the binding process is aborted to avoid breaking the
security contract for devices in the iommu group.

Without this change, the vfio driver has to listen to a bus BOUND_DRIVER
event and then BUG_ON() in case of dma ownership conflict. This leads to
bad user experience since careless driver binding operation may crash the
system if the admin overlooks the group restriction.

Aside from bad design, this leads to a security problem as a root user,
even with lockdown=integrity, can force the kernel to BUG.

Driver may set a new flag (suppress_auto_claim_dma_owner) to disable auto
claim in the binding process. Examples include kernel drivers (pci_stub,
PCI bridge drivers, etc.) which don't trigger DMA at all thus can be safely
exempted in DMA ownership check and userspace framework drivers (vfio/vdpa
etc.) which need to manually claim DMA_OWNER_USER when assigning a device
to userspace.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/linux-iommu/20210922123931.GI327412@nvidia.com/
Link: https://lore.kernel.org/linux-iommu/20210928115751.GK964074@nvidia.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/device/driver.h |  7 ++++++-
 drivers/base/dd.c             | 12 ++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf4993..25d39c64c4d9 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -54,6 +54,10 @@ enum probe_type {
  * @owner:	The module owner.
  * @mod_name:	Used for built-in modules.
  * @suppress_bind_attrs: Disables bind/unbind via sysfs.
+ * @suppress_auto_claim_dma_owner: Disable auto claiming of kernel DMA owner.
+ *		Drivers which don't require DMA or want to manually claim the
+ *		owner type (e.g. userspace driver frameworks) could set this
+ *		flag.
  * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
@@ -99,7 +103,8 @@ struct device_driver {
 	struct module		*owner;
 	const char		*mod_name;	/* used for built-in modules */
 
-	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
+	bool suppress_bind_attrs:1;	/* disables bind/unbind via sysfs */
+	bool suppress_auto_claim_dma_owner:1;
 	enum probe_type probe_type;
 
 	const struct of_device_id	*of_match_table;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 68ea1f949daa..ab3333351f19 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -28,6 +28,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/pinctrl/devinfo.h>
 #include <linux/slab.h>
+#include <linux/iommu.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -566,6 +567,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 		goto done;
 	}
 
+	if (!drv->suppress_auto_claim_dma_owner) {
+		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL, NULL);
+		if (ret)
+			return ret;
+	}
+
 re_probe:
 	dev->driver = drv;
 
@@ -673,6 +680,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 		dev->pm_domain->dismiss(dev);
 	pm_runtime_reinit(dev);
 	dev_pm_set_driver_flags(dev, 0);
+	if (!drv->suppress_auto_claim_dma_owner)
+		iommu_device_release_dma_owner(dev, DMA_OWNER_KERNEL);
 done:
 	return ret;
 }
@@ -1215,6 +1224,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 		pm_runtime_reinit(dev);
 		dev_pm_set_driver_flags(dev, 0);
 
+		if (!drv->suppress_auto_claim_dma_owner)
+			iommu_device_release_dma_owner(dev, DMA_OWNER_KERNEL);
+
 		klist_remove(&dev->p->knode_driver);
 		device_pm_check_callbacks(dev);
 		if (dev->bus)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
  2021-11-15  2:05 ` [PATCH 01/11] iommu: Add device dma ownership set/release interfaces Lu Baolu
  2021-11-15  2:05 ` [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15 13:21   ` Christoph Hellwig
                     ` (2 more replies)
  2021-11-15  2:05 ` [PATCH 04/11] PCI: portdrv: " Lu Baolu
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

pci_stub allows the admin to block driver binding on a device and make
it permanently shared with userspace. Since pci_stub does not do DMA,
it is safe. However the admin must understand that using pci_stub allows
userspace to attack whatever device it was bound to.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pci-stub.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
index e408099fea52..6324c68602b4 100644
--- a/drivers/pci/pci-stub.c
+++ b/drivers/pci/pci-stub.c
@@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
 	.name		= "pci-stub",
 	.id_table	= NULL,	/* only dynamic id's */
 	.probe		= pci_stub_probe,
+	.driver		= {
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static int __init pci_stub_init(void)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (2 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15 20:44   ` Bjorn Helgaas
  2021-11-15  2:05 ` [PATCH 05/11] iommu: Add security context management for assigned devices Lu Baolu
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
then all of the downstream devices will be part of the same IOMMU group
as the bridge. As long as the bridge kernel driver doesn't map and
access any PCI mmio bar, it's safe to bind it to the device in a USER-
owned group. Hence, safe to suppress the kernel DMA ownership auto-
claiming.

The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") permitted a
class of kernel drivers. This is not always safe. For example, the SHPC
system design requires that it must be integrated into a PCI-to-PCI
bridge or a host bridge. The shpchp_core driver relies on the PCI mmio
bar access for the controller functionality. Binding it to the device
belonging to a USER-owned group will allow the user to change the
controller via p2p transactions which is unknown to the hot-plug driver
and could lead to some unpredictable consequences.

Now that we have driver self-declaration of safety we should rely on that.
This change may cause regression on some platforms, since all bridges were
exempted before, but now they have to be manually audited before doing so.
This is actually the desired outcome anyway.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pcie/portdrv_pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 35eca6277a96..1285862a9aa8 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -203,6 +203,8 @@ static struct pci_driver pcie_portdriver = {
 	.err_handler	= &pcie_portdrv_err_handler,
 
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
+
+	.driver.suppress_auto_claim_dma_owner = true,
 };
 
 static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 05/11] iommu: Add security context management for assigned devices
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (3 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 04/11] PCI: portdrv: " Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15 13:22   ` Christoph Hellwig
  2021-11-15  2:05 ` [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces Lu Baolu
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

When an iommu group has DMA_OWNER_USER set for the first time, it is a
contract that the group could be assigned to userspace from now on. It
must be detached from the default iommu domain and all devices in this
group are blocked from doing DMA until it is attached to a user I/O
address space. Vice versa, the default domain should be re-attached to
the group after the last DMA_OWNER_USER is released.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 39493b1b3edf..916a4d448150 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -293,7 +293,11 @@ int iommu_probe_device(struct device *dev)
 	mutex_lock(&group->mutex);
 	iommu_alloc_default_domain(group, dev);
 
-	if (group->default_domain) {
+	/*
+	 * If any device in the group has been initialized for user dma,
+	 * avoid attaching the default domain.
+	 */
+	if (group->default_domain && group->dma_owner != DMA_OWNER_USER) {
 		ret = __iommu_attach_device(group->default_domain, dev);
 		if (ret) {
 			mutex_unlock(&group->mutex);
@@ -2325,7 +2329,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
 {
 	int ret;
 
-	if (group->default_domain && group->domain != group->default_domain)
+	if (group->domain && group->domain != group->default_domain)
 		return -EBUSY;
 
 	ret = __iommu_group_for_each_dev(group, domain,
@@ -2362,7 +2366,11 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 {
 	int ret;
 
-	if (!group->default_domain) {
+	/*
+	 * If any device in the group has been initialized for user dma,
+	 * avoid re-attaching the default domain.
+	 */
+	if (!group->default_domain || group->dma_owner == DMA_OWNER_USER) {
 		__iommu_group_for_each_dev(group, domain,
 					   iommu_group_do_detach_device);
 		group->domain = NULL;
@@ -3377,6 +3385,21 @@ static int __iommu_group_set_dma_owner(struct iommu_group *group,
 		refcount_set(&group->owner_cnt, 1);
 
 		if (owner == DMA_OWNER_USER) {
+			/*
+			 * The UNMANAGED domain shouldn't be attached before
+			 * claiming the USER ownership for the first time.
+			 */
+			if (group->domain) {
+				if (group->domain != group->default_domain) {
+					group->dma_owner = DMA_OWNER_NONE;
+					refcount_set(&group->owner_cnt, 0);
+
+					return -EBUSY;
+				}
+
+				__iommu_detach_group(group->domain, group);
+			}
+
 			get_file(user_file);
 			group->owner_user_file = user_file;
 		}
@@ -3397,6 +3420,13 @@ static void __iommu_group_release_dma_owner(struct iommu_group *group,
 		if (owner == DMA_OWNER_USER) {
 			fput(group->owner_user_file);
 			group->owner_user_file = NULL;
+
+			/*
+			 * The UNMANAGED domain should be detached before all USER
+			 * owners have been released.
+			 */
+			if (!WARN_ON(group->domain) && group->default_domain)
+				__iommu_attach_group(group->default_domain, group);
 		}
 	}
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (4 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 05/11] iommu: Add security context management for assigned devices Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15 13:27   ` Christoph Hellwig
  2021-11-15  2:05 ` [PATCH 07/11] vfio: Use DMA_OWNER_USER to declaim passthrough devices Lu Baolu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

The vfio needs to set DMA_OWNER_USER for the entire group when attaching
it to a vfio container. So expose group variants of setting/releasing dma
ownership for this purpose.

This also exposes the helper iommu_group_dma_owner_unclaimed() for vfio
report to userspace if the group is viable to user assignment, for
compatibility with VFIO_GROUP_FLAGS_VIABLE.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h | 21 ++++++++++++++++
 drivers/iommu/iommu.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f77eb9e7788a..3d2dfd220d3c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -696,6 +696,10 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
 			       struct file *user_file);
 void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
+int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
+			      struct file *user_file);
+void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
+bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -1112,6 +1116,23 @@ static inline void iommu_device_release_dma_owner(struct device *dev,
 						  enum iommu_dma_owner owner)
 {
 }
+
+static inline int iommu_group_set_dma_owner(struct iommu_group *group,
+					    enum iommu_dma_owner owner,
+					    struct file *user_file)
+{
+	return -EINVAL;
+}
+
+static inline void iommu_group_release_dma_owner(struct iommu_group *group,
+						 enum iommu_dma_owner owner)
+{
+}
+
+static inline bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
+{
+	return false;
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 916a4d448150..3dcd3fc4290a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3431,6 +3431,63 @@ static void __iommu_group_release_dma_owner(struct iommu_group *group,
 	}
 }
 
+/**
+ * iommu_group_set_dma_owner() - Set DMA ownership of a group
+ * @group: The group.
+ * @owner: DMA_OWNER_KERNEL or DMA_OWNER_USER.
+ * @user_file: The device fd when set USER ownership.
+ *
+ * This is to support backward compatibility for legacy vfio which manages
+ * dma ownership in group level. New invocations on this interface should be
+ * prohibited. Instead, please turn to iommu_device_set_dma_owner().
+ */
+int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
+			      struct file *user_file)
+{
+	int ret;
+
+	mutex_lock(&group->mutex);
+	ret = __iommu_group_set_dma_owner(group, owner, user_file);
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_group_set_dma_owner);
+
+/**
+ * iommu_group_release_dma_owner() - Release DMA ownership of a group
+ * @group: The group.
+ * @owner: DMA_OWNER_KERNEL or DMA_OWNER_USER.
+ *
+ * Release the DMA ownership claimed by iommu_group_set_dma_owner().
+ */
+void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner)
+{
+	mutex_lock(&group->mutex);
+	__iommu_group_release_dma_owner(group, owner);
+	mutex_unlock(&group->mutex);
+}
+EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
+
+/**
+ * iommu_group_dma_owner_unclaimed() - Is group dma ownership claimed
+ * @group: The group.
+ *
+ * This provides status check on a given group. It is racey and only for
+ * non-binding status reporting.
+ */
+bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
+{
+	enum iommu_dma_owner owner;
+
+	mutex_lock(&group->mutex);
+	owner = group->dma_owner;
+	mutex_unlock(&group->mutex);
+
+	return owner == DMA_OWNER_NONE;
+}
+EXPORT_SYMBOL_GPL(iommu_group_dma_owner_unclaimed);
+
 /**
  * iommu_device_set_dma_owner() - Set DMA ownership of a device
  * @dev: The device.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 07/11] vfio: Use DMA_OWNER_USER to declaim passthrough devices
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (5 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15  2:05 ` [PATCH 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

Set DMA_OWNER_USER when an iommu group is set to a container, and
release DMA_OWNER_USER once the iommu group is unset from a container.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c     |  1 +
 drivers/vfio/pci/vfio_pci.c           |  3 +++
 drivers/vfio/platform/vfio_amba.c     |  1 +
 drivers/vfio/platform/vfio_platform.c |  1 +
 drivers/vfio/vfio.c                   | 12 +++++++++++-
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index 6e2e62c6f47a..b749d092a185 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -587,6 +587,7 @@ static struct fsl_mc_driver vfio_fsl_mc_driver = {
 	.driver	= {
 		.name	= "vfio-fsl-mc",
 		.owner	= THIS_MODULE,
+		.suppress_auto_claim_dma_owner = true,
 	},
 };
 
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index a5ce92beb655..8961bfaf0eb5 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -193,6 +193,9 @@ static struct pci_driver vfio_pci_driver = {
 	.remove			= vfio_pci_remove,
 	.sriov_configure	= vfio_pci_sriov_configure,
 	.err_handler		= &vfio_pci_core_err_handlers,
+	.driver = {
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static void __init vfio_pci_fill_ids(void)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index badfffea14fb..2146ee52901a 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -94,6 +94,7 @@ static struct amba_driver vfio_amba_driver = {
 	.drv = {
 		.name = "vfio-amba",
 		.owner = THIS_MODULE,
+		.suppress_auto_claim_dma_owner = true,
 	},
 };
 
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 68a1c87066d7..5ef06e668192 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -75,6 +75,7 @@ static struct platform_driver vfio_platform_driver = {
 	.remove		= vfio_platform_remove,
 	.driver	= {
 		.name	= "vfio-platform",
+		.suppress_auto_claim_dma_owner = true,
 	},
 };
 
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 82fb75464f92..4e21b37e0ea8 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1198,6 +1198,8 @@ static void __vfio_group_unset_container(struct vfio_group *group)
 		driver->ops->detach_group(container->iommu_data,
 					  group->iommu_group);
 
+	iommu_group_release_dma_owner(group->iommu_group, DMA_OWNER_USER);
+
 	group->container = NULL;
 	wake_up(&group->container_q);
 	list_del(&group->container_next);
@@ -1282,13 +1284,21 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 		goto unlock_out;
 	}
 
+	ret = iommu_group_set_dma_owner(group->iommu_group,
+					DMA_OWNER_USER, f.file);
+	if (ret)
+		goto unlock_out;
+
 	driver = container->iommu_driver;
 	if (driver) {
 		ret = driver->ops->attach_group(container->iommu_data,
 						group->iommu_group,
 						group->type);
-		if (ret)
+		if (ret) {
+			iommu_group_release_dma_owner(group->iommu_group,
+						      DMA_OWNER_USER);
 			goto unlock_out;
+		}
 	}
 
 	group->container = container;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 08/11] vfio: Remove use of vfio_group_viable()
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (6 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 07/11] vfio: Use DMA_OWNER_USER to declaim passthrough devices Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15  2:05 ` [PATCH 09/11] vfio: Delete the unbound_list Lu Baolu
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

As DMA_OWNER_USER is claimed for the iommu group when a vfio group is
added to a vfio container, the vfio group viability is guaranteed as
long as group->container_users > 0. Remove those unnecessary group
viability checks which are only hit when group->container_users is not
zero.

The only remaining reference is in GROUP_GET_STATUS, which could be
called at any time when group fd is valid. Here we just replace the
vfio_group_viable() by directly calling iommu core to get viability
status.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 4e21b37e0ea8..6847117fac6a 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1315,12 +1315,6 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 	return ret;
 }
 
-static bool vfio_group_viable(struct vfio_group *group)
-{
-	return (iommu_group_for_each_dev(group->iommu_group,
-					 group, vfio_dev_viable) == 0);
-}
-
 static int vfio_group_add_container_user(struct vfio_group *group)
 {
 	if (!atomic_inc_not_zero(&group->container_users))
@@ -1330,7 +1324,7 @@ static int vfio_group_add_container_user(struct vfio_group *group)
 		atomic_dec(&group->container_users);
 		return -EPERM;
 	}
-	if (!group->container->iommu_driver || !vfio_group_viable(group)) {
+	if (!group->container->iommu_driver) {
 		atomic_dec(&group->container_users);
 		return -EINVAL;
 	}
@@ -1348,7 +1342,7 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	int ret = 0;
 
 	if (0 == atomic_read(&group->container_users) ||
-	    !group->container->iommu_driver || !vfio_group_viable(group))
+	    !group->container->iommu_driver)
 		return -EINVAL;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
@@ -1440,11 +1434,11 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 
 		status.flags = 0;
 
-		if (vfio_group_viable(group))
-			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
-
 		if (group->container)
-			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET;
+			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
+					VFIO_GROUP_FLAGS_VIABLE;
+		else if (iommu_group_dma_owner_unclaimed(group->iommu_group))
+			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
 
 		if (copy_to_user((void __user *)arg, &status, minsz))
 			return -EFAULT;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 09/11] vfio: Delete the unbound_list
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (7 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15  2:05 ` [PATCH 10/11] vfio: Remove iommu group notifier Lu Baolu
  2021-11-15  2:05 ` [PATCH 11/11] iommu: Remove iommu group changes notifier Lu Baolu
  10 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

From: Jason Gunthorpe <jgg@nvidia.com>

commit 60720a0fc646 ("vfio: Add device tracking during unbind") added the
unbound list to plug a problem with KVM where KVM_DEV_VFIO_GROUP_DEL
relied on vfio_group_get_external_user() succeeding to return the
vfio_group from a group file descriptor. The unbound list allowed
vfio_group_get_external_user() to continue to succeed in edge cases.

However commit 5d6dee80a1e9 ("vfio: New external user group/file match")
deleted the call to vfio_group_get_external_user() during
KVM_DEV_VFIO_GROUP_DEL. Instead vfio_external_group_match_file() is used
to directly match the file descriptor to the group pointer.

This in turn avoids the call down to vfio_dev_viable() during
KVM_DEV_VFIO_GROUP_DEL and also avoids the trouble the first commit was
trying to fix.

There are no other users of vfio_dev_viable() that care about the time
after vfio_unregister_group_dev() returns, so simply delete the
unbound_list entirely.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio.c | 74 ++-------------------------------------------
 1 file changed, 2 insertions(+), 72 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6847117fac6a..8c317d1a0f3c 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -62,11 +62,6 @@ struct vfio_container {
 	bool				noiommu;
 };
 
-struct vfio_unbound_dev {
-	struct device			*dev;
-	struct list_head		unbound_next;
-};
-
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -79,8 +74,6 @@ struct vfio_group {
 	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
-	struct list_head		unbound_list;
-	struct mutex			unbound_lock;
 	atomic_t			opened;
 	wait_queue_head_t		container_q;
 	enum vfio_group_type		type;
@@ -340,16 +333,8 @@ vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 static void vfio_group_release(struct device *dev)
 {
 	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
-	struct vfio_unbound_dev *unbound, *tmp;
-
-	list_for_each_entry_safe(unbound, tmp,
-				 &group->unbound_list, unbound_next) {
-		list_del(&unbound->unbound_next);
-		kfree(unbound);
-	}
 
 	mutex_destroy(&group->device_lock);
-	mutex_destroy(&group->unbound_lock);
 	iommu_group_put(group->iommu_group);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
@@ -381,8 +366,6 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
 	refcount_set(&group->users, 1);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
-	INIT_LIST_HEAD(&group->unbound_list);
-	mutex_init(&group->unbound_lock);
 	init_waitqueue_head(&group->container_q);
 	group->iommu_group = iommu_group;
 	/* put in vfio_group_release() */
@@ -571,19 +554,8 @@ static int vfio_dev_viable(struct device *dev, void *data)
 	struct vfio_group *group = data;
 	struct vfio_device *device;
 	struct device_driver *drv = READ_ONCE(dev->driver);
-	struct vfio_unbound_dev *unbound;
-	int ret = -EINVAL;
 
-	mutex_lock(&group->unbound_lock);
-	list_for_each_entry(unbound, &group->unbound_list, unbound_next) {
-		if (dev == unbound->dev) {
-			ret = 0;
-			break;
-		}
-	}
-	mutex_unlock(&group->unbound_lock);
-
-	if (!ret || !drv || vfio_dev_driver_allowed(dev, drv))
+	if (!drv || vfio_dev_driver_allowed(dev, drv))
 		return 0;
 
 	device = vfio_group_get_device(group, dev);
@@ -592,7 +564,7 @@ static int vfio_dev_viable(struct device *dev, void *data)
 		return 0;
 	}
 
-	return ret;
+	return -EINVAL;
 }
 
 /**
@@ -634,7 +606,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 {
 	struct vfio_group *group = container_of(nb, struct vfio_group, nb);
 	struct device *dev = data;
-	struct vfio_unbound_dev *unbound;
 
 	switch (action) {
 	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
@@ -663,28 +634,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 			__func__, iommu_group_id(group->iommu_group),
 			dev->driver->name);
 		break;
-	case IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER:
-		dev_dbg(dev, "%s: group %d unbound from driver\n", __func__,
-			iommu_group_id(group->iommu_group));
-		/*
-		 * XXX An unbound device in a live group is ok, but we'd
-		 * really like to avoid the above BUG_ON by preventing other
-		 * drivers from binding to it.  Once that occurs, we have to
-		 * stop the system to maintain isolation.  At a minimum, we'd
-		 * want a toggle to disable driver auto probe for this device.
-		 */
-
-		mutex_lock(&group->unbound_lock);
-		list_for_each_entry(unbound,
-				    &group->unbound_list, unbound_next) {
-			if (dev == unbound->dev) {
-				list_del(&unbound->unbound_next);
-				kfree(unbound);
-				break;
-			}
-		}
-		mutex_unlock(&group->unbound_lock);
-		break;
 	}
 	return NOTIFY_OK;
 }
@@ -889,29 +838,10 @@ static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 void vfio_unregister_group_dev(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
-	struct vfio_unbound_dev *unbound;
 	unsigned int i = 0;
 	bool interrupted = false;
 	long rc;
 
-	/*
-	 * When the device is removed from the group, the group suddenly
-	 * becomes non-viable; the device has a driver (until the unbind
-	 * completes), but it's not present in the group.  This is bad news
-	 * for any external users that need to re-acquire a group reference
-	 * in order to match and release their existing reference.  To
-	 * solve this, we track such devices on the unbound_list to bridge
-	 * the gap until they're fully unbound.
-	 */
-	unbound = kzalloc(sizeof(*unbound), GFP_KERNEL);
-	if (unbound) {
-		unbound->dev = device->dev;
-		mutex_lock(&group->unbound_lock);
-		list_add(&unbound->unbound_next, &group->unbound_list);
-		mutex_unlock(&group->unbound_lock);
-	}
-	WARN_ON(!unbound);
-
 	vfio_device_put(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 10/11] vfio: Remove iommu group notifier
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (8 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 09/11] vfio: Delete the unbound_list Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  2021-11-15  2:05 ` [PATCH 11/11] iommu: Remove iommu group changes notifier Lu Baolu
  10 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu

The iommu core and driver core have been enhanced to avoid unsafe driver
binding to a live group after iommu_group_set_dma_owner(DMA_MODE_USER)
has been called. There's no need to register iommu group notifier. This
removes the iommu group notifer which contains BUG_ON() and WARN().

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio.c | 147 --------------------------------------------
 1 file changed, 147 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 8c317d1a0f3c..c74e10044af4 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -71,7 +71,6 @@ struct vfio_group {
 	struct vfio_container		*container;
 	struct list_head		device_list;
 	struct mutex			device_lock;
-	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
 	atomic_t			opened;
@@ -274,8 +273,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
 
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data);
 static void vfio_group_get(struct vfio_group *group);
 
 /**
@@ -395,13 +392,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 		goto err_put;
 	}
 
-	group->nb.notifier_call = vfio_iommu_group_notifier;
-	err = iommu_group_register_notifier(iommu_group, &group->nb);
-	if (err) {
-		ret = ERR_PTR(err);
-		goto err_put;
-	}
-
 	mutex_lock(&vfio.group_lock);
 
 	/* Did we race creating this group? */
@@ -422,7 +412,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 
 err_unlock:
 	mutex_unlock(&vfio.group_lock);
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 err_put:
 	put_device(&group->dev);
 	return ret;
@@ -447,7 +436,6 @@ static void vfio_group_put(struct vfio_group *group)
 	cdev_device_del(&group->cdev, &group->dev);
 	mutex_unlock(&vfio.group_lock);
 
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 	put_device(&group->dev);
 }
 
@@ -503,141 +491,6 @@ static struct vfio_device *vfio_group_get_device(struct vfio_group *group,
 	return NULL;
 }
 
-/*
- * Some drivers, like pci-stub, are only used to prevent other drivers from
- * claiming a device and are therefore perfectly legitimate for a user owned
- * group.  The pci-stub driver has no dependencies on DMA or the IOVA mapping
- * of the device, but it does prevent the user from having direct access to
- * the device, which is useful in some circumstances.
- *
- * We also assume that we can include PCI interconnect devices, ie. bridges.
- * IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
- * then all of the downstream devices will be part of the same IOMMU group as
- * the bridge.  Thus, if placing the bridge into the user owned IOVA space
- * breaks anything, it only does so for user owned devices downstream.  Note
- * that error notification via MSI can be affected for platforms that handle
- * MSI within the same IOVA space as DMA.
- */
-static const char * const vfio_driver_allowed[] = { "pci-stub" };
-
-static bool vfio_dev_driver_allowed(struct device *dev,
-				    struct device_driver *drv)
-{
-	if (dev_is_pci(dev)) {
-		struct pci_dev *pdev = to_pci_dev(dev);
-
-		if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-			return true;
-	}
-
-	return match_string(vfio_driver_allowed,
-			    ARRAY_SIZE(vfio_driver_allowed),
-			    drv->name) >= 0;
-}
-
-/*
- * A vfio group is viable for use by userspace if all devices are in
- * one of the following states:
- *  - driver-less
- *  - bound to a vfio driver
- *  - bound to an otherwise allowed driver
- *  - a PCI interconnect device
- *
- * We use two methods to determine whether a device is bound to a vfio
- * driver.  The first is to test whether the device exists in the vfio
- * group.  The second is to test if the device exists on the group
- * unbound_list, indicating it's in the middle of transitioning from
- * a vfio driver to driver-less.
- */
-static int vfio_dev_viable(struct device *dev, void *data)
-{
-	struct vfio_group *group = data;
-	struct vfio_device *device;
-	struct device_driver *drv = READ_ONCE(dev->driver);
-
-	if (!drv || vfio_dev_driver_allowed(dev, drv))
-		return 0;
-
-	device = vfio_group_get_device(group, dev);
-	if (device) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	return -EINVAL;
-}
-
-/**
- * Async device support
- */
-static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
-{
-	struct vfio_device *device;
-
-	/* Do we already know about it?  We shouldn't */
-	device = vfio_group_get_device(group, dev);
-	if (WARN_ON_ONCE(device)) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	/* Nothing to do for idle groups */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	/* TODO Prevent device auto probing */
-	dev_WARN(dev, "Device added to live group %d!\n",
-		 iommu_group_id(group->iommu_group));
-
-	return 0;
-}
-
-static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
-{
-	/* We don't care what happens when the group isn't in use */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	return vfio_dev_viable(dev, group);
-}
-
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data)
-{
-	struct vfio_group *group = container_of(nb, struct vfio_group, nb);
-	struct device *dev = data;
-
-	switch (action) {
-	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
-		vfio_group_nb_add_dev(group, dev);
-		break;
-	case IOMMU_GROUP_NOTIFY_DEL_DEVICE:
-		/*
-		 * Nothing to do here.  If the device is in use, then the
-		 * vfio sub-driver should block the remove callback until
-		 * it is unused.  If the device is unused or attached to a
-		 * stub driver, then it should be released and we don't
-		 * care that it will be going away.
-		 */
-		break;
-	case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
-		dev_dbg(dev, "%s: group %d binding to driver\n", __func__,
-			iommu_group_id(group->iommu_group));
-		break;
-	case IOMMU_GROUP_NOTIFY_BOUND_DRIVER:
-		dev_dbg(dev, "%s: group %d bound to driver %s\n", __func__,
-			iommu_group_id(group->iommu_group), dev->driver->name);
-		BUG_ON(vfio_group_nb_verify(group, dev));
-		break;
-	case IOMMU_GROUP_NOTIFY_UNBIND_DRIVER:
-		dev_dbg(dev, "%s: group %d unbinding from driver %s\n",
-			__func__, iommu_group_id(group->iommu_group),
-			dev->driver->name);
-		break;
-	}
-	return NOTIFY_OK;
-}
-
 /**
  * VFIO driver API
  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 11/11] iommu: Remove iommu group changes notifier
  2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
                   ` (9 preceding siblings ...)
  2021-11-15  2:05 ` [PATCH 10/11] vfio: Remove iommu group notifier Lu Baolu
@ 2021-11-15  2:05 ` Lu Baolu
  10 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-15  2:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj
  Cc: Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel, Lu Baolu, Christoph Hellwig

The iommu group changes notifer is not referenced in the tree. Remove it
to avoid dead code.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h | 23 -------------
 drivers/iommu/iommu.c | 75 -------------------------------------------
 2 files changed, 98 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3d2dfd220d3c..d8946f22edd5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -414,13 +414,6 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 	};
 }
 
-#define IOMMU_GROUP_NOTIFY_ADD_DEVICE		1 /* Device added */
-#define IOMMU_GROUP_NOTIFY_DEL_DEVICE		2 /* Pre Device removed */
-#define IOMMU_GROUP_NOTIFY_BIND_DRIVER		3 /* Pre Driver bind */
-#define IOMMU_GROUP_NOTIFY_BOUND_DRIVER		4 /* Post Driver bind */
-#define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER	5 /* Pre Driver unbind */
-#define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER	6 /* Post Driver unbind */
-
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
@@ -493,10 +486,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
 extern struct iommu_group *iommu_group_get(struct device *dev);
 extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
 extern void iommu_group_put(struct iommu_group *group);
-extern int iommu_group_register_notifier(struct iommu_group *group,
-					 struct notifier_block *nb);
-extern int iommu_group_unregister_notifier(struct iommu_group *group,
-					   struct notifier_block *nb);
 extern int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
 					void *data);
@@ -897,18 +886,6 @@ static inline void iommu_group_put(struct iommu_group *group)
 {
 }
 
-static inline int iommu_group_register_notifier(struct iommu_group *group,
-						struct notifier_block *nb)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_group_unregister_notifier(struct iommu_group *group,
-						  struct notifier_block *nb)
-{
-	return 0;
-}
-
 static inline
 int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3dcd3fc4290a..064d0679906a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -18,7 +18,6 @@
 #include <linux/errno.h>
 #include <linux/iommu.h>
 #include <linux/idr.h>
-#include <linux/notifier.h>
 #include <linux/err.h>
 #include <linux/pci.h>
 #include <linux/bitops.h>
@@ -41,7 +40,6 @@ struct iommu_group {
 	struct kobject *devices_kobj;
 	struct list_head devices;
 	struct mutex mutex;
-	struct blocking_notifier_head notifier;
 	void *iommu_data;
 	void (*iommu_data_release)(void *iommu_data);
 	char *name;
@@ -628,7 +626,6 @@ struct iommu_group *iommu_group_alloc(void)
 	mutex_init(&group->mutex);
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
-	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
 	group->dma_owner = DMA_OWNER_NONE;
 
 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
@@ -904,10 +901,6 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	if (ret)
 		goto err_put_group;
 
-	/* Notify any listeners about change to group. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
-
 	trace_add_device_to_group(group->id, dev);
 
 	dev_info(dev, "Adding to iommu group %d\n", group->id);
@@ -949,10 +942,6 @@ void iommu_group_remove_device(struct device *dev)
 
 	dev_info(dev, "Removing from iommu group %d\n", group->id);
 
-	/* Pre-notify listeners that a device is being removed. */
-	blocking_notifier_call_chain(&group->notifier,
-				     IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev);
-
 	mutex_lock(&group->mutex);
 	list_for_each_entry(tmp_device, &group->devices, list) {
 		if (tmp_device->dev == dev) {
@@ -1075,36 +1064,6 @@ void iommu_group_put(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_put);
 
-/**
- * iommu_group_register_notifier - Register a notifier for group changes
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * This function allows iommu group users to track changes in a group.
- * See include/linux/iommu.h for actions sent via this notifier.  Caller
- * should hold a reference to the group throughout notifier registration.
- */
-int iommu_group_register_notifier(struct iommu_group *group,
-				  struct notifier_block *nb)
-{
-	return blocking_notifier_chain_register(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_register_notifier);
-
-/**
- * iommu_group_unregister_notifier - Unregister a notifier
- * @group: the group to watch
- * @nb: notifier block to signal
- *
- * Unregister a previously registered group notifier block.
- */
-int iommu_group_unregister_notifier(struct iommu_group *group,
-				    struct notifier_block *nb)
-{
-	return blocking_notifier_chain_unregister(&group->notifier, nb);
-}
-EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
-
 /**
  * iommu_register_device_fault_handler() - Register a device fault handler
  * @dev: the device
@@ -1653,14 +1612,8 @@ static int remove_iommu_group(struct device *dev, void *data)
 static int iommu_bus_notifier(struct notifier_block *nb,
 			      unsigned long action, void *data)
 {
-	unsigned long group_action = 0;
 	struct device *dev = data;
-	struct iommu_group *group;
 
-	/*
-	 * ADD/DEL call into iommu driver ops if provided, which may
-	 * result in ADD/DEL notifiers to group->notifier
-	 */
 	if (action == BUS_NOTIFY_ADD_DEVICE) {
 		int ret;
 
@@ -1671,34 +1624,6 @@ static int iommu_bus_notifier(struct notifier_block *nb,
 		return NOTIFY_OK;
 	}
 
-	/*
-	 * Remaining BUS_NOTIFYs get filtered and republished to the
-	 * group, if anyone is listening
-	 */
-	group = iommu_group_get(dev);
-	if (!group)
-		return 0;
-
-	switch (action) {
-	case BUS_NOTIFY_BIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
-		break;
-	case BUS_NOTIFY_BOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBIND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
-		break;
-	case BUS_NOTIFY_UNBOUND_DRIVER:
-		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
-		break;
-	}
-
-	if (group_action)
-		blocking_notifier_call_chain(&group->notifier,
-					     group_action, dev);
-
-	iommu_group_put(group);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15  2:05 ` [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind Lu Baolu
@ 2021-11-15  6:59   ` Greg Kroah-Hartman
  2021-11-15 13:20     ` Christoph Hellwig
  2021-11-15 13:38     ` Jason Gunthorpe
  2021-11-15 13:19   ` Christoph Hellwig
  1 sibling, 2 replies; 60+ messages in thread
From: Greg Kroah-Hartman @ 2021-11-15  6:59 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Alex Williamson, Bjorn Helgaas, Jason Gunthorpe,
	Kevin Tian, Ashok Raj, Will Deacon, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, iommu, linux-pci, kvm, linux-kernel

On Mon, Nov 15, 2021 at 10:05:43AM +0800, Lu Baolu wrote:
> This extends really_probe() to allow checking for dma ownership conflict
> during the driver binding process. By default, the DMA_OWNER_KERNEL is
> claimed for the bound driver before calling its .probe() callback. If this
> operation fails (e.g. the iommu group of the target device already has the
> DMA_OWNER_USER set), the binding process is aborted to avoid breaking the
> security contract for devices in the iommu group.
> 
> Without this change, the vfio driver has to listen to a bus BOUND_DRIVER
> event and then BUG_ON() in case of dma ownership conflict. This leads to
> bad user experience since careless driver binding operation may crash the
> system if the admin overlooks the group restriction.
> 
> Aside from bad design, this leads to a security problem as a root user,
> even with lockdown=integrity, can force the kernel to BUG.
> 
> Driver may set a new flag (suppress_auto_claim_dma_owner) to disable auto
> claim in the binding process. Examples include kernel drivers (pci_stub,
> PCI bridge drivers, etc.) which don't trigger DMA at all thus can be safely
> exempted in DMA ownership check and userspace framework drivers (vfio/vdpa
> etc.) which need to manually claim DMA_OWNER_USER when assigning a device
> to userspace.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Link: https://lore.kernel.org/linux-iommu/20210922123931.GI327412@nvidia.com/
> Link: https://lore.kernel.org/linux-iommu/20210928115751.GK964074@nvidia.com/
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  include/linux/device/driver.h |  7 ++++++-
>  drivers/base/dd.c             | 12 ++++++++++++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> index a498ebcf4993..25d39c64c4d9 100644
> --- a/include/linux/device/driver.h
> +++ b/include/linux/device/driver.h
> @@ -54,6 +54,10 @@ enum probe_type {
>   * @owner:	The module owner.
>   * @mod_name:	Used for built-in modules.
>   * @suppress_bind_attrs: Disables bind/unbind via sysfs.
> + * @suppress_auto_claim_dma_owner: Disable auto claiming of kernel DMA owner.
> + *		Drivers which don't require DMA or want to manually claim the
> + *		owner type (e.g. userspace driver frameworks) could set this
> + *		flag.
>   * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
>   * @of_match_table: The open firmware table.
>   * @acpi_match_table: The ACPI match table.
> @@ -99,7 +103,8 @@ struct device_driver {
>  	struct module		*owner;
>  	const char		*mod_name;	/* used for built-in modules */
>  
> -	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
> +	bool suppress_bind_attrs:1;	/* disables bind/unbind via sysfs */
> +	bool suppress_auto_claim_dma_owner:1;

Can a bool be a bitfield?  Is that valid C?

And why is that even needed?

>  	enum probe_type probe_type;
>  
>  	const struct of_device_id	*of_match_table;
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 68ea1f949daa..ab3333351f19 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -28,6 +28,7 @@
>  #include <linux/pm_runtime.h>
>  #include <linux/pinctrl/devinfo.h>
>  #include <linux/slab.h>
> +#include <linux/iommu.h>
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -566,6 +567,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  		goto done;
>  	}
>  
> +	if (!drv->suppress_auto_claim_dma_owner) {
> +		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL, NULL);
> +		if (ret)
> +			return ret;
> +	}
> +

This feels wrong to be doing it in the driver core, why doesn't the bus
that cares about this handle it instead?

You just caused all drivers in the kernel today to set and release this
ownership, as none set this flag.  Shouldn't it be the other way around?

And again, why not in the bus that cares?

You only have problems with 1 driver out of thousands, this feels wrong
to abuse the driver core this way for just that one.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-15  2:05 ` [PATCH 01/11] iommu: Add device dma ownership set/release interfaces Lu Baolu
@ 2021-11-15 13:14   ` Christoph Hellwig
  2021-11-16  1:57     ` Lu Baolu
  2021-11-15 20:38   ` Bjorn Helgaas
  1 sibling, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 13:14 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Chaitanya Kulkarni, kvm,
	rafael, linux-pci, Cornelia Huck, linux-kernel, iommu,
	Jacob jun Pan, Diana Craciun, Will Deacon

On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> +enum iommu_dma_owner {
> +	DMA_OWNER_NONE,
> +	DMA_OWNER_KERNEL,
> +	DMA_OWNER_USER,
> +};
> +

> +	enum iommu_dma_owner dma_owner;
> +	refcount_t owner_cnt;
> +	struct file *owner_user_file;

I'd just overload the ownership into owner_user_file,

 NULL			-> no owner
 (struct file *)1UL)	-> kernel
 real pointer		-> user

Which could simplify a lot of the code dealing with the owner.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15  2:05 ` [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind Lu Baolu
  2021-11-15  6:59   ` Greg Kroah-Hartman
@ 2021-11-15 13:19   ` Christoph Hellwig
  2021-11-15 13:24     ` Jason Gunthorpe
  1 sibling, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 13:19 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Chaitanya Kulkarni, kvm,
	rafael, linux-pci, Cornelia Huck, linux-kernel, iommu,
	Jacob jun Pan, Diana Craciun, Will Deacon

On Mon, Nov 15, 2021 at 10:05:43AM +0800, Lu Baolu wrote:
> @@ -566,6 +567,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  		goto done;
>  	}
>  
> +	if (!drv->suppress_auto_claim_dma_owner) {
> +		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL, NULL);
> +		if (ret)
> +			return ret;
> +	}

I'd expect this to go into iommu_setup_dma_ops (and its arm and s390
equivalents), as that is what claims an IOMMU for in-kernel usage

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15  6:59   ` Greg Kroah-Hartman
@ 2021-11-15 13:20     ` Christoph Hellwig
  2021-11-15 13:38     ` Jason Gunthorpe
  1 sibling, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 13:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Lu Baolu, Kevin Tian, Chaitanya Kulkarni, Ashok Raj, kvm, rafael,
	linux-pci, Cornelia Huck, iommu, linux-kernel, Alex Williamson,
	Jacob jun Pan, Jason Gunthorpe, Diana Craciun, Bjorn Helgaas,
	Will Deacon

On Mon, Nov 15, 2021 at 07:59:10AM +0100, Greg Kroah-Hartman wrote:
> This feels wrong to be doing it in the driver core, why doesn't the bus
> that cares about this handle it instead?
> 
> You just caused all drivers in the kernel today to set and release this
> ownership, as none set this flag.  Shouldn't it be the other way around?
> 
> And again, why not in the bus that cares?
> 
> You only have problems with 1 driver out of thousands, this feels wrong
> to abuse the driver core this way for just that one.

This isn't really a bus thingy, but related to the IOMMU subsystem.
That being said as pointed out in my previous reply I'd expect this
to go along with other IOMMU setup.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15  2:05 ` [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
@ 2021-11-15 13:21   ` Christoph Hellwig
  2021-11-15 13:31     ` Jason Gunthorpe
  2021-11-15 20:48   ` Bjorn Helgaas
  2021-11-15 22:17   ` Bjorn Helgaas
  2 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 13:21 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Chaitanya Kulkarni, kvm,
	rafael, linux-pci, Cornelia Huck, linux-kernel, iommu,
	Jacob jun Pan, Diana Craciun, Will Deacon

On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe.

If an IOMMU is setup and dma-iommu or friends are not used nothing is
unsafe anyway, it just is that IOMMU won't work..

> However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

I don't understand this sentence at all.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/11] iommu: Add security context management for assigned devices
  2021-11-15  2:05 ` [PATCH 05/11] iommu: Add security context management for assigned devices Lu Baolu
@ 2021-11-15 13:22   ` Christoph Hellwig
  2021-11-16  7:25     ` Lu Baolu
  0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 13:22 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Chaitanya Kulkarni, kvm,
	rafael, linux-pci, Cornelia Huck, linux-kernel, iommu,
	Jacob jun Pan, Diana Craciun, Will Deacon

On Mon, Nov 15, 2021 at 10:05:46AM +0800, Lu Baolu wrote:
> +			/*
> +			 * The UNMANAGED domain should be detached before all USER
> +			 * owners have been released.
> +			 */

Please avoid comments spilling over 80 characters.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15 13:19   ` Christoph Hellwig
@ 2021-11-15 13:24     ` Jason Gunthorpe
  2021-11-15 15:37       ` Robin Murphy
  0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 13:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Kevin Tian, Ashok Raj, Chaitanya Kulkarni, kvm,
	rafael, linux-pci, Cornelia Huck, linux-kernel, iommu,
	Jacob jun Pan, Diana Craciun, Will Deacon

On Mon, Nov 15, 2021 at 05:19:02AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 10:05:43AM +0800, Lu Baolu wrote:
> > @@ -566,6 +567,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >  		goto done;
> >  	}
> >  
> > +	if (!drv->suppress_auto_claim_dma_owner) {
> > +		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL, NULL);
> > +		if (ret)
> > +			return ret;
> > +	}
> 
> I'd expect this to go into iommu_setup_dma_ops (and its arm and s390
> equivalents), as that is what claims an IOMMU for in-kernel usage

If iommu_device_set_dma_owner(dev_a) fails changes dynamically
depending on what iommu_device_set_dma_owner(dev_b, DMA_OWNER_USER)
have been done.

The whole point here is that doing a
 iommu_device_set_dma_owner(dev_b, DMA_OWNER_USER)
needs to revoke kernel usage from a whole bunch of other devices in
the same group.

revoking kernel usage means it needs to ensure that no driver is bound
and prevent future drivers from being bound.

iommu_setup_dma_ops() is something done once early on in boot, not at
every driver probe, so I don't see how it can help??

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces
  2021-11-15  2:05 ` [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces Lu Baolu
@ 2021-11-15 13:27   ` Christoph Hellwig
  2021-11-16  9:42     ` Lu Baolu
  0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 13:27 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Chaitanya Kulkarni, kvm,
	rafael, linux-pci, Cornelia Huck, linux-kernel, iommu,
	Jacob jun Pan, Diana Craciun, Will Deacon

On Mon, Nov 15, 2021 at 10:05:47AM +0800, Lu Baolu wrote:
> The vfio needs to set DMA_OWNER_USER for the entire group when attaching

The vfio subsystem?  driver?

> it to a vfio container. So expose group variants of setting/releasing dma
> ownership for this purpose.
> 
> This also exposes the helper iommu_group_dma_owner_unclaimed() for vfio
> report to userspace if the group is viable to user assignment, for

.. for vfio to report .. ?

>  void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
> +int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
> +			      struct file *user_file);
> +void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);

Pleae avoid all these overly long lines.

> +static inline int iommu_group_set_dma_owner(struct iommu_group *group,
> +					    enum iommu_dma_owner owner,
> +					    struct file *user_file)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline void iommu_group_release_dma_owner(struct iommu_group *group,
> +						 enum iommu_dma_owner owner)
> +{
> +}
> +
> +static inline bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
> +{
> +	return false;
> +}

Why do we need these stubs?  All potential callers should already
require CONFIG_IOMMU_API?  Same for the helpers added in patch 1, btw.

> +	mutex_lock(&group->mutex);
> +	ret = __iommu_group_set_dma_owner(group, owner, user_file);
> +	mutex_unlock(&group->mutex);

> +	mutex_lock(&group->mutex);
> +	__iommu_group_release_dma_owner(group, owner);
> +	mutex_unlock(&group->mutex);

Unless I'm missing something (just skipping over the patches),
the existing callers also take the lock just around these calls,
so we don't really need the __-prefixed lowlevel helpers.

> +	mutex_lock(&group->mutex);
> +	owner = group->dma_owner;
> +	mutex_unlock(&group->mutex);

No need for a lock to read a single scalar.

> +
> +	return owner == DMA_OWNER_NONE;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_dma_owner_unclaimed);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 13:21   ` Christoph Hellwig
@ 2021-11-15 13:31     ` Jason Gunthorpe
  2021-11-15 15:14       ` Robin Murphy
  0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 13:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Kevin Tian, Ashok Raj, Chaitanya Kulkarni, kvm,
	rafael, linux-pci, Cornelia Huck, linux-kernel, iommu,
	Jacob jun Pan, Diana Craciun, Will Deacon

On Mon, Nov 15, 2021 at 05:21:26AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> > pci_stub allows the admin to block driver binding on a device and make
> > it permanently shared with userspace. Since pci_stub does not do DMA,
> > it is safe.
> 
> If an IOMMU is setup and dma-iommu or friends are not used nothing is
> unsafe anyway, it just is that IOMMU won't work..
> 
> > However the admin must understand that using pci_stub allows
> > userspace to attack whatever device it was bound to.
> 
> I don't understand this sentence at all.

If userspace has control of device A and can cause A to issue DMA to
arbitary DMA addresses then there are certain PCI topologies where A
can now issue peer to peer DMA and manipulate the MMMIO registers in
device B.

A kernel driver on device B is thus subjected to concurrent
manipulation of the device registers from userspace.

So, a 'safe' kernel driver is one that can tolerate this, and an
'unsafe' driver is one where userspace can break kernel integrity.

The second issue is DMA - because there is only one iommu_domain
underlying many devices if we give that iommu_domain to userspace it
means the kernel DMA API on other devices no longer works. 

So no kernel driver doing DMA can work at all, under any PCI topology,
if userspace owns the IO page table.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15  6:59   ` Greg Kroah-Hartman
  2021-11-15 13:20     ` Christoph Hellwig
@ 2021-11-15 13:38     ` Jason Gunthorpe
  1 sibling, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 13:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Lu Baolu, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Kevin Tian, Ashok Raj, Will Deacon, rafael, Diana Craciun,
	Cornelia Huck, Eric Auger, Liu Yi L, Jacob jun Pan,
	Chaitanya Kulkarni, iommu, linux-pci, kvm, linux-kernel

On Mon, Nov 15, 2021 at 07:59:10AM +0100, Greg Kroah-Hartman wrote:
> > @@ -566,6 +567,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >  		goto done;
> >  	}
> >  
> > +	if (!drv->suppress_auto_claim_dma_owner) {
> > +		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL, NULL);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> 
> This feels wrong to be doing it in the driver core, why doesn't the bus
> that cares about this handle it instead?

As Christoph said, it is not related to the bus. To elaborate any
bus_type that has iommu_ops != NULL needs this check, and it must be
done on an individual struct device as the result is sensitive to the
iommu_group member of each struct device.

> You just caused all drivers in the kernel today to set and release this
> ownership, as none set this flag.  Shouldn't it be the other way around?

No - the whole point is to cause every driver to do this test.

iommu_device_set_dma_owner() can fail for any device, if it does then
a kernel driver must not be probed. Probing a kernel driver when
iommu_device_set_dma_owner() fails will break kernel integrity due to
HW limitations.

The drv->suppress_auto_claim_dma_owner disables this restriction
because three drivers will deal with DMA ownership on their own.

> You only have problems with 1 driver out of thousands, this feels wrong
> to abuse the driver core this way for just that one.

I think you have it backwards. Few drivers out of thousands can take
an action that impacts the security of a thousand other drivers.

The key thing is that device A can have a driver with
suppress_auto_claim_dma_owner=1 and call
iommu_device_set_dma_owner(DMA_OWNER_USER) which will then cause
another device B to be unsable in the kernel.

Device B, with a normal driver, must be prevented from having a kernel
driver because of what the special driver on device A did.

This behavior is a IOMMU HW limitation that cannot be avoided. The
restrictions have always been in the kernel, they were just enforced
with a BUG_ON at probe via a bus_notifier instead of a clean failure.

So, I don't know how to block probing of the thousands of drivers
without adding a test during probing, do you have an different idea?

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 13:31     ` Jason Gunthorpe
@ 2021-11-15 15:14       ` Robin Murphy
  2021-11-15 16:17         ` Jason Gunthorpe
  0 siblings, 1 reply; 60+ messages in thread
From: Robin Murphy @ 2021-11-15 15:14 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Kevin Tian, Chaitanya Kulkarni, Ashok Raj, kvm, rafael,
	Greg Kroah-Hartman, Cornelia Huck, Will Deacon, linux-kernel,
	iommu, Alex Williamson, Jacob jun Pan, linux-pci, Bjorn Helgaas,
	Diana Craciun

On 2021-11-15 13:31, Jason Gunthorpe via iommu wrote:
> On Mon, Nov 15, 2021 at 05:21:26AM -0800, Christoph Hellwig wrote:
>> On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
>>> pci_stub allows the admin to block driver binding on a device and make
>>> it permanently shared with userspace. Since pci_stub does not do DMA,
>>> it is safe.
>>
>> If an IOMMU is setup and dma-iommu or friends are not used nothing is
>> unsafe anyway, it just is that IOMMU won't work..
>>
>>> However the admin must understand that using pci_stub allows
>>> userspace to attack whatever device it was bound to.
>>
>> I don't understand this sentence at all.
> 
> If userspace has control of device A and can cause A to issue DMA to
> arbitary DMA addresses then there are certain PCI topologies where A
> can now issue peer to peer DMA and manipulate the MMMIO registers in
> device B.
> 
> A kernel driver on device B is thus subjected to concurrent
> manipulation of the device registers from userspace.
> 
> So, a 'safe' kernel driver is one that can tolerate this, and an
> 'unsafe' driver is one where userspace can break kernel integrity.

You mean in the case where the kernel driver is trying to use device B 
in a purely PIO mode, such that userspace might potentially be able to 
interfere with data being transferred in and out of the kernel? Perhaps 
it's not so clear to put that under a notion of "DMA ownership", since 
device B's DMA is irrelevant and it's really much more equivalent to 
/dev/mem access or mmaping BARs to userspace while a driver is bound.

> The second issue is DMA - because there is only one iommu_domain
> underlying many devices if we give that iommu_domain to userspace it
> means the kernel DMA API on other devices no longer works.

Actually, the DMA API itself via iommu-dma will "work" just fine in the 
sense that it will still successfully perform all its operations in the 
unattached default domain, it's just that if the driver then programs 
the device to access the returned DMA address, the device is likely to 
get a nasty surprise.

> So no kernel driver doing DMA can work at all, under any PCI topology,
> if userspace owns the IO page table.

This isn't really about userspace at all - it's true of any case where a 
kernel driver wants to attach a grouped device to its own unmanaged 
domain. The fact that the VFIO kernel driver uses its unmanaged domains 
to map user pages upon user requests is merely a VFIO detail, and VFIO 
happens to be the only common case where unmanaged domains and 
non-singleton groups intersect. I'd say that, logically, if you want to 
put policy on mutual driver/usage compatibility anywhere it should be in 
iommu_attach_group().

Robin.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15 13:24     ` Jason Gunthorpe
@ 2021-11-15 15:37       ` Robin Murphy
  2021-11-15 15:56         ` Jason Gunthorpe
  0 siblings, 1 reply; 60+ messages in thread
From: Robin Murphy @ 2021-11-15 15:37 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Kevin Tian, Chaitanya Kulkarni, Ashok Raj, kvm, rafael,
	Greg Kroah-Hartman, Cornelia Huck, Will Deacon, linux-kernel,
	iommu, Alex Williamson, Jacob jun Pan, linux-pci, Bjorn Helgaas,
	Diana Craciun

On 2021-11-15 13:24, Jason Gunthorpe via iommu wrote:
> On Mon, Nov 15, 2021 at 05:19:02AM -0800, Christoph Hellwig wrote:
>> On Mon, Nov 15, 2021 at 10:05:43AM +0800, Lu Baolu wrote:
>>> @@ -566,6 +567,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>>>   		goto done;
>>>   	}
>>>   
>>> +	if (!drv->suppress_auto_claim_dma_owner) {
>>> +		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL, NULL);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>
>> I'd expect this to go into iommu_setup_dma_ops (and its arm and s390
>> equivalents), as that is what claims an IOMMU for in-kernel usage
> 
> If iommu_device_set_dma_owner(dev_a) fails changes dynamically
> depending on what iommu_device_set_dma_owner(dev_b, DMA_OWNER_USER)
> have been done.
> 
> The whole point here is that doing a
>   iommu_device_set_dma_owner(dev_b, DMA_OWNER_USER)
> needs to revoke kernel usage from a whole bunch of other devices in
> the same group.
> 
> revoking kernel usage means it needs to ensure that no driver is bound
> and prevent future drivers from being bound.
> 
> iommu_setup_dma_ops() is something done once early on in boot, not at
> every driver probe, so I don't see how it can help??

Note that there's some annoying inconsistency across architectures, and 
with the {acpi,of}_dma_configure() code in general. I guess Christoph 
might be thinking of the case where iommu_setup_dma_ops() *does* end up 
being called off the back of the bus->dma_configure() hook a few lines 
below the context above.

iommu_setup_dma_ops() itself is indeed not the appropriate place for 
this (the fact that it can be called as late as driver probe is subtly 
broken and still on my list to fix...) but
bus->dma_configure() definitely is. Only a handful of buses care about 
IOMMUs, and possibly even fewer of them support VFIO, so I'm in full 
agreement with Greg and Christoph that this absolutely warrants being 
scoped per-bus. I mean, we literally already have infrastructure to 
prevent drivers binding if the IOMMU/DMA configuration is broken or not 
ready yet; why would we want a totally different mechanism to prevent 
driver binding when the only difference is that that configuration *is* 
ready and working to the point that someone's already claimed it for 
other purposes?

Robin.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15 15:37       ` Robin Murphy
@ 2021-11-15 15:56         ` Jason Gunthorpe
  2021-11-15 18:15           ` Christoph Hellwig
  2021-11-15 18:35           ` Robin Murphy
  0 siblings, 2 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 15:56 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Kevin Tian, Chaitanya Kulkarni, Ashok Raj,
	kvm, rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Alex Williamson, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Diana Craciun

On Mon, Nov 15, 2021 at 03:37:18PM +0000, Robin Murphy wrote:

> IOMMUs, and possibly even fewer of them support VFIO, so I'm in full
> agreement with Greg and Christoph that this absolutely warrants being scoped
> per-bus. I mean, we literally already have infrastructure to prevent drivers
> binding if the IOMMU/DMA configuration is broken or not ready yet; why would
> we want a totally different mechanism to prevent driver binding when the
> only difference is that that configuration *is* ready and working to the
> point that someone's already claimed it for other purposes?

I see, that does make sense

I see these implementations:

drivers/amba/bus.c:     .dma_configure  = platform_dma_configure,
drivers/base/platform.c:        .dma_configure  = platform_dma_configure,
drivers/bus/fsl-mc/fsl-mc-bus.c:        .dma_configure  = fsl_mc_dma_configure,
drivers/pci/pci-driver.c:       .dma_configure  = pci_dma_configure,
drivers/gpu/host1x/bus.c:       .dma_configure = host1x_dma_configure,

Other than host1x they all work with VFIO.

Also, there is no bus->dma_unconfigure() which would be needed to
restore the device as well.

So, would you rather see duplicated code into the 4 drivers, and a new
bus op to 'unconfigure dma'

Or, a 'dev_configure_dma()' function that is roughly:

        if (dev->bus->dma_configure) {
                ret = dev->bus->dma_configure(dev);
                if (ret)
                        return ret;
                if (!drv->suppress_auto_claim_dma_owner) {
                       ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL,
                                                        NULL);
                       if (ret)
                               ret;
                }
         }

And a pair'd undo.

This is nice because we can enforce dev->bus->dma_configure when doing
a user bind so everything holds together safely without relying on
each bus_type to properly implement security.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 15:14       ` Robin Murphy
@ 2021-11-15 16:17         ` Jason Gunthorpe
  2021-11-15 17:54           ` Robin Murphy
  0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 16:17 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Kevin Tian, Chaitanya Kulkarni, Ashok Raj,
	kvm, rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Alex Williamson, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Diana Craciun

On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:

> > If userspace has control of device A and can cause A to issue DMA to
> > arbitary DMA addresses then there are certain PCI topologies where A
> > can now issue peer to peer DMA and manipulate the MMMIO registers in
> > device B.
> > 
> > A kernel driver on device B is thus subjected to concurrent
> > manipulation of the device registers from userspace.
> > 
> > So, a 'safe' kernel driver is one that can tolerate this, and an
> > 'unsafe' driver is one where userspace can break kernel integrity.
> 
> You mean in the case where the kernel driver is trying to use device B in a
> purely PIO mode, such that userspace might potentially be able to interfere
> with data being transferred in and out of the kernel?

s/PIO/MMIO, but yes basically. And not just data trasnfer but
userspace can interfere with the device state as well.

> Perhaps it's not so clear to put that under a notion of "DMA
> ownership", since device B's DMA is irrelevant and it's really much
> more equivalent to /dev/mem access or mmaping BARs to userspace
> while a driver is bound.

It is DMA ownership because device A's DMA is what is relevant
here. device A's DMA compromises device B. So device A asserts it has
USER ownership for DMA.

Any device in a group with USER ownership is incompatible with a
kernel driver.

> > The second issue is DMA - because there is only one iommu_domain
> > underlying many devices if we give that iommu_domain to userspace it
> > means the kernel DMA API on other devices no longer works.
> 
> Actually, the DMA API itself via iommu-dma will "work" just fine in the
> sense that it will still successfully perform all its operations in the
> unattached default domain, it's just that if the driver then programs the
> device to access the returned DMA address, the device is likely to get a
> nasty surprise.

A DMA API that returns an dma_ddr_t that does not result in data
transfer to the specified buffers is not working, in my book - it
breaks the API contract.

> > So no kernel driver doing DMA can work at all, under any PCI topology,
> > if userspace owns the IO page table.
> 
> This isn't really about userspace at all - it's true of any case where a
> kernel driver wants to attach a grouped device to its own unmanaged
> domain.

This is true for the dma api issue in isolation.

I think if we have a user someday it would make sense to add another
API DMA_OWNER_DRIVER_DOMAIN that captures how the dma API doesn't work
but DMA MMIO attacks are not possible.

> The fact that the VFIO kernel driver uses its unmanaged domains to map user
> pages upon user requests is merely a VFIO detail, and VFIO happens to be the
> only common case where unmanaged domains and non-singleton groups intersect.
> I'd say that, logically, if you want to put policy on mutual driver/usage
> compatibility anywhere it should be in iommu_attach_group().

It would make sense for iommu_attach_group() to require that the
DMA_OWNERSHIP is USER or DRIVER_DOMAIN.

That has a nice symmetry with iommu_attach_device() already requiring
that the group has a single device. For a driver to use these APIs it
must ensure security, one way or another.

That is a good idea, but requires understanding what tegra is
doing. Maybe tegra is that DMA_OWNER_DRIVER_DOMAIN user?

I wouldn't want to see iommu_attach_group() change the DMA_OWNERSHIP,
I think ownership is cleaner as a dedicated API. Adding a file * and
probably the enum to iommu_attach_group() feels weird.

We need the dedicated API for the dma_configure op, and keeping
ownership split from the current domain makes more sense with the
design in the iommfd RFC.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 16:17         ` Jason Gunthorpe
@ 2021-11-15 17:54           ` Robin Murphy
  2021-11-15 18:19             ` Christoph Hellwig
  2021-11-15 19:22             ` Jason Gunthorpe
  0 siblings, 2 replies; 60+ messages in thread
From: Robin Murphy @ 2021-11-15 17:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Kevin Tian, Chaitanya Kulkarni, Ashok Raj,
	kvm, rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Alex Williamson, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Diana Craciun

On 2021-11-15 16:17, Jason Gunthorpe wrote:
> On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:
> 
>>> If userspace has control of device A and can cause A to issue DMA to
>>> arbitary DMA addresses then there are certain PCI topologies where A
>>> can now issue peer to peer DMA and manipulate the MMMIO registers in
>>> device B.
>>>
>>> A kernel driver on device B is thus subjected to concurrent
>>> manipulation of the device registers from userspace.
>>>
>>> So, a 'safe' kernel driver is one that can tolerate this, and an
>>> 'unsafe' driver is one where userspace can break kernel integrity.
>>
>> You mean in the case where the kernel driver is trying to use device B in a
>> purely PIO mode, such that userspace might potentially be able to interfere
>> with data being transferred in and out of the kernel?
> 
> s/PIO/MMIO, but yes basically. And not just data trasnfer but
> userspace can interfere with the device state as well.

Sure, but unexpected changes in device state could happen for any number 
of reasons - uncorrected ECC error, surprise removal, etc. - so if that 
can affect "kernel integrity" I'm considering it an independent problem.

>> Perhaps it's not so clear to put that under a notion of "DMA
>> ownership", since device B's DMA is irrelevant and it's really much
>> more equivalent to /dev/mem access or mmaping BARs to userspace
>> while a driver is bound.
> 
> It is DMA ownership because device A's DMA is what is relevant
> here. device A's DMA compromises device B. So device A asserts it has
> USER ownership for DMA.
> 
> Any device in a group with USER ownership is incompatible with a
> kernel driver.

I can see the argument from that angle, but you can equally look at it 
another way and say that a device with kernel ownership is incompatible 
with a kernel driver, if userspace can call write() on 
"/sys/devices/B/resource0" such that device A's kernel driver DMAs all 
over it. Maybe that particular example lands firmly under "just don't do 
that", but I'd like to figure out where exactly we should draw the line 
between "DMA" and "ability to mess with a device".

>>> The second issue is DMA - because there is only one iommu_domain
>>> underlying many devices if we give that iommu_domain to userspace it
>>> means the kernel DMA API on other devices no longer works.
>>
>> Actually, the DMA API itself via iommu-dma will "work" just fine in the
>> sense that it will still successfully perform all its operations in the
>> unattached default domain, it's just that if the driver then programs the
>> device to access the returned DMA address, the device is likely to get a
>> nasty surprise.
> 
> A DMA API that returns an dma_ddr_t that does not result in data
> transfer to the specified buffers is not working, in my book - it
> breaks the API contract.
> 
>>> So no kernel driver doing DMA can work at all, under any PCI topology,
>>> if userspace owns the IO page table.
>>
>> This isn't really about userspace at all - it's true of any case where a
>> kernel driver wants to attach a grouped device to its own unmanaged
>> domain.
> 
> This is true for the dma api issue in isolation.

No, it's definitely a general IOMMU-API-level thing; you could just as 
well have two drivers both trying to attach to their own unmanaged 
domains without DMA API involvement. What I think it boils down to is 
that if multiple devices in a group are bound (or want to bind) to 
different drivers, we want to enforce some kind of consensus between 
those drivers over IOMMU API usage.

> I think if we have a user someday it would make sense to add another
> API DMA_OWNER_DRIVER_DOMAIN that captures how the dma API doesn't work
> but DMA MMIO attacks are not possible.
> 
>> The fact that the VFIO kernel driver uses its unmanaged domains to map user
>> pages upon user requests is merely a VFIO detail, and VFIO happens to be the
>> only common case where unmanaged domains and non-singleton groups intersect.
>> I'd say that, logically, if you want to put policy on mutual driver/usage
>> compatibility anywhere it should be in iommu_attach_group().
> 
> It would make sense for iommu_attach_group() to require that the
> DMA_OWNERSHIP is USER or DRIVER_DOMAIN.
> 
> That has a nice symmetry with iommu_attach_device() already requiring
> that the group has a single device. For a driver to use these APIs it
> must ensure security, one way or another.

iommu_attach_device() is supposed to be deprecated and eventually going 
away; I wouldn't look at it too much.

> That is a good idea, but requires understanding what tegra is
> doing. Maybe tegra is that DMA_OWNER_DRIVER_DOMAIN user?
> 
> I wouldn't want to see iommu_attach_group() change the DMA_OWNERSHIP,
> I think ownership is cleaner as a dedicated API. Adding a file * and
> probably the enum to iommu_attach_group() feels weird.

Indeed I wasn't imagining it changing any ownership, just preventing a 
group from being attached to a non-default domain if it contains devices 
bound to different incompatible drivers. Basically just taking the 
existing check that VFIO tries to enforce and formalising it into the 
core API. It's not too far off what we already have around changing the 
default domain type, so there seems to be room for it to all fit 
together quite nicely.

There would still need to be separate enforcement elsewhere to prevent 
new drivers binding *after* a group *has* been attached to an unmanaged 
domain, but again it can still be in those simplest terms. Tying it in 
to userspace and FDs just muddies the water far more than necessary.

Robin.

> We need the dedicated API for the dma_configure op, and keeping
> ownership split from the current domain makes more sense with the
> design in the iommfd RFC.
> 
> Thanks,
> Jason
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15 15:56         ` Jason Gunthorpe
@ 2021-11-15 18:15           ` Christoph Hellwig
  2021-11-15 18:35           ` Robin Murphy
  1 sibling, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 18:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, Christoph Hellwig, Kevin Tian, Chaitanya Kulkarni,
	Ashok Raj, kvm, rafael, Greg Kroah-Hartman, Cornelia Huck,
	Will Deacon, linux-kernel, iommu, Alex Williamson, Jacob jun Pan,
	linux-pci, Bjorn Helgaas, Diana Craciun

On Mon, Nov 15, 2021 at 11:56:13AM -0400, Jason Gunthorpe wrote:
> drivers/base/platform.c:        .dma_configure  = platform_dma_configure,
> drivers/bus/fsl-mc/fsl-mc-bus.c:        .dma_configure  = fsl_mc_dma_configure,
> drivers/pci/pci-driver.c:       .dma_configure  = pci_dma_configure,
> drivers/gpu/host1x/bus.c:       .dma_configure = host1x_dma_configure,
> 
> Other than host1x they all work with VFIO.
> 
> Also, there is no bus->dma_unconfigure() which would be needed to
> restore the device as well.
> 
> So, would you rather see duplicated code into the 4 drivers, and a new
> bus op to 'unconfigure dma'

The tend to mostly call into common helpers eventually.

> 
> Or, a 'dev_configure_dma()' function that is roughly:
> 
>         if (dev->bus->dma_configure) {
>                 ret = dev->bus->dma_configure(dev);
>                 if (ret)
>                         return ret;
>                 if (!drv->suppress_auto_claim_dma_owner) {
>                        ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL,
>                                                         NULL);
>                        if (ret)
>                                ret;
>                 }
>          }
> 
> And a pair'd undo.

But that seems like an even better idea to me.  Even better with an
early return and avoiding the pointless indentation.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 17:54           ` Robin Murphy
@ 2021-11-15 18:19             ` Christoph Hellwig
  2021-11-15 18:44               ` Robin Murphy
  2021-11-15 19:22             ` Jason Gunthorpe
  1 sibling, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2021-11-15 18:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Chaitanya Kulkarni, Ashok Raj, kvm, rafael, Greg Kroah-Hartman,
	Cornelia Huck, Will Deacon, linux-kernel, iommu, Alex Williamson,
	Jacob jun Pan, linux-pci, Bjorn Helgaas, Diana Craciun

On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
> > s/PIO/MMIO, but yes basically. And not just data trasnfer but
> > userspace can interfere with the device state as well.
> 
> Sure, but unexpected changes in device state could happen for any number of
> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
> affect "kernel integrity" I'm considering it an independent problem.

Well, most DMA is triggered by the host requesting it through MMIO.
So having access to the BAR can turn many devices into somewhat
arbitrary DMA engines.

> I can see the argument from that angle, but you can equally look at it
> another way and say that a device with kernel ownership is incompatible with
> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
> such that device A's kernel driver DMAs all over it. Maybe that particular
> example lands firmly under "just don't do that", but I'd like to figure out
> where exactly we should draw the line between "DMA" and "ability to mess
> with a device".

Userspace writing to the resourceN files with a bound driver is a mive
receipe for trouble.  Do we really allow this currently?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15 15:56         ` Jason Gunthorpe
  2021-11-15 18:15           ` Christoph Hellwig
@ 2021-11-15 18:35           ` Robin Murphy
  2021-11-15 19:39             ` Jason Gunthorpe
  1 sibling, 1 reply; 60+ messages in thread
From: Robin Murphy @ 2021-11-15 18:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Kevin Tian, Chaitanya Kulkarni, Ashok Raj, kvm,
	rafael, Greg Kroah-Hartman, Cornelia Huck, linux-kernel,
	Christoph Hellwig, iommu, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Will Deacon, Diana Craciun

On 2021-11-15 15:56, Jason Gunthorpe via iommu wrote:
> On Mon, Nov 15, 2021 at 03:37:18PM +0000, Robin Murphy wrote:
> 
>> IOMMUs, and possibly even fewer of them support VFIO, so I'm in full
>> agreement with Greg and Christoph that this absolutely warrants being scoped
>> per-bus. I mean, we literally already have infrastructure to prevent drivers
>> binding if the IOMMU/DMA configuration is broken or not ready yet; why would
>> we want a totally different mechanism to prevent driver binding when the
>> only difference is that that configuration *is* ready and working to the
>> point that someone's already claimed it for other purposes?
> 
> I see, that does make sense
> 
> I see these implementations:
> 
> drivers/amba/bus.c:     .dma_configure  = platform_dma_configure,
> drivers/base/platform.c:        .dma_configure  = platform_dma_configure,
> drivers/bus/fsl-mc/fsl-mc-bus.c:        .dma_configure  = fsl_mc_dma_configure,
> drivers/pci/pci-driver.c:       .dma_configure  = pci_dma_configure,
> drivers/gpu/host1x/bus.c:       .dma_configure = host1x_dma_configure,
> 
> Other than host1x they all work with VFIO.
> 
> Also, there is no bus->dma_unconfigure() which would be needed to
> restore the device as well.

Not if we reduce the notion of "ownership" down to 
"dev->iommu_group->domain != dev->iommu_group->default_domain", which 
I'm becoming increasingly convinced is all we actually need here.

> So, would you rather see duplicated code into the 4 drivers, and a new
> bus op to 'unconfigure dma'

The .dma_configure flow is unavoidably a bit boilerplatey already, so 
personally I'd go for having the implementations call back into a common 
check, similarly to their current flow. That also leaves room for the 
bus code to further refine the outcome based on what it might know, 
which I can particularly imagine for cleverer buses like fsl-mc and 
host1x which can have lots of inside knowledge about how their devices 
may interact.

Robin.

> Or, a 'dev_configure_dma()' function that is roughly:
> 
>          if (dev->bus->dma_configure) {
>                  ret = dev->bus->dma_configure(dev);
>                  if (ret)
>                          return ret;
>                  if (!drv->suppress_auto_claim_dma_owner) {
>                         ret = iommu_device_set_dma_owner(dev, DMA_OWNER_KERNEL,
>                                                          NULL);
>                         if (ret)
>                                 ret;
>                  }
>           }
> 
> And a pair'd undo.
> 
> This is nice because we can enforce dev->bus->dma_configure when doing
> a user bind so everything holds together safely without relying on
> each bus_type to properly implement security.
> 
> Jason
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 18:19             ` Christoph Hellwig
@ 2021-11-15 18:44               ` Robin Murphy
  0 siblings, 0 replies; 60+ messages in thread
From: Robin Murphy @ 2021-11-15 18:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Gunthorpe, Kevin Tian, Chaitanya Kulkarni, Ashok Raj, kvm,
	rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Alex Williamson, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Diana Craciun

On 2021-11-15 18:19, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
>>> s/PIO/MMIO, but yes basically. And not just data trasnfer but
>>> userspace can interfere with the device state as well.
>>
>> Sure, but unexpected changes in device state could happen for any number of
>> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
>> affect "kernel integrity" I'm considering it an independent problem.
> 
> Well, most DMA is triggered by the host requesting it through MMIO.
> So having access to the BAR can turn many devices into somewhat
> arbitrary DMA engines.

Yup, but as far as I understand we're talking about the situation where 
the overall group is already attached to the VFIO domain by virtue of 
device A, so any unsolicited DMA by device B could only be to 
userspace's own memory.

>> I can see the argument from that angle, but you can equally look at it
>> another way and say that a device with kernel ownership is incompatible with
>> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
>> such that device A's kernel driver DMAs all over it. Maybe that particular
>> example lands firmly under "just don't do that", but I'd like to figure out
>> where exactly we should draw the line between "DMA" and "ability to mess
>> with a device".
> 
> Userspace writing to the resourceN files with a bound driver is a mive
> receipe for trouble.  Do we really allow this currently?

No idea - I just want to make sure we don't get blinkered on VFIO at 
this point and consider the potential problem space fully :)

Robin.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 17:54           ` Robin Murphy
  2021-11-15 18:19             ` Christoph Hellwig
@ 2021-11-15 19:22             ` Jason Gunthorpe
  2021-11-15 20:58               ` Robin Murphy
  1 sibling, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 19:22 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Kevin Tian, Chaitanya Kulkarni, Ashok Raj,
	kvm, rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Alex Williamson, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Diana Craciun

On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
> On 2021-11-15 16:17, Jason Gunthorpe wrote:
> > On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:
> > 
> > > > If userspace has control of device A and can cause A to issue DMA to
> > > > arbitary DMA addresses then there are certain PCI topologies where A
> > > > can now issue peer to peer DMA and manipulate the MMMIO registers in
> > > > device B.
> > > > 
> > > > A kernel driver on device B is thus subjected to concurrent
> > > > manipulation of the device registers from userspace.
> > > > 
> > > > So, a 'safe' kernel driver is one that can tolerate this, and an
> > > > 'unsafe' driver is one where userspace can break kernel integrity.
> > > 
> > > You mean in the case where the kernel driver is trying to use device B in a
> > > purely PIO mode, such that userspace might potentially be able to interfere
> > > with data being transferred in and out of the kernel?
> > 
> > s/PIO/MMIO, but yes basically. And not just data trasnfer but
> > userspace can interfere with the device state as well.
> 
> Sure, but unexpected changes in device state could happen for any number of
> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
> affect "kernel integrity" I'm considering it an independent problem.

There is a big difference in my mind between a device/HW attacking the
kernel and userspace can attack the kernel. They are both valid cases,
and I know people are also working on the device/HW attacks the kernel
problem.

This series is only about user attacks kernel.

> > > Perhaps it's not so clear to put that under a notion of "DMA
> > > ownership", since device B's DMA is irrelevant and it's really much
> > > more equivalent to /dev/mem access or mmaping BARs to userspace
> > > while a driver is bound.
> > 
> > It is DMA ownership because device A's DMA is what is relevant
> > here. device A's DMA compromises device B. So device A asserts it has
> > USER ownership for DMA.
> > 
> > Any device in a group with USER ownership is incompatible with a
> > kernel driver.
> 
> I can see the argument from that angle, but you can equally look at it
> another way and say that a device with kernel ownership is incompatible with
> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
> such that device A's kernel driver DMAs all over it. Maybe that particular
> example lands firmly under "just don't do that", but I'd like to figure out
> where exactly we should draw the line between "DMA" and "ability to mess
> with a device".

The above scenarios are already blocked by the kernel with
LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
integrity, and these days they almost all have mitigation. I would
consider any kernel integrity violation to be a bug today if
LOCKDOWN_INTEGRITY_MAX is enabled.

I don't know why you bring this up?

> > That has a nice symmetry with iommu_attach_device() already requiring
> > that the group has a single device. For a driver to use these APIs it
> > must ensure security, one way or another.
> 
> iommu_attach_device() is supposed to be deprecated and eventually going
> away; I wouldn't look at it too much.

What is the preference then? This is the only working API today,
right?

> Indeed I wasn't imagining it changing any ownership, just preventing a group
> from being attached to a non-default domain if it contains devices bound to
> different incompatible drivers. 

So this could solve just the domain/DMA API problem, but it leaves the
MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
a layered way. 

This seems like half an idea, do you have a solution for the rest?

The concept of DMA USER is important here, and it is more than just
which domain is attached.

> Basically just taking the existing check that VFIO tries to enforce
> and formalising it into the core API. It's not too far off what we
> already have around changing the default domain type, so there seems
> to be room for it to all fit together quite nicely.

VFIO also has logic related to the file

> Tying it in to userspace and FDs just muddies the water far more
> than necessary.

It isn't muddying the water, it is providing common security code that
is easy to undertand.

*Which* userspace FD/process owns the iommu_group is important
security information because we can't have process A do DMA attacks on
some other process B.

Before userspace can be allowed to touch the MMIO registers it must
ensure ownership. This is also why the split API makes sense.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind
  2021-11-15 18:35           ` Robin Murphy
@ 2021-11-15 19:39             ` Jason Gunthorpe
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 19:39 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Alex Williamson, Kevin Tian, Chaitanya Kulkarni, Ashok Raj, kvm,
	rafael, Greg Kroah-Hartman, Cornelia Huck, linux-kernel,
	Christoph Hellwig, iommu, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Will Deacon, Diana Craciun

On Mon, Nov 15, 2021 at 06:35:37PM +0000, Robin Murphy wrote:
> On 2021-11-15 15:56, Jason Gunthorpe via iommu wrote:
> > On Mon, Nov 15, 2021 at 03:37:18PM +0000, Robin Murphy wrote:
> > 
> > > IOMMUs, and possibly even fewer of them support VFIO, so I'm in full
> > > agreement with Greg and Christoph that this absolutely warrants being scoped
> > > per-bus. I mean, we literally already have infrastructure to prevent drivers
> > > binding if the IOMMU/DMA configuration is broken or not ready yet; why would
> > > we want a totally different mechanism to prevent driver binding when the
> > > only difference is that that configuration *is* ready and working to the
> > > point that someone's already claimed it for other purposes?
> > 
> > I see, that does make sense
> > 
> > I see these implementations:
> > 
> > drivers/amba/bus.c:     .dma_configure  = platform_dma_configure,
> > drivers/base/platform.c:        .dma_configure  = platform_dma_configure,
> > drivers/bus/fsl-mc/fsl-mc-bus.c:        .dma_configure  = fsl_mc_dma_configure,
> > drivers/pci/pci-driver.c:       .dma_configure  = pci_dma_configure,
> > drivers/gpu/host1x/bus.c:       .dma_configure = host1x_dma_configure,
> > 
> > Other than host1x they all work with VFIO.
> > 
> > Also, there is no bus->dma_unconfigure() which would be needed to
> > restore the device as well.
> 
> Not if we reduce the notion of "ownership" down to "dev->iommu_group->domain
> != dev->iommu_group->default_domain", which I'm becoming increasingly
> convinced is all we actually need here.

The group will be on the default_domain regardless if a kernel driver
is bound or not, so the number of bound kernel drivers still needs to
be tracked and restored.

> > So, would you rather see duplicated code into the 4 drivers, and a new
> > bus op to 'unconfigure dma'
>
> The .dma_configure flow is unavoidably a bit boilerplatey already, so
> personally I'd go for having the implementations call back into a common
> check, similarly to their current flow. That also leaves room for the bus
> code to further refine the outcome based on what it might know, which I can
> particularly imagine for cleverer buses like fsl-mc and host1x which can
> have lots of inside knowledge about how their devices may interact.

bus specific variation does not fill me with confidence - there should
not be bus specific variation on security principles, especially when
the API is supporting VFIO and the like. 

How can we reason about that?

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-15  2:05 ` [PATCH 01/11] iommu: Add device dma ownership set/release interfaces Lu Baolu
  2021-11-15 13:14   ` Christoph Hellwig
@ 2021-11-15 20:38   ` Bjorn Helgaas
  2021-11-16  1:52     ` Lu Baolu
  1 sibling, 1 reply; 60+ messages in thread
From: Bjorn Helgaas @ 2021-11-15 20:38 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Will Deacon, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci, kvm,
	linux-kernel

On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> From the perspective of who is initiating the device to do DMA, device
> DMA could be divided into the following types:
> 
>         DMA_OWNER_KERNEL: kernel device driver intiates the DMA
>         DMA_OWNER_USER: userspace device driver intiates the DMA

s/intiates/initiates/ (twice)

As your first sentence suggests, the driver doesn't actually
*initiate* the DMA in either case.  One of the drivers programs the
device, and the *device* initiates the DMA.

> DMA_OWNER_KERNEL and DMA_OWNER_USER are exclusive for all devices in
> same iommu group as an iommu group is the smallest granularity of device
> isolation and protection that the IOMMU subsystem can guarantee.

I think this basically says DMA_OWNER_KERNEL and DMA_OWNER_USER are
attributes of the iommu_group (not an individual device), and it
applies to all devices in the iommu_group.  Below, you allude to the
fact that the interfaces are per-device.  It's not clear to me why you
made a per-device interface instead of a per-group interface.

> This
> extends the iommu core to enforce this exclusion when devices are
> assigned to userspace.
> 
> Basically two new interfaces are provided:
> 
>         int iommu_device_set_dma_owner(struct device *dev,
>                 enum iommu_dma_owner mode, struct file *user_file);
>         void iommu_device_release_dma_owner(struct device *dev,
>                 enum iommu_dma_owner mode);
> 
> Although above interfaces are per-device, DMA owner is tracked per group
> under the hood. An iommu group cannot have both DMA_OWNER_KERNEL
> and DMA_OWNER_USER set at the same time. Violation of this assumption
> fails iommu_device_set_dma_owner().
> 
> Kernel driver which does DMA have DMA_OWNER_KENREL automatically
> set/released in the driver binding process (see next patch).

s/DMA_OWNER_KENREL/DMA_OWNER_KERNEL/

> Kernel driver which doesn't do DMA should not set the owner type (via a
> new suppress flag in next patch). Device bound to such driver is considered
> same as a driver-less device which is compatible to all owner types.
> 
> Userspace driver framework (e.g. vfio) should set DMA_OWNER_USER for
> a device before the userspace is allowed to access it, plus a fd pointer to
> mark the user identity so a single group cannot be operated by multiple
> users simultaneously. Vice versa, the owner type should be released after
> the user access permission is withdrawn.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-11-15  2:05 ` [PATCH 04/11] PCI: portdrv: " Lu Baolu
@ 2021-11-15 20:44   ` Bjorn Helgaas
  2021-11-16  7:24     ` Lu Baolu
  0 siblings, 1 reply; 60+ messages in thread
From: Bjorn Helgaas @ 2021-11-15 20:44 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Will Deacon, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci, kvm,
	linux-kernel

On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> then all of the downstream devices will be part of the same IOMMU group
> as the bridge.

I think this means something like: "If a PCIe Switch Downstream Port
lacks <a specific set of ACS capabilities>, all downstream devices
will be part of the same IOMMU group as the switch," right?

If so, can you fill in the details to make it specific and concrete?

> As long as the bridge kernel driver doesn't map and
> access any PCI mmio bar, it's safe to bind it to the device in a USER-
> owned group. Hence, safe to suppress the kernel DMA ownership auto-
> claiming.

s/mmio/MMIO/ (also below)
s/bar/BAR/ (also below)

I don't understand what "kernel DMA ownership auto-claiming" means.
Presumably that's explained in previous patches and a code comment
near "suppress_auto_claim_dma_owner".

> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") permitted a
> class of kernel drivers. 

Permitted them to do what?

> This is not always safe. For example, the SHPC
> system design requires that it must be integrated into a PCI-to-PCI
> bridge or a host bridge.

If this SHPC example is important, it would be nice to have a citation
to the spec section that requires this.

> The shpchp_core driver relies on the PCI mmio
> bar access for the controller functionality. Binding it to the device
> belonging to a USER-owned group will allow the user to change the
> controller via p2p transactions which is unknown to the hot-plug driver
> and could lead to some unpredictable consequences.
> 
> Now that we have driver self-declaration of safety we should rely on that.

Can you spell out what drivers are self-declaring?  Are they declaring
that they don't program their devices to do DMA?

> This change may cause regression on some platforms, since all bridges were
> exempted before, but now they have to be manually audited before doing so.
> This is actually the desired outcome anyway.

Please spell out what regression this may cause and how users would
recognize it.  Also, please give a hint about why that is desirable.

> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pcie/portdrv_pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..1285862a9aa8 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -203,6 +203,8 @@ static struct pci_driver pcie_portdriver = {
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
>  	.driver.pm	= PCIE_PORTDRV_PM_OPS,
> +
> +	.driver.suppress_auto_claim_dma_owner = true,
>  };
>  
>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15  2:05 ` [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
  2021-11-15 13:21   ` Christoph Hellwig
@ 2021-11-15 20:48   ` Bjorn Helgaas
  2021-11-15 22:17   ` Bjorn Helgaas
  2 siblings, 0 replies; 60+ messages in thread
From: Bjorn Helgaas @ 2021-11-15 20:48 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Will Deacon, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci, kvm,
	linux-kernel

On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe. 

Can you elaborate on what "permanently shared with userspace" means
here?  I assume it's only permanent as long as pci-stub is bound to
the device?

Also, a few words about what "it is safe" means here would be helpful.

> However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

The admin isn't going to read this sentence.  Should there be a doc
update related to this?  What sort of attack does this refer to?

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>  	.name		= "pci-stub",
>  	.id_table	= NULL,	/* only dynamic id's */
>  	.probe		= pci_stub_probe,
> +	.driver		= {
> +		.suppress_auto_claim_dma_owner = true,
> +	},
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 19:22             ` Jason Gunthorpe
@ 2021-11-15 20:58               ` Robin Murphy
  2021-11-15 21:19                 ` Jason Gunthorpe
  0 siblings, 1 reply; 60+ messages in thread
From: Robin Murphy @ 2021-11-15 20:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Kevin Tian, Chaitanya Kulkarni, Ashok Raj,
	kvm, rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Alex Williamson, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Diana Craciun

On 2021-11-15 19:22, Jason Gunthorpe wrote:
> On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
>> On 2021-11-15 16:17, Jason Gunthorpe wrote:
>>> On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:
>>>
>>>>> If userspace has control of device A and can cause A to issue DMA to
>>>>> arbitary DMA addresses then there are certain PCI topologies where A
>>>>> can now issue peer to peer DMA and manipulate the MMMIO registers in
>>>>> device B.
>>>>>
>>>>> A kernel driver on device B is thus subjected to concurrent
>>>>> manipulation of the device registers from userspace.
>>>>>
>>>>> So, a 'safe' kernel driver is one that can tolerate this, and an
>>>>> 'unsafe' driver is one where userspace can break kernel integrity.
>>>>
>>>> You mean in the case where the kernel driver is trying to use device B in a
>>>> purely PIO mode, such that userspace might potentially be able to interfere
>>>> with data being transferred in and out of the kernel?
>>>
>>> s/PIO/MMIO, but yes basically. And not just data trasnfer but
>>> userspace can interfere with the device state as well.
>>
>> Sure, but unexpected changes in device state could happen for any number of
>> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
>> affect "kernel integrity" I'm considering it an independent problem.
> 
> There is a big difference in my mind between a device/HW attacking the
> kernel and userspace can attack the kernel. They are both valid cases,
> and I know people are also working on the device/HW attacks the kernel
> problem.
> 
> This series is only about user attacks kernel.

Indeed, I was just commenting that if there's any actual attack surface 
for "user makes device go wrong" then that's really a whole other issue. 
I took "device state" to mean any state *other than* what could be used 
to observe and/or subvert the kernel's normal operation of the device. 
Say it's some kind of storage device with some global state bit that 
could be flipped to disable encryption on blocks being written such that 
the medium could be attacked offline later, that's still firmly in my 
"interfering with data transfer" category.

>>>> Perhaps it's not so clear to put that under a notion of "DMA
>>>> ownership", since device B's DMA is irrelevant and it's really much
>>>> more equivalent to /dev/mem access or mmaping BARs to userspace
>>>> while a driver is bound.
>>>
>>> It is DMA ownership because device A's DMA is what is relevant
>>> here. device A's DMA compromises device B. So device A asserts it has
>>> USER ownership for DMA.
>>>
>>> Any device in a group with USER ownership is incompatible with a
>>> kernel driver.
>>
>> I can see the argument from that angle, but you can equally look at it
>> another way and say that a device with kernel ownership is incompatible with
>> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
>> such that device A's kernel driver DMAs all over it. Maybe that particular
>> example lands firmly under "just don't do that", but I'd like to figure out
>> where exactly we should draw the line between "DMA" and "ability to mess
>> with a device".
> 
> The above scenarios are already blocked by the kernel with
> LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> integrity, and these days they almost all have mitigation. I would
> consider any kernel integrity violation to be a bug today if
> LOCKDOWN_INTEGRITY_MAX is enabled.
> 
> I don't know why you bring this up?

Because as soon as anyone brings up "security" I'm not going to blindly 
accept the implicit assumption that VFIO is the only possible way to get 
one device to mess with another. That was just a silly example in the 
most basic terms, and obviously I don't expect well-worn generic sysfs 
interfaces to be a genuine threat, but how confident are you that no 
other subsystem- or driver-level interfaces past present and future can 
ever be tricked into p2p DMA?

>>> That has a nice symmetry with iommu_attach_device() already requiring
>>> that the group has a single device. For a driver to use these APIs it
>>> must ensure security, one way or another.
>>
>> iommu_attach_device() is supposed to be deprecated and eventually going
>> away; I wouldn't look at it too much.
> 
> What is the preference then? This is the only working API today,
> right?

I believe the intent was that everyone should move to 
iommu_group_get()/iommu_attach_group() - precisely *because* 
iommu_attach_device() can't work sensibly for multi-device groups.

>> Indeed I wasn't imagining it changing any ownership, just preventing a group
>> from being attached to a non-default domain if it contains devices bound to
>> different incompatible drivers.
> 
> So this could solve just the domain/DMA API problem, but it leaves the
> MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> a layered way.
> 
> This seems like half an idea, do you have a solution for the rest?

Tell me how the p2p DMA issue can manifest if device A is prohibited 
from attaching to VFIO's unmanaged domain while device B still has a 
driver bound, and thus would fail to be assigned to userspace in the 
first place. And conversely if non-VFIO drivers are still prevented from 
binding to device B while device A remains attached to the VFIO domain.

(Bonus: if so, also tell me how that wouldn't disprove your initial 
argument anyway)

> The concept of DMA USER is important here, and it is more than just
> which domain is attached.

Tell me how a device would be assigned to userspace while its group is 
still attached to a kernel-managed default domain.

As soon as anyone calls iommu_attach_group() - or indeed 
iommu_attach_device() if more devices may end up hotplugged into the 
same group later - *that's* when the door opens for potential subversion 
of any kind, without ever having to leave kernel space.

>> Basically just taking the existing check that VFIO tries to enforce
>> and formalising it into the core API. It's not too far off what we
>> already have around changing the default domain type, so there seems
>> to be room for it to all fit together quite nicely.
> 
> VFIO also has logic related to the file

Yes, because unsurprisingly VFIO code is tailored for the specific case 
of VFIO usage rather than anything more general.

>> Tying it in to userspace and FDs just muddies the water far more
>> than necessary.
> 
> It isn't muddying the water, it is providing common security code that
> is easy to undertand.
> 
> *Which* userspace FD/process owns the iommu_group is important
> security information because we can't have process A do DMA attacks on
> some other process B.

Tell me how a single group could be attached to two domains representing 
two different process address spaces at once.

In case this concept wasn't as clear as I thought, which seems to be so:

                  | dev->iommu_group->domain | dev->driver
------------------------------------------------------------
DMA_OWNER_NONE   |          default         |   unbound
DMA_OWNER_KERNEL |          default         |    bound
DMA_OWNER_USER   |        non-default       |    bound

It's literally that simple. The information's already there. And in a 
more robust form to boot, given that, as before, "user" ownership may 
still exist entirely within kernel space.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 20:58               ` Robin Murphy
@ 2021-11-15 21:19                 ` Jason Gunthorpe
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-15 21:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Kevin Tian, Chaitanya Kulkarni, Ashok Raj,
	kvm, rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Alex Williamson, Jacob jun Pan, linux-pci,
	Bjorn Helgaas, Diana Craciun

On Mon, Nov 15, 2021 at 08:58:19PM +0000, Robin Murphy wrote:
> > The above scenarios are already blocked by the kernel with
> > LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> > integrity, and these days they almost all have mitigation. I would
> > consider any kernel integrity violation to be a bug today if
> > LOCKDOWN_INTEGRITY_MAX is enabled.
> > 
> > I don't know why you bring this up?
> 
> Because as soon as anyone brings up "security" I'm not going to blindly
> accept the implicit assumption that VFIO is the only possible way to get one
> device to mess with another. That was just a silly example in the most basic
> terms, and obviously I don't expect well-worn generic sysfs interfaces to be
> a genuine threat, but how confident are you that no other subsystem- or
> driver-level interfaces past present and future can ever be tricked into p2p
> DMA?

Given the definition of LOCKDOWN_INTEGRITY_MAX I will consider any
past/present/future p2p attacks as definitive kernel bugs.

Generally, allowing a device to do arbitary DMA to a userspace
controlled address is a pretty serious bug, and directly attacking the
kernel memory is a much more interesting and serious threat vector.

> > What is the preference then? This is the only working API today,
> > right?
> 
> I believe the intent was that everyone should move to
> iommu_group_get()/iommu_attach_group() - precisely *because*
> iommu_attach_device() can't work sensibly for multi-device groups.

And iommu_attach_group() can't work sensibly for anything except VFIO
today, so hum :)

> > > Indeed I wasn't imagining it changing any ownership, just preventing a group
> > > from being attached to a non-default domain if it contains devices bound to
> > > different incompatible drivers.
> > 
> > So this could solve just the domain/DMA API problem, but it leaves the
> > MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> > a layered way.
> > 
> > This seems like half an idea, do you have a solution for the rest?
> 
> Tell me how the p2p DMA issue can manifest if device A is prohibited from
> attaching to VFIO's unmanaged domain while device B still has a driver
> bound, and thus would fail to be assigned to userspace in the first place.
> And conversely if non-VFIO drivers are still prevented from binding to
> device B while device A remains attached to the VFIO domain.

You've assumed that a group is continuously attached to the same
domain during the entire period that userspace has MMIO.

Any domain detatch creates a race where a kernel driver can jump in
and bind, while user space continues to have MMIO control over a
device. That violates the security invariant.

Many new flows, like PASID support, are requiring dynamically changing
the domain bound to a group.

If you want to go in this direction then we also need to have some
kind of compare and swap operation for the domain currently bound to a
group.

From a security perspective I disliek this idea a lot. Instead of
having nice clear barriers indicating the security domain we have a
very subtle 'so long as a domain is attached' idea, which is going to
get broken.

> > The concept of DMA USER is important here, and it is more than just
> > which domain is attached.
> 
> Tell me how a device would be assigned to userspace while its group is still
> attached to a kernel-managed default domain.
> 
> As soon as anyone calls iommu_attach_group() - or indeed
> iommu_attach_device() if more devices may end up hotplugged into the same
> group later - *that's* when the door opens for potential subversion of any
> kind, without ever having to leave kernel space.

The approach in this series can solve both, attach_device could switch
the device to user mode and it will block future hot plugged kernel
drivers.

> > VFIO also has logic related to the file
> 
> Yes, because unsurprisingly VFIO code is tailored for the specific case of
> VFIO usage rather than anything more general.

VFIO represents this class of users exposing the IOMMU to userspace,
I say it is general of that use class.

> > It isn't muddying the water, it is providing common security code that
> > is easy to undertand.
> > 
> > *Which* userspace FD/process owns the iommu_group is important
> > security information because we can't have process A do DMA attacks on
> > some other process B.
> 
> Tell me how a single group could be attached to two domains representing two
> different process address spaces at once.

Again you are focused on domains and ignoring MMIO.

Requiring the users of the API to continuously assert a non-default
domain is a non-trivial ask.

> In case this concept wasn't as clear as I thought, which seems to be so:
>
>                  | dev->iommu_group->domain | dev->driver
> DMA_OWNER_NONE   |          default         |   unbound
> DMA_OWNER_KERNEL |          default         |    bound
> DMA_OWNER_USER   |        non-default       |    bound

Unfortunately this can't use dev->driver. Reading dev->driver of every
device in the group requires holding the device_lock. really_probe()
already holds the device lock so this becomes a user triggerable ABBA
deadlock when scaled up to N devices.

This is why this series uses only the group mutex and tracks if
drivers are bound inside the group. I view that as unavoidable.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15  2:05 ` [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
  2021-11-15 13:21   ` Christoph Hellwig
  2021-11-15 20:48   ` Bjorn Helgaas
@ 2021-11-15 22:17   ` Bjorn Helgaas
  2021-11-16  6:05     ` Lu Baolu
  2 siblings, 1 reply; 60+ messages in thread
From: Bjorn Helgaas @ 2021-11-15 22:17 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Will Deacon, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci, kvm,
	linux-kernel

On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe. However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

This commit log doesn't say what the patch does.  I think it tells us
something about what pci-stub *already* does ("allows admin to block
driver binding") and something about why that is safe ("does not do
DMA").

But it doesn't say what this patch changes.  Based on the subject
line, I expected something like:

  As of ("<commit subject>"), <some function>() marks the iommu_group
  as containing only devices with kernel drivers that manage DMA.

  Avoid this default behavior for pci-stub because it does not program
  any DMA itself.  This allows <some desirable behavior>.

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>  	.name		= "pci-stub",
>  	.id_table	= NULL,	/* only dynamic id's */
>  	.probe		= pci_stub_probe,
> +	.driver		= {
> +		.suppress_auto_claim_dma_owner = true,
> +	},
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-15 20:38   ` Bjorn Helgaas
@ 2021-11-16  1:52     ` Lu Baolu
  0 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-16  1:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Kevin Tian, Ashok Raj,
	Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel

Hi Bjorn,

On 11/16/21 4:38 AM, Bjorn Helgaas wrote:
> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>  From the perspective of who is initiating the device to do DMA, device
>> DMA could be divided into the following types:
>>
>>          DMA_OWNER_KERNEL: kernel device driver intiates the DMA
>>          DMA_OWNER_USER: userspace device driver intiates the DMA
> 
> s/intiates/initiates/ (twice)

Yes.

> 
> As your first sentence suggests, the driver doesn't actually
> *initiate* the DMA in either case.  One of the drivers programs the
> device, and the *device* initiates the DMA.

You are right. I could rephrase it as:

"From the perspective of who is controlling the device to do DMA ..."

> 
>> DMA_OWNER_KERNEL and DMA_OWNER_USER are exclusive for all devices in
>> same iommu group as an iommu group is the smallest granularity of device
>> isolation and protection that the IOMMU subsystem can guarantee.
> 
> I think this basically says DMA_OWNER_KERNEL and DMA_OWNER_USER are
> attributes of the iommu_group (not an individual device), and it
> applies to all devices in the iommu_group.  Below, you allude to the
> fact that the interfaces are per-device.  It's not clear to me why you
> made a per-device interface instead of a per-group interface.

Yes, the attributes are of the iommu_group. We have both per-device and
per-iommu_group interfaces. The former is for device drivers and the
latter is only for vfio who has an iommu_group based iommu abstract.

>> This
>> extends the iommu core to enforce this exclusion when devices are
>> assigned to userspace.
>>
>> Basically two new interfaces are provided:
>>
>>          int iommu_device_set_dma_owner(struct device *dev,
>>                  enum iommu_dma_owner mode, struct file *user_file);
>>          void iommu_device_release_dma_owner(struct device *dev,
>>                  enum iommu_dma_owner mode);
>>
>> Although above interfaces are per-device, DMA owner is tracked per group
>> under the hood. An iommu group cannot have both DMA_OWNER_KERNEL
>> and DMA_OWNER_USER set at the same time. Violation of this assumption
>> fails iommu_device_set_dma_owner().
>>
>> Kernel driver which does DMA have DMA_OWNER_KENREL automatically
>> set/released in the driver binding process (see next patch).
> 
> s/DMA_OWNER_KENREL/DMA_OWNER_KERNEL/

Yes. Sorry for the typo.

> 
>> Kernel driver which doesn't do DMA should not set the owner type (via a
>> new suppress flag in next patch). Device bound to such driver is considered
>> same as a driver-less device which is compatible to all owner types.
>>
>> Userspace driver framework (e.g. vfio) should set DMA_OWNER_USER for
>> a device before the userspace is allowed to access it, plus a fd pointer to
>> mark the user identity so a single group cannot be operated by multiple
>> users simultaneously. Vice versa, the owner type should be released after
>> the user access permission is withdrawn.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-15 13:14   ` Christoph Hellwig
@ 2021-11-16  1:57     ` Lu Baolu
  2021-11-16 13:46       ` Jason Gunthorpe
  0 siblings, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-16  1:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

Hi Christoph,

On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>> +enum iommu_dma_owner {
>> +	DMA_OWNER_NONE,
>> +	DMA_OWNER_KERNEL,
>> +	DMA_OWNER_USER,
>> +};
>> +
> 
>> +	enum iommu_dma_owner dma_owner;
>> +	refcount_t owner_cnt;
>> +	struct file *owner_user_file;
> 
> I'd just overload the ownership into owner_user_file,
> 
>   NULL			-> no owner
>   (struct file *)1UL)	-> kernel
>   real pointer		-> user
> 
> Which could simplify a lot of the code dealing with the owner.
> 

Yeah! Sounds reasonable. I will make this in the next version.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
  2021-11-15 22:17   ` Bjorn Helgaas
@ 2021-11-16  6:05     ` Lu Baolu
  0 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-16  6:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Kevin Tian, Ashok Raj,
	Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel

Hi Bjorn,

On 11/16/21 6:17 AM, Bjorn Helgaas wrote:
> On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
>> pci_stub allows the admin to block driver binding on a device and make
>> it permanently shared with userspace. Since pci_stub does not do DMA,
>> it is safe. However the admin must understand that using pci_stub allows
>> userspace to attack whatever device it was bound to.
> This commit log doesn't say what the patch does.  I think it tells us
> something about what pci-stub*already*  does ("allows admin to block
> driver binding") and something about why that is safe ("does not do
> DMA").

Yes, you are right. This patch is to keep the pci_stub's existing use
case ("allows admin to block driver binding") after moving the viable
check from the vfio to iommu layer (done by this series).

About "safe" (should not be part of this description), there are two
sides from my understanding:

#1) The pci_stub driver itself doesn't control the device to do any DMA.
     So it won't interfere the user space through device DMA.

#2) The pci_stub driver doesn't access the PCI bar and doesn't build any
     device driver state around any value in the bar. So other devices
     in the same iommu group (assigned to user space) have no means to
     change the kernel driver consistency via p2p access.
> 
> But it doesn't say what this patch changes.  Based on the subject
> line, I expected something like:
> 
>    As of ("<commit subject>"), <some function>() marks the iommu_group
>    as containing only devices with kernel drivers that manage DMA.
> 
>    Avoid this default behavior for pci-stub because it does not program
>    any DMA itself.  This allows <some desirable behavior>.
> 

Sure. I will rephrase the description like above.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-11-15 20:44   ` Bjorn Helgaas
@ 2021-11-16  7:24     ` Lu Baolu
  2021-11-16 20:22       ` Bjorn Helgaas
  0 siblings, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-16  7:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Kevin Tian, Ashok Raj,
	Will Deacon, rafael, Diana Craciun, Cornelia Huck, Eric Auger,
	Liu Yi L, Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci,
	kvm, linux-kernel

Hi Bjorn,

On 2021/11/16 4:44, Bjorn Helgaas wrote:
> On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
>> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
>> then all of the downstream devices will be part of the same IOMMU group
>> as the bridge.
> 
> I think this means something like: "If a PCIe Switch Downstream Port
> lacks <a specific set of ACS capabilities>, all downstream devices
> will be part of the same IOMMU group as the switch," right?

For this patch, yes.

> 
> If so, can you fill in the details to make it specific and concrete?

The existing vfio implementation allows a kernel driver to bind with a
PCI bridge while its downstream devices are assigned to the user space
though there lacks ACS-like isolation in bridge.

drivers/vfio/vfio.c:
  540 static bool vfio_dev_driver_allowed(struct device *dev,
  541                                     struct device_driver *drv)
  542 {
  543         if (dev_is_pci(dev)) {
  544                 struct pci_dev *pdev = to_pci_dev(dev);
  545
  546                 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
  547                         return true;
  548         }

We are moving the group viability check to IOMMU core, and trying to
make it compatible with the current vfio policy. We saw three types of
bridges:

#1) PCIe/PCI-to-PCI bridges
     These bridges are configured in the PCI framework, there's no
     dedicated driver for such devices.

#2) Generic PCIe switch downstream port
     The port driver doesn't map and access any MMIO in the PCI BAR.
     The iommu group is viable to user even this driver is bound.

#3) Hot Plug Controller
     The controller driver maps and access the device MMIO. The iommu
     group is not viable to user with this driver bound to its device.

>> As long as the bridge kernel driver doesn't map and
>> access any PCI mmio bar, it's safe to bind it to the device in a USER-
>> owned group. Hence, safe to suppress the kernel DMA ownership auto-
>> claiming.
> 
> s/mmio/MMIO/ (also below)
> s/bar/BAR/ (also below)

Sure.

> 
> I don't understand what "kernel DMA ownership auto-claiming" means.
> Presumably that's explained in previous patches and a code comment
> near "suppress_auto_claim_dma_owner".

When a device driver is about to bind with a device, the driver core
will claim the kernel DMA ownership automatically for the driver.

This implies that
- if success, the kernel driver is controlling the device for DMA. Any
   devices sitting in the same iommu group shouldn't be assigned to user.
- if failure, some devices sitting in the same iommu group have already
   been assigned to user space. The driver binding process should abort.

But there are some exceptions where suppress_auto_claim_dma_owner comes
to play.

#1) vfio-like drivers which will assign the devices to user space after
     driver binding;
#2) (compatible with exiting vfio policy) some drivers are allowed to be
     bound with a device while its siblings in the iommu group are
     assigned to user space. Typically, these drivers include
     - pci_stub driver
     - pci bridge drivers

For above drivers, we use driver.suppress_auto_claim_dma_owner as a hint
to tell the driver core to ignore the kernel dma ownership claiming.

> 
>> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") permitted a
>> class of kernel drivers.
> 
> Permitted them to do what?

As I explained above.

> 
>> This is not always safe. For example, the SHPC
>> system design requires that it must be integrated into a PCI-to-PCI
>> bridge or a host bridge.
> 
> If this SHPC example is important, it would be nice to have a citation
> to the spec section that requires this.

I just used it as an example to show that allowing any driver to be
bound to a PCI bridge device in a USER-viable iommu group is too loose.

> 
>> The shpchp_core driver relies on the PCI mmio
>> bar access for the controller functionality. Binding it to the device
>> belonging to a USER-owned group will allow the user to change the
>> controller via p2p transactions which is unknown to the hot-plug driver
>> and could lead to some unpredictable consequences.
>>
>> Now that we have driver self-declaration of safety we should rely on that.
> 
> Can you spell out what drivers are self-declaring?  Are they declaring
> that they don't program their devices to do DMA?

Sure. Set driver.suppress_auto_claim_dma_owner = true.

> 
>> This change may cause regression on some platforms, since all bridges were
>> exempted before, but now they have to be manually audited before doing so.
>> This is actually the desired outcome anyway.
> 
> Please spell out what regression this may cause and how users would
> recognize it.  Also, please give a hint about why that is desirable.

Sure.

Before this series, bridge drivers are always allowed to be bound with
PCI/PCIe bridge which is sitting in an iommu group assigned to user
space. After this, only those drivers which have
suppress_auto_claim_dma_owner set could be done so. Otherwise, the
driver binding or group user assignment will fail.

The criteria of what kind of drivers could have this hint set is "driver
doesn't map and access the MMIO define in the PCI BARs".

> 
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Suggested-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/pci/pcie/portdrv_pci.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
>> index 35eca6277a96..1285862a9aa8 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -203,6 +203,8 @@ static struct pci_driver pcie_portdriver = {
>>   	.err_handler	= &pcie_portdrv_err_handler,
>>   
>>   	.driver.pm	= PCIE_PORTDRV_PM_OPS,
>> +
>> +	.driver.suppress_auto_claim_dma_owner = true,
>>   };
>>   
>>   static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
>> -- 
>> 2.25.1
>>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/11] iommu: Add security context management for assigned devices
  2021-11-15 13:22   ` Christoph Hellwig
@ 2021-11-16  7:25     ` Lu Baolu
  0 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-16  7:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

On 2021/11/15 21:22, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 10:05:46AM +0800, Lu Baolu wrote:
>> +			/*
>> +			 * The UNMANAGED domain should be detached before all USER
>> +			 * owners have been released.
>> +			 */
> 
> Please avoid comments spilling over 80 characters.

Sure! Thanks!

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces
  2021-11-15 13:27   ` Christoph Hellwig
@ 2021-11-16  9:42     ` Lu Baolu
  0 siblings, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Jason Gunthorpe, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

Hi Christoph,

On 2021/11/15 21:27, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 10:05:47AM +0800, Lu Baolu wrote:
>> The vfio needs to set DMA_OWNER_USER for the entire group when attaching
> 
> The vfio subsystem?  driver?

"vfio subsystem"

> 
>> it to a vfio container. So expose group variants of setting/releasing dma
>> ownership for this purpose.
>>
>> This also exposes the helper iommu_group_dma_owner_unclaimed() for vfio
>> report to userspace if the group is viable to user assignment, for
> 
> .. for vfio to report .. ?

Yes.

> 
>>   void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
>> +int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner,
>> +			      struct file *user_file);
>> +void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
> 
> Pleae avoid all these overly long lines.

Sure. Thanks!

> 
>> +static inline int iommu_group_set_dma_owner(struct iommu_group *group,
>> +					    enum iommu_dma_owner owner,
>> +					    struct file *user_file)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>> +static inline void iommu_group_release_dma_owner(struct iommu_group *group,
>> +						 enum iommu_dma_owner owner)
>> +{
>> +}
>> +
>> +static inline bool iommu_group_dma_owner_unclaimed(struct iommu_group *group)
>> +{
>> +	return false;
>> +}
> 
> Why do we need these stubs?  All potential callers should already
> require CONFIG_IOMMU_API?  Same for the helpers added in patch 1, btw.

You are right. This helper is only for vfio which requires IOMMU_API. I
will remove this.

The helpers in patch 1 seem not the same. The driver core (or bus
dma_configure() callback as suggested) will also call them.

> 
>> +	mutex_lock(&group->mutex);
>> +	ret = __iommu_group_set_dma_owner(group, owner, user_file);
>> +	mutex_unlock(&group->mutex);
> 
>> +	mutex_lock(&group->mutex);
>> +	__iommu_group_release_dma_owner(group, owner);
>> +	mutex_unlock(&group->mutex);
> 
> Unless I'm missing something (just skipping over the patches),
> the existing callers also take the lock just around these calls,
> so we don't really need the __-prefixed lowlevel helpers.
> 

Move mutex_lock/unlock will make the helper implementation easier. :-)
It seems to be common code style in iommu core. For example,
__iommu_attach_group(), __iommu_group_for_each_dev(), etc.

>> +	mutex_lock(&group->mutex);
>> +	owner = group->dma_owner;
>> +	mutex_unlock(&group->mutex);
> 
> No need for a lock to read a single scalar.

Adding the lock will make kcasn happy. Jason G also told me that

[citing from his review comment]
"
It is always incorrect to read concurrent data without an annotation
of some kind.

For instance it can cause mis-execution of logic where the compiler is
unaware that a value it loads is allowed to change - ie no 
READ_ONCE/WRITE_ONCE semantic.
"

> 
>> +
>> +	return owner == DMA_OWNER_NONE;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_group_dma_owner_unclaimed);

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-16  1:57     ` Lu Baolu
@ 2021-11-16 13:46       ` Jason Gunthorpe
  2021-11-17  5:22         ` Lu Baolu
  2021-11-18  2:39         ` Tian, Kevin
  0 siblings, 2 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-16 13:46 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> Hi Christoph,
> 
> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > +enum iommu_dma_owner {
> > > +	DMA_OWNER_NONE,
> > > +	DMA_OWNER_KERNEL,
> > > +	DMA_OWNER_USER,
> > > +};
> > > +
> > 
> > > +	enum iommu_dma_owner dma_owner;
> > > +	refcount_t owner_cnt;
> > > +	struct file *owner_user_file;
> > 
> > I'd just overload the ownership into owner_user_file,
> > 
> >   NULL			-> no owner
> >   (struct file *)1UL)	-> kernel
> >   real pointer		-> user
> > 
> > Which could simplify a lot of the code dealing with the owner.
> > 
> 
> Yeah! Sounds reasonable. I will make this in the next version.

It would be good to figure out how to make iommu_attach_device()
enforce no other driver binding as a kernel user without a file *, as
Robin pointed to, before optimizing this.

This fixes an existing bug where iommu_attach_device() only checks the
group size and is vunerable to a hot plug increasing the group size
after it returns. That check should be replaced by this series's logic
instead.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-11-16  7:24     ` Lu Baolu
@ 2021-11-16 20:22       ` Bjorn Helgaas
  2021-11-16 20:48         ` Jason Gunthorpe
  0 siblings, 1 reply; 60+ messages in thread
From: Bjorn Helgaas @ 2021-11-16 20:22 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Joerg Roedel, Alex Williamson, Bjorn Helgaas,
	Jason Gunthorpe, Kevin Tian, Ashok Raj, Will Deacon, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci, kvm,
	linux-kernel

On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote:
> On 2021/11/16 4:44, Bjorn Helgaas wrote:
> > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> > > then all of the downstream devices will be part of the same IOMMU group
> > > as the bridge.
> > 
> > I think this means something like: "If a PCIe Switch Downstream Port
> > lacks <a specific set of ACS capabilities>, all downstream devices
> > will be part of the same IOMMU group as the switch," right?
> 
> For this patch, yes.
> 
> > If so, can you fill in the details to make it specific and concrete?
> 
> The existing vfio implementation allows a kernel driver to bind with a
> PCI bridge while its downstream devices are assigned to the user space
> though there lacks ACS-like isolation in bridge.
> 
> drivers/vfio/vfio.c:
>  540 static bool vfio_dev_driver_allowed(struct device *dev,
>  541                                     struct device_driver *drv)
>  542 {
>  543         if (dev_is_pci(dev)) {
>  544                 struct pci_dev *pdev = to_pci_dev(dev);
>  545
>  546                 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>  547                         return true;
>  548         }
> 
> We are moving the group viability check to IOMMU core, and trying to
> make it compatible with the current vfio policy. We saw three types of
> bridges:
> 
> #1) PCIe/PCI-to-PCI bridges
>     These bridges are configured in the PCI framework, there's no
>     dedicated driver for such devices.
> 
> #2) Generic PCIe switch downstream port
>     The port driver doesn't map and access any MMIO in the PCI BAR.
>     The iommu group is viable to user even this driver is bound.
> 
> #3) Hot Plug Controller
>     The controller driver maps and access the device MMIO. The iommu
>     group is not viable to user with this driver bound to its device.

I *guess* the question here is whether the bridge can or will do DMA?

I think that's orthogonal to the question of whether it implements
BARs, so I'm not sure why the MMIO BARs are part of this discussion.
I assume it's theoretically possible for a driver to use registers in
config space to program a device to do DMA, even if the device has no
BARs.

Bjorn

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming
  2021-11-16 20:22       ` Bjorn Helgaas
@ 2021-11-16 20:48         ` Jason Gunthorpe
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-16 20:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Alex Williamson,
	Bjorn Helgaas, Kevin Tian, Ashok Raj, Will Deacon, rafael,
	Diana Craciun, Cornelia Huck, Eric Auger, Liu Yi L,
	Jacob jun Pan, Chaitanya Kulkarni, iommu, linux-pci, kvm,
	linux-kernel

On Tue, Nov 16, 2021 at 02:22:01PM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote:
> > On 2021/11/16 4:44, Bjorn Helgaas wrote:
> > > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> > > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> > > > then all of the downstream devices will be part of the same IOMMU group
> > > > as the bridge.
> > > 
> > > I think this means something like: "If a PCIe Switch Downstream Port
> > > lacks <a specific set of ACS capabilities>, all downstream devices
> > > will be part of the same IOMMU group as the switch," right?
> > 
> > For this patch, yes.
> > 
> > > If so, can you fill in the details to make it specific and concrete?
> > 
> > The existing vfio implementation allows a kernel driver to bind with a
> > PCI bridge while its downstream devices are assigned to the user space
> > though there lacks ACS-like isolation in bridge.
> > 
> > drivers/vfio/vfio.c:
> >  540 static bool vfio_dev_driver_allowed(struct device *dev,
> >  541                                     struct device_driver *drv)
> >  542 {
> >  543         if (dev_is_pci(dev)) {
> >  544                 struct pci_dev *pdev = to_pci_dev(dev);
> >  545
> >  546                 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> >  547                         return true;
> >  548         }
> > 
> > We are moving the group viability check to IOMMU core, and trying to
> > make it compatible with the current vfio policy. We saw three types of
> > bridges:
> > 
> > #1) PCIe/PCI-to-PCI bridges
> >     These bridges are configured in the PCI framework, there's no
> >     dedicated driver for such devices.
> > 
> > #2) Generic PCIe switch downstream port
> >     The port driver doesn't map and access any MMIO in the PCI BAR.
> >     The iommu group is viable to user even this driver is bound.
> > 
> > #3) Hot Plug Controller
> >     The controller driver maps and access the device MMIO. The iommu
> >     group is not viable to user with this driver bound to its device.
> 
> I *guess* the question here is whether the bridge can or will do DMA?
> I think that's orthogonal to the question of whether it implements
> BARs, so I'm not sure why the MMIO BARs are part of this discussion.
> I assume it's theoretically possible for a driver to use registers in
> config space to program a device to do DMA, even if the device has no
> BARs.

There are two questions Lu is trying to get at:

 1) Does the bridge driver use DMA? Calling pci_set_master() or
    a dma_map_* API is a sure indicate the driver is doing DMA

    Kernel DMA doesn't work if the PCI device is attached to
    non-default iommu domain.

 2) If the bridge driver uses MMIO, is it tolerant to hostile
    userspace also touching the same MMIO registers via P2P DMA
    attacks?

    Conservatively if the driver maps a MMIO region at all
    we can say it fails this test.

Unless someone want to do the audit work, identifying MMIO usage alone
is sufficient to disqualify a driver.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-16 13:46       ` Jason Gunthorpe
@ 2021-11-17  5:22         ` Lu Baolu
  2021-11-17 13:35           ` Jason Gunthorpe
  2021-11-18  2:39         ` Tian, Kevin
  1 sibling, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-17  5:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

Hi Jason,

On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
>> Hi Christoph,
>>
>> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
>>> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>>> +enum iommu_dma_owner {
>>>> +	DMA_OWNER_NONE,
>>>> +	DMA_OWNER_KERNEL,
>>>> +	DMA_OWNER_USER,
>>>> +};
>>>> +
>>>
>>>> +	enum iommu_dma_owner dma_owner;
>>>> +	refcount_t owner_cnt;
>>>> +	struct file *owner_user_file;
>>>
>>> I'd just overload the ownership into owner_user_file,
>>>
>>>    NULL			-> no owner
>>>    (struct file *)1UL)	-> kernel
>>>    real pointer		-> user
>>>
>>> Which could simplify a lot of the code dealing with the owner.
>>>
>>
>> Yeah! Sounds reasonable. I will make this in the next version.
> 
> It would be good to figure out how to make iommu_attach_device()
> enforce no other driver binding as a kernel user without a file *, as
> Robin pointed to, before optimizing this.
> 
> This fixes an existing bug where iommu_attach_device() only checks the
> group size and is vunerable to a hot plug increasing the group size
> after it returns. That check should be replaced by this series's logic
> instead.

As my my understanding, the essence of this problem is that only the
user owner of the iommu_group could attach an UNMANAGED domain to it.
If I understand it right, how about introducing a new interface to
allocate a user managed domain and storing the user file pointer in it.

--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,7 @@ struct iommu_domain {
         void *handler_token;
         struct iommu_domain_geometry geometry;
         struct iommu_dma_cookie *iova_cookie;
+       struct file *owner_user_file;
  };
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1902,6 +1902,18 @@ struct iommu_domain *iommu_domain_alloc(struct 
bus_type *bus)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_alloc);

+struct iommu_domain *iommu_domain_alloc_user(struct bus_type *bus,
+                                            struct file *filep)
+{
+       struct iommu_domain *domain;
+
+       domain = __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
+       if (domain)
+               domain->owner_user_file = filep;
+
+       return domain;
+}

When attaching a domain to an user-owned iommu_group, both group and
domain should have matched user fd.

Does above help here?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-17  5:22         ` Lu Baolu
@ 2021-11-17 13:35           ` Jason Gunthorpe
  2021-11-18  1:12             ` Lu Baolu
  0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-17 13:35 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

On Wed, Nov 17, 2021 at 01:22:19PM +0800, Lu Baolu wrote:
> Hi Jason,
> 
> On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
> > On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > > Hi Christoph,
> > > 
> > > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > > +enum iommu_dma_owner {
> > > > > +	DMA_OWNER_NONE,
> > > > > +	DMA_OWNER_KERNEL,
> > > > > +	DMA_OWNER_USER,
> > > > > +};
> > > > > +
> > > > 
> > > > > +	enum iommu_dma_owner dma_owner;
> > > > > +	refcount_t owner_cnt;
> > > > > +	struct file *owner_user_file;
> > > > 
> > > > I'd just overload the ownership into owner_user_file,
> > > > 
> > > >    NULL			-> no owner
> > > >    (struct file *)1UL)	-> kernel
> > > >    real pointer		-> user
> > > > 
> > > > Which could simplify a lot of the code dealing with the owner.
> > > > 
> > > 
> > > Yeah! Sounds reasonable. I will make this in the next version.
> > 
> > It would be good to figure out how to make iommu_attach_device()
> > enforce no other driver binding as a kernel user without a file *, as
> > Robin pointed to, before optimizing this.
> > 
> > This fixes an existing bug where iommu_attach_device() only checks the
> > group size and is vunerable to a hot plug increasing the group size
> > after it returns. That check should be replaced by this series's logic
> > instead.
> 
> As my my understanding, the essence of this problem is that only the
> user owner of the iommu_group could attach an UNMANAGED domain to it.
> If I understand it right, how about introducing a new interface to
> allocate a user managed domain and storing the user file pointer in it.

For iommu_attach_device() the semantic is simple non-sharing, so there
is no need for the file * at all, it can just be NULL.

> Does above help here?

No, iommu_attach_device() is kernel only and should not interact with
userspace.

I'm also going to see if I can learn what Tegra is doing with
iommu_attach_group()

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-17 13:35           ` Jason Gunthorpe
@ 2021-11-18  1:12             ` Lu Baolu
  2021-11-18 14:10               ` Jason Gunthorpe
  0 siblings, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-18  1:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

On 11/17/21 9:35 PM, Jason Gunthorpe wrote:
> On Wed, Nov 17, 2021 at 01:22:19PM +0800, Lu Baolu wrote:
>> Hi Jason,
>>
>> On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
>>> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
>>>> Hi Christoph,
>>>>
>>>> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
>>>>> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>>>>> +enum iommu_dma_owner {
>>>>>> +	DMA_OWNER_NONE,
>>>>>> +	DMA_OWNER_KERNEL,
>>>>>> +	DMA_OWNER_USER,
>>>>>> +};
>>>>>> +
>>>>>
>>>>>> +	enum iommu_dma_owner dma_owner;
>>>>>> +	refcount_t owner_cnt;
>>>>>> +	struct file *owner_user_file;
>>>>>
>>>>> I'd just overload the ownership into owner_user_file,
>>>>>
>>>>>     NULL			-> no owner
>>>>>     (struct file *)1UL)	-> kernel
>>>>>     real pointer		-> user
>>>>>
>>>>> Which could simplify a lot of the code dealing with the owner.
>>>>>
>>>>
>>>> Yeah! Sounds reasonable. I will make this in the next version.
>>>
>>> It would be good to figure out how to make iommu_attach_device()
>>> enforce no other driver binding as a kernel user without a file *, as
>>> Robin pointed to, before optimizing this.
>>>
>>> This fixes an existing bug where iommu_attach_device() only checks the
>>> group size and is vunerable to a hot plug increasing the group size
>>> after it returns. That check should be replaced by this series's logic
>>> instead.
>>
>> As my my understanding, the essence of this problem is that only the
>> user owner of the iommu_group could attach an UNMANAGED domain to it.
>> If I understand it right, how about introducing a new interface to
>> allocate a user managed domain and storing the user file pointer in it.
> 
> For iommu_attach_device() the semantic is simple non-sharing, so there
> is no need for the file * at all, it can just be NULL.

The file * being NULL means the device is only owned by the kernel
driver. Perhaps we can check this pointer in iommu_attach_device() to
avoid using it for user domain attachment.

> 
>> Does above help here?
> 
> No, iommu_attach_device() is kernel only and should not interact with
> userspace.

The existing iommu_attach_device() allows only for singleton group. As
we have added group ownership attribute, we can enforce this interface
only for kernel domain usage.

> 
> I'm also going to see if I can learn what Tegra is doing with
> iommu_attach_group()

Okay! Thank you!

> 
> Jason
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-16 13:46       ` Jason Gunthorpe
  2021-11-17  5:22         ` Lu Baolu
@ 2021-11-18  2:39         ` Tian, Kevin
  2021-11-18 13:33           ` Jason Gunthorpe
  1 sibling, 1 reply; 60+ messages in thread
From: Tian, Kevin @ 2021-11-18  2:39 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Raj, Ashok, Chaitanya Kulkarni,
	kvm, rafael, linux-pci, Cornelia Huck, linux-kernel, iommu, Pan,
	Jacob jun, Diana Craciun, Will Deacon

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 16, 2021 9:46 PM
> 
> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > Hi Christoph,
> >
> > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > +enum iommu_dma_owner {
> > > > +	DMA_OWNER_NONE,
> > > > +	DMA_OWNER_KERNEL,
> > > > +	DMA_OWNER_USER,
> > > > +};
> > > > +
> > >
> > > > +	enum iommu_dma_owner dma_owner;
> > > > +	refcount_t owner_cnt;
> > > > +	struct file *owner_user_file;
> > >
> > > I'd just overload the ownership into owner_user_file,
> > >
> > >   NULL			-> no owner
> > >   (struct file *)1UL)	-> kernel
> > >   real pointer		-> user
> > >
> > > Which could simplify a lot of the code dealing with the owner.
> > >
> >
> > Yeah! Sounds reasonable. I will make this in the next version.
> 
> It would be good to figure out how to make iommu_attach_device()
> enforce no other driver binding as a kernel user without a file *, as
> Robin pointed to, before optimizing this.
> 
> This fixes an existing bug where iommu_attach_device() only checks the
> group size and is vunerable to a hot plug increasing the group size
> after it returns. That check should be replaced by this series's logic
> instead.
> 

I think this existing bug in iommu_attach_devce() is different from 
what this series is attempting to solve. To avoid breaking singleton
group assumption there the ideal band-aid is to fail device hotplug.
Otherwise some IOVA ranges which are supposed to go upstream 
to IOMMU may be considered as p2p and routed to the hotplugged
device instead. In concept a singleton group is different from a
multi-devices group which has only one device bound to driver...

This series aims to avoid conflict having both user and kernel drivers
mixed in a multi-devices group.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-18  2:39         ` Tian, Kevin
@ 2021-11-18 13:33           ` Jason Gunthorpe
  2021-11-19  5:44             ` Tian, Kevin
  0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-18 13:33 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lu Baolu, Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Raj, Ashok, Chaitanya Kulkarni,
	kvm, rafael, linux-pci, Cornelia Huck, linux-kernel, iommu, Pan,
	Jacob jun, Diana Craciun, Will Deacon

On Thu, Nov 18, 2021 at 02:39:45AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, November 16, 2021 9:46 PM
> > 
> > On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > > Hi Christoph,
> > >
> > > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > > +enum iommu_dma_owner {
> > > > > +	DMA_OWNER_NONE,
> > > > > +	DMA_OWNER_KERNEL,
> > > > > +	DMA_OWNER_USER,
> > > > > +};
> > > > > +
> > > >
> > > > > +	enum iommu_dma_owner dma_owner;
> > > > > +	refcount_t owner_cnt;
> > > > > +	struct file *owner_user_file;
> > > >
> > > > I'd just overload the ownership into owner_user_file,
> > > >
> > > >   NULL			-> no owner
> > > >   (struct file *)1UL)	-> kernel
> > > >   real pointer		-> user
> > > >
> > > > Which could simplify a lot of the code dealing with the owner.
> > > >
> > >
> > > Yeah! Sounds reasonable. I will make this in the next version.
> > 
> > It would be good to figure out how to make iommu_attach_device()
> > enforce no other driver binding as a kernel user without a file *, as
> > Robin pointed to, before optimizing this.
> > 
> > This fixes an existing bug where iommu_attach_device() only checks the
> > group size and is vunerable to a hot plug increasing the group size
> > after it returns. That check should be replaced by this series's logic
> > instead.
> > 
> 
> I think this existing bug in iommu_attach_devce() is different from 
> what this series is attempting to solve. To avoid breaking singleton
> group assumption there the ideal band-aid is to fail device hotplug.
> Otherwise some IOVA ranges which are supposed to go upstream 
> to IOMMU may be considered as p2p and routed to the hotplugged
> device instead.

Yes, but the instability of the reserved regions during hotplug with
!ACS seems like an entirely different problem. It affects everything,
including VFIO, and multi-device groups. Certainly it is nothing to do
with this series.

> In concept a singleton group is different from a
> multi-devices group which has only one device bound to driver...

Really? Why? I don't see it that way..

A singleton group is just a multi-device group that hasn't been
hotplugged yet.

We don't seem to have the concept of a "true" singleton group which is
permanently single due to HW features.

> This series aims to avoid conflict having both user and kernel drivers
> mixed in a multi-devices group.

I see this series about bringing order to all the places that want to
use a non-default domain - in-kernel or user doesn't really matter.

ie why shouldn't iommu_attach_device() work in a group that has a PCI
bridge, just like VFIO does?

The only thing that is special about VFIO vs a kernel driver is we
want a little help to track userspace ownership and VFIO opens
userspace to do the P2P attack.

The way I see it the num device == 1 test in iommu_attach_device() is
an imperfect way of controlling driver binding, and we can do better
by using the mechanism in this series.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-18  1:12             ` Lu Baolu
@ 2021-11-18 14:10               ` Jason Gunthorpe
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-18 14:10 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Kevin Tian, Ashok Raj,
	Chaitanya Kulkarni, kvm, rafael, linux-pci, Cornelia Huck,
	linux-kernel, iommu, Jacob jun Pan, Diana Craciun, Will Deacon

On Thu, Nov 18, 2021 at 09:12:41AM +0800, Lu Baolu wrote:
> The existing iommu_attach_device() allows only for singleton group. As
> we have added group ownership attribute, we can enforce this interface
> only for kernel domain usage.

Below is what I came up with.
 - Replace the file * with a simple void *

 - Use owner_count == 0 <-> dma_owner == DMA_OWNER to simplify
    the logic and remove levels of indent

 - Add a kernel state DMA_OWNER_PRIVATE_DOMAIN

 - Rename the user state to DMA_OWNER_PRIVATE_DOMAIN_USER

   It differs from the above because it does extra work to keep the
   group isolated that kernel users do no need to do.
 
 - Rename the kernel state to DMA_OWNER_DMA_API to better reflect
   its purpose. Inspired by Robin's point that alot of this is
   indirectly coupled to the domain pointer.

 - Have iommu_attach_device() atomically swap from DMA_OWNER_DMA_API
   to DMA_OWNER_PRIVATE_DOMAIN - replaces the group size check.

When we figure out tegra we can add an WARN_ON to iommu_attach_group()
that dma_owner != DMA_OWNER_NONE || DMA_OWNER_DMA_API

Then the whole thing makes some general sense..

Jason

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 064d0679906afd..4cafe074775e30 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -49,7 +49,7 @@ struct iommu_group {
 	struct list_head entry;
 	enum iommu_dma_owner dma_owner;
 	refcount_t owner_cnt;
-	struct file *owner_user_file;
+	void *owner_cookie;
 };
 
 struct group_device {
@@ -1937,12 +1937,18 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 	 * change while we are attaching
 	 */
 	mutex_lock(&group->mutex);
-	ret = -EINVAL;
-	if (iommu_group_device_count(group) != 1)
+	if (group->dma_owner != DMA_OWNER_DMA_API ||
+	    refcount_read(&group->owner_cnt) != 1) {
+		ret = -EBUSY;
 		goto out_unlock;
+	}
 
 	ret = __iommu_attach_group(domain, group);
+	if (ret)
+		goto out_unlock;
 
+	group->dma_owner = DMA_OWNER_PRIVATE_DOMAIN;
+	group->owner_cookie = domain;
 out_unlock:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
@@ -2193,14 +2199,11 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 		return;
 
 	mutex_lock(&group->mutex);
-	if (iommu_group_device_count(group) != 1) {
-		WARN_ON(1);
-		goto out_unlock;
-	}
-
+	WARN_ON(group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN ||
+		refcount_read(&group->owner_cnt) != 1 ||
+		group->owner_cookie != domain);
+	group->dma_owner = DMA_OWNER_DMA_API;
 	__iommu_detach_group(domain, group);
-
-out_unlock:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 }
@@ -3292,44 +3295,33 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 static int __iommu_group_set_dma_owner(struct iommu_group *group,
 				       enum iommu_dma_owner owner,
-				       struct file *user_file)
+				       void *owner_cookie)
 {
-	if (group->dma_owner != DMA_OWNER_NONE && group->dma_owner != owner)
-		return -EBUSY;
-
-	if (owner == DMA_OWNER_USER) {
-		if (!user_file)
-			return -EINVAL;
-
-		if (group->owner_user_file && group->owner_user_file != user_file)
-			return -EPERM;
+	if (refcount_inc_not_zero(&group->owner_cnt)) {
+		if (group->dma_owner != owner ||
+		    group->owner_cookie != owner_cookie) {
+			refcount_dec(&group->owner_cnt);
+			return -EBUSY;
+		}
+		return 0;
 	}
 
-	if (!refcount_inc_not_zero(&group->owner_cnt)) {
-		group->dma_owner = owner;
-		refcount_set(&group->owner_cnt, 1);
-
-		if (owner == DMA_OWNER_USER) {
-			/*
-			 * The UNMANAGED domain shouldn't be attached before
-			 * claiming the USER ownership for the first time.
-			 */
-			if (group->domain) {
-				if (group->domain != group->default_domain) {
-					group->dma_owner = DMA_OWNER_NONE;
-					refcount_set(&group->owner_cnt, 0);
-
-					return -EBUSY;
-				}
-
-				__iommu_detach_group(group->domain, group);
-			}
-
-			get_file(user_file);
-			group->owner_user_file = user_file;
+	/*
+	 * We must ensure that any device DMAs issued after this call
+	 * are discarded. DMAs can only reach real memory once someone
+	 * has attached a real domain.
+	 */
+	if (owner == DMA_OWNER_PRIVATE_DOMAIN_USER) {
+		if (group->domain) {
+			if (group->domain != group->default_domain)
+				return -EBUSY;
+			__iommu_detach_group(group->domain, group);
 		}
 	}
 
+	group->dma_owner = owner;
+	group->owner_cookie = owner_cookie;
+	refcount_set(&group->owner_cnt, 1);
 	return 0;
 }
 
@@ -3339,20 +3331,18 @@ static void __iommu_group_release_dma_owner(struct iommu_group *group,
 	if (WARN_ON(group->dma_owner != owner))
 		return;
 
-	if (refcount_dec_and_test(&group->owner_cnt)) {
-		group->dma_owner = DMA_OWNER_NONE;
+	if (!refcount_dec_and_test(&group->owner_cnt))
+		return;
 
-		if (owner == DMA_OWNER_USER) {
-			fput(group->owner_user_file);
-			group->owner_user_file = NULL;
+	group->dma_owner = DMA_OWNER_NONE;
 
-			/*
-			 * The UNMANAGED domain should be detached before all USER
-			 * owners have been released.
-			 */
-			if (!WARN_ON(group->domain) && group->default_domain)
-				__iommu_attach_group(group->default_domain, group);
-		}
+	/*
+	 * The UNMANAGED domain should be detached before all USER
+	 * owners have been released.
+	 */
+	if (owner == DMA_OWNER_PRIVATE_DOMAIN_USER) {
+		if (!WARN_ON(group->domain) && group->default_domain)
+			__iommu_attach_group(group->default_domain, group);
 	}
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d8946f22edd5df..7f50dfa7207e9c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -164,14 +164,21 @@ enum iommu_dev_features {
 
 /**
  * enum iommu_dma_owner - IOMMU DMA ownership
- * @DMA_OWNER_NONE: No DMA ownership
- * @DMA_OWNER_KERNEL: Device DMAs are initiated by a kernel driver
- * @DMA_OWNER_USER: Device DMAs are initiated by a userspace driver
+ * @DMA_OWNER_NONE:
+ *  No DMA ownership
+ * @DMA_OWNER_DMA_API:
+ *  Device DMAs are initiated by a kernel driver through the DMA API
+ * @DMA_OWNER_PRIVATE_DOMAIN:
+ *  Device DMAs are initiated by a kernel driver
+ * @DMA_OWNER_PRIVATE_DOMAIN_USER:
+ *  Device DMAs are initiated by userspace, kernel ensures that DMAs
+ *  never go to kernel memory.
  */
 enum iommu_dma_owner {
 	DMA_OWNER_NONE,
-	DMA_OWNER_KERNEL,
-	DMA_OWNER_USER,
+	DMA_OWNER_DMA_API,
+	DMA_OWNER_PRIVATE_DOMAIN,
+	DMA_OWNER_PRIVATE_DOMAIN_USER,
 };
 
 #define IOMMU_PASID_INVALID	(-1U)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-18 13:33           ` Jason Gunthorpe
@ 2021-11-19  5:44             ` Tian, Kevin
  2021-11-19 11:14               ` Lu Baolu
  2021-11-19 12:56               ` Jason Gunthorpe
  0 siblings, 2 replies; 60+ messages in thread
From: Tian, Kevin @ 2021-11-19  5:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Raj, Ashok, Chaitanya Kulkarni,
	kvm, rafael, linux-pci, Cornelia Huck, linux-kernel, iommu, Pan,
	Jacob jun, Diana Craciun, Will Deacon

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, November 18, 2021 9:33 PM
> 
> > In concept a singleton group is different from a
> > multi-devices group which has only one device bound to driver...
> 
> Really? Why? I don't see it that way..
> 
> A singleton group is just a multi-device group that hasn't been
> hotplugged yet.
> 
> We don't seem to have the concept of a "true" singleton group which is
> permanently single due to HW features.
> 
> > This series aims to avoid conflict having both user and kernel drivers
> > mixed in a multi-devices group.

Well, the difference is just in literal. I don't know the background
why the existing iommu_attach_device() users want to do it this
way. But given the condition in iommu_attach_device() it could
in theory imply some unknown hardware-level side effect which 
may break the desired functionality once the group size grows 
beyond singleton. Is it a real case? I don't know...

You are now redefining that condition from singleton group to
multi-devices group with single driver bound. As long as no object
from existing driver users, I'm fine with it. But still want to raise
awareness as it does change the existing semantics (though might
be considered as an imperfect way).

Thanks
Kevin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-19  5:44             ` Tian, Kevin
@ 2021-11-19 11:14               ` Lu Baolu
  2021-11-19 15:06                 ` Jörg Rödel
  2021-11-19 12:56               ` Jason Gunthorpe
  1 sibling, 1 reply; 60+ messages in thread
From: Lu Baolu @ 2021-11-19 11:14 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: baolu.lu, Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Raj, Ashok, Chaitanya Kulkarni,
	kvm, rafael, linux-pci, Cornelia Huck, linux-kernel, iommu, Pan,
	Jacob jun, Diana Craciun, Will Deacon

On 2021/11/19 13:44, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Thursday, November 18, 2021 9:33 PM
>>
>>> In concept a singleton group is different from a
>>> multi-devices group which has only one device bound to driver...
>>
>> Really? Why? I don't see it that way..
>>
>> A singleton group is just a multi-device group that hasn't been
>> hotplugged yet.
>>
>> We don't seem to have the concept of a "true" singleton group which is
>> permanently single due to HW features.
>>
>>> This series aims to avoid conflict having both user and kernel drivers
>>> mixed in a multi-devices group.
> 
> Well, the difference is just in literal. I don't know the background
> why the existing iommu_attach_device() users want to do it this
> way. But given the condition in iommu_attach_device() it could
> in theory imply some unknown hardware-level side effect which
> may break the desired functionality once the group size grows
> beyond singleton. Is it a real case? I don't know...
> 
> You are now redefining that condition from singleton group to
> multi-devices group with single driver bound. As long as no object
> from existing driver users, I'm fine with it. But still want to raise
> awareness as it does change the existing semantics (though might
> be considered as an imperfect way).

The singleton group requirement for iommu_attach/detach_device() was
added by below commit:

commit 426a273834eae65abcfc7132a21a85b3151e0bce
Author: Joerg Roedel <jroedel@suse.de>
Date:   Thu May 28 18:41:30 2015 +0200

     iommu: Limit iommu_attach/detach_device to devices with their own group

     This patch changes the behavior of the iommu_attach_device
     and iommu_detach_device functions. With this change these
     functions only work on devices that have their own group.
     For all other devices the iommu_group_attach/detach
     functions must be used.

     Signed-off-by: Joerg Roedel <jroedel@suse.de>

Joerg,can you please shed some light on the background of this
requirement? Does above idea of transition from singleton group
to group with single driver bound make sense to you?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-19  5:44             ` Tian, Kevin
  2021-11-19 11:14               ` Lu Baolu
@ 2021-11-19 12:56               ` Jason Gunthorpe
  1 sibling, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-19 12:56 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lu Baolu, Christoph Hellwig, Greg Kroah-Hartman, Joerg Roedel,
	Alex Williamson, Bjorn Helgaas, Raj, Ashok, Chaitanya Kulkarni,
	kvm, rafael, linux-pci, Cornelia Huck, linux-kernel, iommu, Pan,
	Jacob jun, Diana Craciun, Will Deacon

On Fri, Nov 19, 2021 at 05:44:35AM +0000, Tian, Kevin wrote:

> Well, the difference is just in literal. I don't know the background
> why the existing iommu_attach_device() users want to do it this
> way. But given the condition in iommu_attach_device() it could
> in theory imply some unknown hardware-level side effect which 
> may break the desired functionality once the group size grows 
> beyond singleton. Is it a real case? I don't know...

No, this must not be.

Any device can have VFIO attached to it, and VFIO does the equivalent
of iommu_attach_device(). We cannot have "unknown hardware-level side
effects" to changing the domain's of multi device groups and also have
working VFIO.

If these HW problems exist they are a seperate issue and the iommu
core should flag such groups as being unable to change their domains
at all.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-19 11:14               ` Lu Baolu
@ 2021-11-19 15:06                 ` Jörg Rödel
  2021-11-19 15:43                   ` Jason Gunthorpe
  2021-11-20 11:16                   ` Lu Baolu
  0 siblings, 2 replies; 60+ messages in thread
From: Jörg Rödel @ 2021-11-19 15:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Tian, Kevin, Jason Gunthorpe, Chaitanya Kulkarni, Raj, Ashok,
	kvm, rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Christoph Hellwig, Alex Williamson, Pan,
	Jacob jun, linux-pci, Bjorn Helgaas, Diana Craciun

On Fri, Nov 19, 2021 at 07:14:10PM +0800, Lu Baolu wrote:
> The singleton group requirement for iommu_attach/detach_device() was
> added by below commit:
> 
> commit 426a273834eae65abcfc7132a21a85b3151e0bce
> Author: Joerg Roedel <jroedel@suse.de>
> Date:   Thu May 28 18:41:30 2015 +0200
> 
>     iommu: Limit iommu_attach/detach_device to devices with their own group
> 
>     This patch changes the behavior of the iommu_attach_device
>     and iommu_detach_device functions. With this change these
>     functions only work on devices that have their own group.
>     For all other devices the iommu_group_attach/detach
>     functions must be used.
> 
>     Signed-off-by: Joerg Roedel <jroedel@suse.de>
> 
> Joerg,can you please shed some light on the background of this
> requirement? Does above idea of transition from singleton group
> to group with single driver bound make sense to you?

This change came to be because the iommu_attach/detach_device()
interface doesn't fit well into a world with iommu-groups. Devices
within a group are by definition not isolated between each other, so
they must all be in the same address space (== iommu_domain). So it
doesn't make sense to allow attaching a single device within a group to
a different iommu_domain.

I know that in theory it is safe to allow devices within a group to be
in different domains because there iommu-groups catch multiple
non-isolation cases:

	1) Devices behind a non-ACS capable bridge or multiple functions
	   of a PCI device. Here it is safe to put the devices into
	   different iommu-domains as long as all affected devices are
	   controlled by the same owner.

	2) Devices which share a single request-id and can't be
	   differentiated by the IOMMU hardware. These always need to be
	   in the same iommu_domain.

To lift the single-domain-per-group requirement the iommu core code
needs to learn the difference between the two cases above.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-19 15:06                 ` Jörg Rödel
@ 2021-11-19 15:43                   ` Jason Gunthorpe
  2021-11-20 11:16                   ` Lu Baolu
  1 sibling, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2021-11-19 15:43 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: Lu Baolu, Tian, Kevin, Chaitanya Kulkarni, Raj, Ashok, kvm,
	rafael, Greg Kroah-Hartman, Cornelia Huck, Will Deacon,
	linux-kernel, iommu, Christoph Hellwig, Alex Williamson, Pan,
	Jacob jun, linux-pci, Bjorn Helgaas, Diana Craciun

On Fri, Nov 19, 2021 at 04:06:12PM +0100, Jörg Rödel wrote:

> This change came to be because the iommu_attach/detach_device()
> interface doesn't fit well into a world with iommu-groups. Devices
> within a group are by definition not isolated between each other, so
> they must all be in the same address space (== iommu_domain). So it
> doesn't make sense to allow attaching a single device within a group to
> a different iommu_domain.

It is the same problem VFIO has. It changes the iommu_domain of a
group while it only has a single driver bound to one device in the
group.

Robin is also right to point out there is no guarentee that a single
device group will remain a single device group after a hot plug
event. This is something VFIO is also able to handle today.

So, I think the solution of this series applies equally well to this
problem. Let's see it in v2.

> I know that in theory it is safe to allow devices within a group to be
> in different domains because there iommu-groups catch multiple
> non-isolation cases:
> 
> 	1) Devices behind a non-ACS capable bridge or multiple functions
> 	   of a PCI device. Here it is safe to put the devices into
> 	   different iommu-domains as long as all affected devices are
> 	   controlled by the same owner.
> 
> 	2) Devices which share a single request-id and can't be
> 	   differentiated by the IOMMU hardware. These always need to be
> 	   in the same iommu_domain.

> To lift the single-domain-per-group requirement the iommu core code
> needs to learn the difference between the two cases above.

We had a long talk about this a while back, nobody came with
compelling arguments to justify doing this work. I've just been using
it as a guidepost for building APIs. If the API can accomodate #1 then
it is a better design than one that cannot.

Jason

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
  2021-11-19 15:06                 ` Jörg Rödel
  2021-11-19 15:43                   ` Jason Gunthorpe
@ 2021-11-20 11:16                   ` Lu Baolu
  1 sibling, 0 replies; 60+ messages in thread
From: Lu Baolu @ 2021-11-20 11:16 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: baolu.lu, Tian, Kevin, Jason Gunthorpe, Chaitanya Kulkarni, Raj,
	Ashok, kvm, rafael, Greg Kroah-Hartman, Cornelia Huck,
	Will Deacon, linux-kernel, iommu, Christoph Hellwig,
	Alex Williamson, Pan, Jacob jun, linux-pci, Bjorn Helgaas,
	Diana Craciun

Hi Joerg,

On 11/19/21 11:06 PM, Jörg Rödel wrote:
> On Fri, Nov 19, 2021 at 07:14:10PM +0800, Lu Baolu wrote:
>> The singleton group requirement for iommu_attach/detach_device() was
>> added by below commit:
>>
>> commit 426a273834eae65abcfc7132a21a85b3151e0bce
>> Author: Joerg Roedel <jroedel@suse.de>
>> Date:   Thu May 28 18:41:30 2015 +0200
>>
>>      iommu: Limit iommu_attach/detach_device to devices with their own group
>>
>>      This patch changes the behavior of the iommu_attach_device
>>      and iommu_detach_device functions. With this change these
>>      functions only work on devices that have their own group.
>>      For all other devices the iommu_group_attach/detach
>>      functions must be used.
>>
>>      Signed-off-by: Joerg Roedel <jroedel@suse.de>
>>
>> Joerg,can you please shed some light on the background of this
>> requirement? Does above idea of transition from singleton group
>> to group with single driver bound make sense to you?
> 
> This change came to be because the iommu_attach/detach_device()
> interface doesn't fit well into a world with iommu-groups. Devices
> within a group are by definition not isolated between each other, so
> they must all be in the same address space (== iommu_domain). So it
> doesn't make sense to allow attaching a single device within a group to
> a different iommu_domain.

Thanks for the explanation. It's very helpful. There seems to be a lot
of discussions around this, but I didn't see any meaningful reasons to
break the assumption of "all devices in a group being in a same address
space".

Best regards,
baolu

> 
> I know that in theory it is safe to allow devices within a group to be
> in different domains because there iommu-groups catch multiple
> non-isolation cases:
> 
> 	1) Devices behind a non-ACS capable bridge or multiple functions
> 	   of a PCI device. Here it is safe to put the devices into
> 	   different iommu-domains as long as all affected devices are
> 	   controlled by the same owner.
> 
> 	2) Devices which share a single request-id and can't be
> 	   differentiated by the IOMMU hardware. These always need to be
> 	   in the same iommu_domain.
> 
> To lift the single-domain-per-group requirement the iommu core code
> needs to learn the difference between the two cases above.
> 
> Regards,
> 
> 	Joerg
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2021-11-20 11:20 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
2021-11-15  2:05 ` [PATCH 01/11] iommu: Add device dma ownership set/release interfaces Lu Baolu
2021-11-15 13:14   ` Christoph Hellwig
2021-11-16  1:57     ` Lu Baolu
2021-11-16 13:46       ` Jason Gunthorpe
2021-11-17  5:22         ` Lu Baolu
2021-11-17 13:35           ` Jason Gunthorpe
2021-11-18  1:12             ` Lu Baolu
2021-11-18 14:10               ` Jason Gunthorpe
2021-11-18  2:39         ` Tian, Kevin
2021-11-18 13:33           ` Jason Gunthorpe
2021-11-19  5:44             ` Tian, Kevin
2021-11-19 11:14               ` Lu Baolu
2021-11-19 15:06                 ` Jörg Rödel
2021-11-19 15:43                   ` Jason Gunthorpe
2021-11-20 11:16                   ` Lu Baolu
2021-11-19 12:56               ` Jason Gunthorpe
2021-11-15 20:38   ` Bjorn Helgaas
2021-11-16  1:52     ` Lu Baolu
2021-11-15  2:05 ` [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind Lu Baolu
2021-11-15  6:59   ` Greg Kroah-Hartman
2021-11-15 13:20     ` Christoph Hellwig
2021-11-15 13:38     ` Jason Gunthorpe
2021-11-15 13:19   ` Christoph Hellwig
2021-11-15 13:24     ` Jason Gunthorpe
2021-11-15 15:37       ` Robin Murphy
2021-11-15 15:56         ` Jason Gunthorpe
2021-11-15 18:15           ` Christoph Hellwig
2021-11-15 18:35           ` Robin Murphy
2021-11-15 19:39             ` Jason Gunthorpe
2021-11-15  2:05 ` [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
2021-11-15 13:21   ` Christoph Hellwig
2021-11-15 13:31     ` Jason Gunthorpe
2021-11-15 15:14       ` Robin Murphy
2021-11-15 16:17         ` Jason Gunthorpe
2021-11-15 17:54           ` Robin Murphy
2021-11-15 18:19             ` Christoph Hellwig
2021-11-15 18:44               ` Robin Murphy
2021-11-15 19:22             ` Jason Gunthorpe
2021-11-15 20:58               ` Robin Murphy
2021-11-15 21:19                 ` Jason Gunthorpe
2021-11-15 20:48   ` Bjorn Helgaas
2021-11-15 22:17   ` Bjorn Helgaas
2021-11-16  6:05     ` Lu Baolu
2021-11-15  2:05 ` [PATCH 04/11] PCI: portdrv: " Lu Baolu
2021-11-15 20:44   ` Bjorn Helgaas
2021-11-16  7:24     ` Lu Baolu
2021-11-16 20:22       ` Bjorn Helgaas
2021-11-16 20:48         ` Jason Gunthorpe
2021-11-15  2:05 ` [PATCH 05/11] iommu: Add security context management for assigned devices Lu Baolu
2021-11-15 13:22   ` Christoph Hellwig
2021-11-16  7:25     ` Lu Baolu
2021-11-15  2:05 ` [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces Lu Baolu
2021-11-15 13:27   ` Christoph Hellwig
2021-11-16  9:42     ` Lu Baolu
2021-11-15  2:05 ` [PATCH 07/11] vfio: Use DMA_OWNER_USER to declaim passthrough devices Lu Baolu
2021-11-15  2:05 ` [PATCH 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
2021-11-15  2:05 ` [PATCH 09/11] vfio: Delete the unbound_list Lu Baolu
2021-11-15  2:05 ` [PATCH 10/11] vfio: Remove iommu group notifier Lu Baolu
2021-11-15  2:05 ` [PATCH 11/11] iommu: Remove iommu group changes notifier Lu Baolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).