LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 00/16] KVM: arm64: MMIO guard PV services
@ 2021-07-15 16:31 Marc Zyngier
  2021-07-15 16:31 ` [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags Marc Zyngier
                   ` (16 more replies)
  0 siblings, 17 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

KVM/arm64 currently considers that any memory access outside of a
memslot is a MMIO access. This so far has served us very well, but
obviously relies on the guest trusting the host, and especially
userspace to do the right thing.

As we keep on hacking away at pKVM, it becomes obvious that this trust
model is not really fit for a confidential computing environment, and
that the guest would require some guarantees that emulation only
occurs on portions of the address space that have clearly been
identified for this purpose.

This series aims at providing the two sides of the above coin:

- a set of PV services (collectively called 'MMIO guard' -- better
  name required!) where the guest can flag portion of its address
  space that it considers as MMIO, with map/unmap semantics. Any
  attempt to access a MMIO range outside of these regions will result
  in an external abort being injected.

- a set of hooks into the ioremap code allowing a Linux guest to tell
  KVM about things it want to consider as MMIO. I definitely hate this
  part of the series, as it feels clumsy and brittle.

For now, the enrolment in this scheme is controlled by a guest kernel
command-line parameters, but it is expected that KVM will enforce this
for protected VMs.

Note that this crucially misses a save/restore interface for non
protected VMs, and I currently don't have a good solution for
that. Ideas welcome.

I also plan to use this series as a base for some other purposes,
namely to trick the guest in telling us how it maps things like
prefetchable BARs (see the discussion at [1]). That part is not
implemented yet, but there is already some provision to pass the MAIR
index across.

Patches on top of 5.14-rc1, branch pushed at the usual location.

[1] 20210429162906.32742-1-sdonthineni@nvidia.com

Marc Zyngier (16):
  KVM: arm64: Generalise VM features into a set of flags
  KVM: arm64: Don't issue CMOs when the physical address is invalid
  KVM: arm64: Turn kvm_pgtable_stage2_set_owner into
    kvm_pgtable_stage2_annotate
  KVM: arm64: Add MMIO checking infrastructure
  KVM: arm64: Plumb MMIO checking into the fault handling
  KVM: arm64: Force a full unmap on vpcu reinit
  KVM: arm64: Wire MMIO guard hypercalls
  KVM: arm64: Add tracepoint for failed MMIO guard check
  KVM: arm64: Advertise a capability for MMIO guard
  KVM: arm64: Add some documentation for the MMIO guard feature
  firmware/smccc: Call arch-specific hook on discovering KVM services
  mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls
  arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard
  arm64: Enroll into KVM's MMIO guard if required
  arm64: Add a helper to retrieve the PTE of a fixmap
  arm64: Register earlycon fixmap with the MMIO guard

 .../admin-guide/kernel-parameters.txt         |   3 +
 Documentation/virt/kvm/arm/index.rst          |   1 +
 Documentation/virt/kvm/arm/mmio-guard.rst     |  73 +++++++++++
 arch/arm/include/asm/hypervisor.h             |   1 +
 arch/arm64/include/asm/fixmap.h               |   2 +
 arch/arm64/include/asm/hypervisor.h           |   2 +
 arch/arm64/include/asm/kvm_host.h             |  14 ++-
 arch/arm64/include/asm/kvm_mmu.h              |   5 +
 arch/arm64/include/asm/kvm_pgtable.h          |  12 +-
 arch/arm64/kernel/setup.c                     |   6 +
 arch/arm64/kvm/arm.c                          |  14 ++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  14 ++-
 arch/arm64/kvm/hyp/pgtable.c                  |  36 +++---
 arch/arm64/kvm/hypercalls.c                   |  20 +++
 arch/arm64/kvm/mmio.c                         |  13 +-
 arch/arm64/kvm/mmu.c                          | 117 ++++++++++++++++++
 arch/arm64/kvm/trace_arm.h                    |  17 +++
 arch/arm64/mm/ioremap.c                       | 107 ++++++++++++++++
 arch/arm64/mm/mmu.c                           |  15 +++
 drivers/firmware/smccc/kvm_guest.c            |   4 +
 include/linux/arm-smccc.h                     |  28 +++++
 include/linux/io.h                            |   3 +
 include/uapi/linux/kvm.h                      |   1 +
 mm/ioremap.c                                  |  13 +-
 mm/vmalloc.c                                  |   8 ++
 25 files changed, 492 insertions(+), 37 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst

-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-27 18:10   ` Will Deacon
  2021-07-15 16:31 ` [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid Marc Zyngier
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

We currently deal with a set of booleans for VM features,
while they could be better represented as set of flags
contained in an unsigned long, similarily to what we are
doing on the CPU side.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 12 +++++++-----
 arch/arm64/kvm/arm.c              |  5 +++--
 arch/arm64/kvm/mmio.c             |  3 ++-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..4add6c27251f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -122,7 +122,10 @@ struct kvm_arch {
 	 * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
 	 * supported.
 	 */
-	bool return_nisv_io_abort_to_user;
+#define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
+	/* Memory Tagging Extension enabled for the guest */
+#define KVM_ARCH_FLAG_MTE_ENABLED			1
+	unsigned long flags;
 
 	/*
 	 * VM-wide PMU filter, implemented as a bitmap and big enough for
@@ -133,9 +136,6 @@ struct kvm_arch {
 
 	u8 pfr0_csv2;
 	u8 pfr0_csv3;
-
-	/* Memory Tagging Extension enabled for the guest */
-	bool mte_enabled;
 };
 
 struct kvm_vcpu_fault_info {
@@ -777,7 +777,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_arm_vcpu_sve_finalized(vcpu) \
 	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
 
-#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
+#define kvm_has_mte(kvm)					\
+	(system_supports_mte() &&				\
+	 test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))
 #define kvm_vcpu_has_pmu(vcpu)					\
 	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..97ab1512c44f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -91,13 +91,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	switch (cap->cap) {
 	case KVM_CAP_ARM_NISV_TO_USER:
 		r = 0;
-		kvm->arch.return_nisv_io_abort_to_user = true;
+		set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+			&kvm->arch.flags);
 		break;
 	case KVM_CAP_ARM_MTE:
 		if (!system_supports_mte() || kvm->created_vcpus)
 			return -EINVAL;
 		r = 0;
-		kvm->arch.mte_enabled = true;
+		set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
 		break;
 	default:
 		r = -EINVAL;
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 3e2d8ba11a02..3dd38a151d2a 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -135,7 +135,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 	 * volunteered to do so, and bail out otherwise.
 	 */
 	if (!kvm_vcpu_dabt_isvalid(vcpu)) {
-		if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
+		if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+			     &vcpu->kvm->arch.flags)) {
 			run->exit_reason = KVM_EXIT_ARM_NISV;
 			run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
 			run->arm_nisv.fault_ipa = fault_ipa;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
  2021-07-15 16:31 ` [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-19 17:18   ` Quentin Perret
  2021-07-27 18:10   ` Will Deacon
  2021-07-15 16:31 ` [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate Marc Zyngier
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Make sure we don't issue CMOs when mapping something that
is not a memory address in the S2 page tables.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/pgtable.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 05321f4165e3..a5874ebd0354 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -619,12 +619,16 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	}
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
-	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
-						granule);
-
-	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
-		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
+	if (kvm_phys_is_valid(phys)) {
+		if (mm_ops->dcache_clean_inval_poc &&
+		    stage2_pte_cacheable(pgt, new))
+			mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new,
+								      mm_ops),
+						       granule);
+		if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
+			mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops),
+						 granule);
+	}
 
 	smp_store_release(ptep, new);
 	if (stage2_pte_is_counted(new))
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
  2021-07-15 16:31 ` [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags Marc Zyngier
  2021-07-15 16:31 ` [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-20 10:09   ` Quentin Perret
  2021-07-15 16:31 ` [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure Marc Zyngier
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

kvm_pgtable_stage2_set_owner() could be generalised into a way
to store up to 63 bits in the page tables, as long as we don't
set bit 0.

Let's just do that.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h  | 12 +++++++-----
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 14 ++++++++++++--
 arch/arm64/kvm/hyp/pgtable.c          | 20 ++++++--------------
 3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index f004c0115d89..9579e8c2793b 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -274,14 +274,16 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   void *mc);
 
 /**
- * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
- *				    track ownership.
+ * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
+ *				   to track ownership (and more).
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
  * @addr:	Base intermediate physical address to annotate.
  * @size:	Size of the annotated range.
  * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
  *		page-table pages.
- * @owner_id:	Unique identifier for the owner of the page.
+ * @annotation:	A 63 bit value that will be stored in the page tables.
+ *		@annotation[0] must be 0, and @annotation[63:1] is stored
+ *		in the page tables. @annotation as a whole must not be 0.
  *
  * By default, all page-tables are owned by identifier 0. This function can be
  * used to mark portions of the IPA space as owned by other entities. When a
@@ -290,8 +292,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
-				 void *mc, u8 owner_id);
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				void *mc, kvm_pte_t annotation);
 
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d938ce95d3bd..ffe482c3b818 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -245,6 +245,15 @@ static int host_stage2_idmap(u64 addr)
 	return ret;
 }
 
+#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(63, 56)
+#define KVM_MAX_OWNER_ID		1
+
+static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
+{
+	BUG_ON(owner_id > KVM_MAX_OWNER_ID);
+	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
 int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
 {
 	int ret;
@@ -257,8 +266,9 @@ int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
 		return -EINVAL;
 
 	hyp_spin_lock(&host_kvm.lock);
-	ret = kvm_pgtable_stage2_set_owner(&host_kvm.pgt, start, end - start,
-					   &host_s2_pool, pkvm_hyp_id);
+	ret = kvm_pgtable_stage2_annotate(&host_kvm.pgt, start, end - start,
+					  &host_s2_pool,
+					  kvm_init_invalid_leaf_owner(pkvm_hyp_id));
 	hyp_spin_unlock(&host_kvm.lock);
 
 	return ret != -EAGAIN ? ret : 0;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index a5874ebd0354..a065f6d960af 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -50,9 +50,6 @@
 
 #define KVM_PTE_LEAF_ATTR_S2_IGNORED	GENMASK(58, 55)
 
-#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(63, 56)
-#define KVM_MAX_OWNER_ID		1
-
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -206,11 +203,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
 	return pte;
 }
 
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
-	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  u32 level, kvm_pte_t *ptep,
 				  enum kvm_pgtable_walk_flags flag)
@@ -466,7 +458,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 struct stage2_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
-	u8				owner_id;
+	u64				annotation;
 
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
@@ -603,7 +595,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	if (kvm_phys_is_valid(phys))
 		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
 	else
-		new = kvm_init_invalid_leaf_owner(data->owner_id);
+		new = data->annotation;
 
 	if (stage2_pte_is_counted(old)) {
 		/*
@@ -796,8 +788,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
-				 void *mc, u8 owner_id)
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				void *mc, kvm_pte_t annotation)
 {
 	int ret;
 	struct stage2_map_data map_data = {
@@ -805,7 +797,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
 		.mm_ops		= pgt->mm_ops,
-		.owner_id	= owner_id,
+		.annotation	= annotation,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
@@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.arg		= &map_data,
 	};
 
-	if (owner_id > KVM_MAX_OWNER_ID)
+	if (!annotation || (annotation & PTE_VALID))
 		return -EINVAL;
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (2 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-20 11:13   ` Quentin Perret
  2021-07-27 18:11   ` Will Deacon
  2021-07-15 16:31 ` [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling Marc Zyngier
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Introduce the infrastructure required to identify an IPA region
that is expected to be used as an MMIO window.

This include mapping, unmapping and checking the regions. Nothing
calls into it yet, so no expected functional change.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |   2 +
 arch/arm64/include/asm/kvm_mmu.h  |   5 ++
 arch/arm64/kvm/mmu.c              | 115 ++++++++++++++++++++++++++++++
 3 files changed, 122 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4add6c27251f..914c1b7bb3ad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -125,6 +125,8 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
 	/* Memory Tagging Extension enabled for the guest */
 #define KVM_ARCH_FLAG_MTE_ENABLED			1
+	/* Gues has bought into the MMIO guard extension */
+#define KVM_ARCH_FLAG_MMIO_GUARD			2
 	unsigned long flags;
 
 	/*
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b52c5c4b9a3d..f6b8fc1671b3 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -170,6 +170,11 @@ phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(u32 *hyp_va_bits);
 
+/* MMIO guard */
+bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
+bool kvm_remove_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
+bool kvm_check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
+
 static inline void *__kvm_vector_slot2addr(void *base,
 					   enum arm64_hyp_spectre_vector slot)
 {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3155c9e778f0..638827c8842b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1120,6 +1120,121 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 		kvm_set_pfn_accessed(pte_pfn(pte));
 }
 
+#define MMIO_NOTE	('M' << 24 | 'M' << 16 | 'I' << 8 | '0')
+
+bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
+{
+	struct kvm_mmu_memory_cache *memcache;
+	struct kvm_memory_slot *memslot;
+	int ret, idx;
+
+	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
+		return false;
+
+	/* Must be page-aligned */
+	if (ipa & ~PAGE_MASK)
+		return false;
+
+	/*
+	 * The page cannot be in a memslot. At some point, this will
+	 * have to deal with device mappings though.
+	 */
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	memslot = gfn_to_memslot(vcpu->kvm, ipa >> PAGE_SHIFT);
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
+	if (memslot)
+		return false;
+
+	/* Guest has direct access to the GICv2 virtual CPU interface */
+	if (irqchip_in_kernel(vcpu->kvm) &&
+	    vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2 &&
+	    ipa == vcpu->kvm->arch.vgic.vgic_cpu_base)
+		return true;
+
+	memcache = &vcpu->arch.mmu_page_cache;
+	if (kvm_mmu_topup_memory_cache(memcache,
+				       kvm_mmu_cache_min_pages(vcpu->kvm)))
+		return false;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+	ret = kvm_pgtable_stage2_annotate(vcpu->arch.hw_mmu->pgt,
+					  ipa, PAGE_SIZE, memcache,
+					  MMIO_NOTE);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return ret == 0;
+}
+
+struct s2_walk_data {
+	kvm_pte_t	pteval;
+	u32		level;
+};
+
+static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+		     enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+	struct s2_walk_data *data = arg;
+
+	data->level = level;
+	data->pteval = *ptep;
+	return 0;
+}
+
+/* Assumes mmu_lock taken */
+static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
+{
+	struct s2_walk_data data;
+	struct kvm_pgtable_walker walker = {
+		.cb             = s2_walker,
+		.flags          = KVM_PGTABLE_WALK_LEAF,
+		.arg            = &data,
+	};
+
+	kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
+			 PAGE_SIZE, &walker);
+
+	/* Must be a PAGE_SIZE mapping with our annotation */
+	return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
+		data.pteval == MMIO_NOTE);
+}
+
+bool kvm_remove_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
+{
+	bool ret;
+
+	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
+		return false;
+
+	/* Keep the PT locked across the two walks */
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	ret = __check_ioguard_page(vcpu, ipa);
+	if (ret)		/* Drop the annotation */
+		kvm_pgtable_stage2_unmap(vcpu->arch.hw_mmu->pgt,
+					 ALIGN_DOWN(ipa, PAGE_SIZE), PAGE_SIZE);
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+	return ret;
+}
+
+bool kvm_check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
+{
+	bool ret;
+
+	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
+		return true;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+	ret = __check_ioguard_page(vcpu, ipa & PAGE_MASK);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	if (!ret)
+		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+
+	return ret;
+}
+
 /**
  * kvm_handle_guest_abort - handles all 2nd stage aborts
  * @vcpu:	the VCPU pointer
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (3 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-27 18:11   ` Will Deacon
  2021-07-15 16:31 ` [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit Marc Zyngier
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Plumb the MMIO checking code into the MMIO fault handling code.
Nothing allows a region to be registered yet, so there should be
no funtional change either.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmio.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 3dd38a151d2a..fd5747279d27 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -6,6 +6,7 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <trace/events/kvm.h>
 
 #include "trace.h"
@@ -130,6 +131,10 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 	int len;
 	u8 data_buf[8];
 
+	/* Check failed? Return to the guest for debriefing... */
+	if (!kvm_check_ioguard_page(vcpu, fault_ipa))
+		return 1;
+
 	/*
 	 * No valid syndrome? Ask userspace for help if it has
 	 * volunteered to do so, and bail out otherwise.
@@ -156,6 +161,11 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 	len = kvm_vcpu_dabt_get_as(vcpu);
 	rt = kvm_vcpu_dabt_get_rd(vcpu);
 
+	/* If we cross a page boundary, check that too... */
+	if (((fault_ipa + len - 1) & PAGE_MASK) != (fault_ipa & PAGE_MASK) &&
+	    !kvm_check_ioguard_page(vcpu, fault_ipa + len - 1))
+		return 1;
+
 	if (is_write) {
 		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
 					       len);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (4 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-27 18:11   ` Will Deacon
  2021-07-15 16:31 ` [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls Marc Zyngier
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

As we now keep information in the S2PT, we must be careful not
to keep it across a VM reboot, which could otherwise lead to
interesting problems.

Make sure that the S2 is completely discarded on reset of
a vcpu, and remove the flag that enforces the MMIO check.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arm.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 97ab1512c44f..b0d2225190d2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1096,12 +1096,18 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	 * ensuring that the data side is always coherent. We still
 	 * need to invalidate the I-cache though, as FWB does *not*
 	 * imply CTR_EL0.DIC.
+	 *
+	 * If the MMIO guard was enabled, we pay the price of a full
+	 * unmap to get back to a sane state (and clear the flag).
 	 */
 	if (vcpu->arch.has_run_once) {
-		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
+		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) ||
+		    test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
 			stage2_unmap_vm(vcpu->kvm);
 		else
 			icache_inval_all_pou();
+
+		clear_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
 	}
 
 	vcpu_reset_hcr(vcpu);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (5 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-27 18:11   ` Will Deacon
  2021-07-15 16:31 ` [PATCH 08/16] KVM: arm64: Add tracepoint for failed MMIO guard check Marc Zyngier
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Plumb in the hypercall interface to allow a guest to discover,
enroll, map and unmap MMIO regions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hypercalls.c | 20 ++++++++++++++++++++
 include/linux/arm-smccc.h   | 28 ++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..a3deeb907fdd 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -5,6 +5,7 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 
 #include <kvm/arm_hypercalls.h>
 #include <kvm/arm_psci.h>
@@ -129,10 +130,29 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
 		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
 		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
 		kvm_ptp_get_time(vcpu, val);
 		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
+		val[0] = PAGE_SIZE;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
+		set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
+		val[0] = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
+		if (kvm_install_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
+			val[0] = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
+		if (kvm_remove_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
+			val[0] = SMCCC_RET_SUCCESS;
+		break;
 	case ARM_SMCCC_TRNG_VERSION:
 	case ARM_SMCCC_TRNG_FEATURES:
 	case ARM_SMCCC_TRNG_GET_UUID:
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 7d1cabe15262..4aab2078d8d3 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -107,6 +107,10 @@
 /* KVM "vendor specific" services */
 #define ARM_SMCCC_KVM_FUNC_FEATURES		0
 #define ARM_SMCCC_KVM_FUNC_PTP			1
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO	2
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL	3
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP	4
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP	5
 #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
 #define ARM_SMCCC_KVM_NUM_FUNCS			128
 
@@ -133,6 +137,30 @@
 #define KVM_PTP_VIRT_COUNTER			0
 #define KVM_PTP_PHYS_COUNTER			1
 
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID		\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID		\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID		\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP)
+
 /* Paravirtualised time calls (defined by ARM DEN0057A) */
 #define ARM_SMCCC_HV_PV_TIME_FEATURES				\
 	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 08/16] KVM: arm64: Add tracepoint for failed MMIO guard check
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (6 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-15 16:31 ` [PATCH 09/16] KVM: arm64: Advertise a capability for MMIO guard Marc Zyngier
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

In order to make debugging easier, expose a new trace point
that triggers when a MMIO check fails.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c       |  4 +++-
 arch/arm64/kvm/trace_arm.h | 17 +++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 638827c8842b..c2a23457552b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1229,8 +1229,10 @@ bool kvm_check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
 	ret = __check_ioguard_page(vcpu, ipa & PAGE_MASK);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
-	if (!ret)
+	if (!ret) {
+		trace_kvm_failed_mmio_check(*vcpu_pc(vcpu), ipa);
 		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+	}
 
 	return ret;
 }
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..e40cfeb251ad 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -89,6 +89,23 @@ TRACE_EVENT(kvm_access_fault,
 	TP_printk("IPA: %lx", __entry->ipa)
 );
 
+TRACE_EVENT(kvm_failed_mmio_check,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long ipa),
+	TP_ARGS(vcpu_pc, ipa),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	ipa		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc	= vcpu_pc;
+		__entry->ipa		= ipa;
+	),
+
+	TP_printk("PC: %lx IPA: %lx", __entry->vcpu_pc, __entry->ipa)
+);
+
 TRACE_EVENT(kvm_irq_line,
 	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
 	TP_ARGS(type, vcpu_idx, irq_num, level),
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 09/16] KVM: arm64: Advertise a capability for MMIO guard
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (7 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 08/16] KVM: arm64: Add tracepoint for failed MMIO guard check Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-15 16:31 ` [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature Marc Zyngier
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

In order for userspace to find out whether the MMIO guard is
exposed to a guest, expose a capability that says so.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arm.c     | 1 +
 include/uapi/linux/kvm.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b0d2225190d2..72ebad749b0c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -214,6 +214,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_ARM_MMIO_GUARD:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..d4a5715c5c8f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_ARM_MMIO_GUARD 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (8 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 09/16] KVM: arm64: Advertise a capability for MMIO guard Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-21 21:17   ` Andrew Jones
  2021-07-15 16:31 ` [PATCH 11/16] firmware/smccc: Call arch-specific hook on discovering KVM services Marc Zyngier
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Document the hypercalls user for the MMIO guard infrastructure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 Documentation/virt/kvm/arm/index.rst      |  1 +
 Documentation/virt/kvm/arm/mmio-guard.rst | 73 +++++++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst

diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index 78a9b670aafe..e77a0ee2e2d4 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -11,3 +11,4 @@ ARM
    psci
    pvtime
    ptp_kvm
+   mmio-guard
diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst b/Documentation/virt/kvm/arm/mmio-guard.rst
new file mode 100644
index 000000000000..a5563a3e12cc
--- /dev/null
+++ b/Documentation/virt/kvm/arm/mmio-guard.rst
@@ -0,0 +1,73 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+KVM MMIO guard
+==============
+
+KVM implements device emulation by handling translation faults to any
+IPA range that is not contained a memory slot. Such translation fault
+is in most cases passed on to userspace (or in rare cases to the host
+kernel) with the address, size and possibly data of the access for
+emulation.
+
+Should the guest exit with an address that is not one that corresponds
+to an emulatable device, userspace may take measures that are not the
+most graceful as far as the guest is concerned (such as terminating it
+or delivering a fatal exception).
+
+There is also an element of trust: by forwarding the request to
+userspace, the kernel asumes that the guest trusts userspace to do the
+right thing.
+
+The KVM MMIO guard offers a way to mitigate this last point: a guest
+can request that only certainly regions of the IPA space are valid as
+MMIO. Only these regions will be handled as an MMIO, and any other
+will result in an exception being delivered to the guest.
+
+This relies on a set of hypercalls defined in the KVM-specific range,
+using the HVC64 calling convention.
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
+
+    ==============    ========    ================================
+    Function ID:      (uint32)    0xC6000002
+    Arguments:        none
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                      (uint64)    Protection Granule (PG) size in
+		                  bytes (r0)
+    ==============    ========    ================================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
+
+    ==============    ========    ==============================
+    Function ID:      (uint32)    0xC6000003
+    Arguments:        none
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                                  RET_SUCCESS(0) (r0)
+    ==============    ========    ==============================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
+
+    ==============    ========    ======================================
+    Function ID:      (uint32)    0xC6000004
+    Arguments:        (uint64)    The base of the PG-sized IPA range
+                                  that is allowed to be accessed as
+				  MMIO. Must aligned to the PG size (r1)
+                      (uint64)    Index in the MAIR_EL1 register
+		                  providing the memory attribute that
+				  is used by the guest (r2)
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                                  RET_SUCCESS(0) (r0)
+    ==============    ========    ======================================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
+
+    ==============    ========    ======================================
+    Function ID:      (uint32)    0xC6000004
+    Arguments:        (uint64)    The base of the PG-sized IPA range
+                                  that is forbidden to be accessed as
+				  MMIO. Must aligned to the PG size
+				  and have been previously mapped (r1)
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                                  RET_SUCCESS(0) (r0)
+    ==============    ========    ======================================
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 11/16] firmware/smccc: Call arch-specific hook on discovering KVM services
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (9 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-15 16:31 ` [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls Marc Zyngier
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

arm64 will soon require its own callback to initialise services
that are only availably on this architecture. Introduce a hook
that can be overloaded by the architecture.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm/include/asm/hypervisor.h   | 1 +
 arch/arm64/include/asm/hypervisor.h | 1 +
 drivers/firmware/smccc/kvm_guest.c  | 4 ++++
 3 files changed, 6 insertions(+)

diff --git a/arch/arm/include/asm/hypervisor.h b/arch/arm/include/asm/hypervisor.h
index bd61502b9715..8133c8c81a35 100644
--- a/arch/arm/include/asm/hypervisor.h
+++ b/arch/arm/include/asm/hypervisor.h
@@ -6,5 +6,6 @@
 
 void kvm_init_hyp_services(void);
 bool kvm_arm_hyp_service_available(u32 func_id);
+void kvm_arm_init_hyp_services(void);
 
 #endif
diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index 0ae427f352c8..8e77f411903f 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -6,5 +6,6 @@
 
 void kvm_init_hyp_services(void);
 bool kvm_arm_hyp_service_available(u32 func_id);
+void kvm_arm_init_hyp_services(void);
 
 #endif
diff --git a/drivers/firmware/smccc/kvm_guest.c b/drivers/firmware/smccc/kvm_guest.c
index 2d3e866decaa..56169e73252a 100644
--- a/drivers/firmware/smccc/kvm_guest.c
+++ b/drivers/firmware/smccc/kvm_guest.c
@@ -9,6 +9,8 @@
 
 #include <asm/hypervisor.h>
 
+void __weak kvm_arm_init_hyp_services(void) {}
+
 static DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) __ro_after_init = { };
 
 void __init kvm_init_hyp_services(void)
@@ -38,6 +40,8 @@ void __init kvm_init_hyp_services(void)
 
 	pr_info("hypervisor services detected (0x%08lx 0x%08lx 0x%08lx 0x%08lx)\n",
 		 res.a3, res.a2, res.a1, res.a0);
+
+	kvm_arm_init_hyp_services();
 }
 
 bool kvm_arm_hyp_service_available(u32 func_id)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (10 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 11/16] firmware/smccc: Call arch-specific hook on discovering KVM services Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-27 18:12   ` Will Deacon
  2021-07-15 16:31 ` [PATCH 13/16] arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard Marc Zyngier
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Add a pair of hooks (ioremap_page_range_hook/iounmap_page_range_hook)
that can be implemented by an architecture.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 include/linux/io.h |  3 +++
 mm/ioremap.c       | 13 ++++++++++++-
 mm/vmalloc.c       |  8 ++++++++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/io.h b/include/linux/io.h
index 9595151d800d..0ffc265f114c 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -21,6 +21,9 @@ void __ioread32_copy(void *to, const void __iomem *from, size_t count);
 void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
 
 #ifdef CONFIG_MMU
+void ioremap_page_range_hook(unsigned long addr, unsigned long end,
+			     phys_addr_t phys_addr, pgprot_t prot);
+void iounmap_page_range_hook(phys_addr_t phys_addr, size_t size);
 int ioremap_page_range(unsigned long addr, unsigned long end,
 		       phys_addr_t phys_addr, pgprot_t prot);
 #else
diff --git a/mm/ioremap.c b/mm/ioremap.c
index 8ee0136f8cb0..bd77a86088f2 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -28,10 +28,21 @@ early_param("nohugeiomap", set_nohugeiomap);
 static const unsigned int iomap_max_page_shift = PAGE_SHIFT;
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
+void __weak ioremap_page_range_hook(unsigned long addr, unsigned long end,
+				    phys_addr_t phys_addr, pgprot_t prot)
+{
+}
+
 int ioremap_page_range(unsigned long addr,
 		       unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
-	return vmap_range(addr, end, phys_addr, prot, iomap_max_page_shift);
+	int ret;
+
+	ret = vmap_range(addr, end, phys_addr, prot, iomap_max_page_shift);
+	if (!ret)
+		ioremap_page_range_hook(addr, end, phys_addr, prot);
+
+	return ret;
 }
 
 #ifdef CONFIG_GENERIC_IOREMAP
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d5cd52805149..af18a6141093 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -38,6 +38,7 @@
 #include <linux/pgtable.h>
 #include <linux/uaccess.h>
 #include <linux/hugetlb.h>
+#include <linux/io.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -2551,6 +2552,10 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 	set_area_direct_map(area, set_direct_map_default_noflush);
 }
 
+void __weak iounmap_page_range_hook(phys_addr_t phys_addr, size_t size)
+{
+}
+
 static void __vunmap(const void *addr, int deallocate_pages)
 {
 	struct vm_struct *area;
@@ -2574,6 +2579,9 @@ static void __vunmap(const void *addr, int deallocate_pages)
 
 	kasan_poison_vmalloc(area->addr, get_vm_area_size(area));
 
+	if (area->flags & VM_IOREMAP)
+		iounmap_page_range_hook(area->phys_addr, get_vm_area_size(area));
+
 	vm_remove_mappings(area, deallocate_pages);
 
 	if (deallocate_pages) {
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 13/16] arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (11 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-15 16:31 ` [PATCH 14/16] arm64: Enroll into KVM's MMIO guard if required Marc Zyngier
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Implement the previously defined ioremap/iounmap hooks for arm64,
calling into KVM's MMIO guard if available.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/mm/ioremap.c | 56 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b7c81dacabf0..0801fd92f0e3 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -9,13 +9,69 @@
  * Copyright (C) 2012 ARM Ltd.
  */
 
+#define pr_fmt(fmt)	"ioremap: " fmt
+
 #include <linux/export.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <linux/io.h>
+#include <linux/arm-smccc.h>
 
 #include <asm/fixmap.h>
 #include <asm/tlbflush.h>
+#include <asm/hypervisor.h>
+
+static DEFINE_STATIC_KEY_FALSE(ioremap_guard_key);
+
+void ioremap_page_range_hook(unsigned long addr, unsigned long end,
+			     phys_addr_t phys_addr, pgprot_t prot)
+{
+	size_t size = end - addr;
+
+	if (!static_branch_unlikely(&ioremap_guard_key))
+		return;
+
+	if (pfn_valid(__phys_to_pfn(phys_addr)))
+		return;
+
+	while (size) {
+		struct arm_smccc_res res;
+
+		arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID,
+				  phys_addr, prot, &res);
+		if (res.a0 != SMCCC_RET_SUCCESS) {
+			pr_warn_ratelimited("Failed to register %llx\n",
+					    phys_addr);
+			return;
+		}
+
+		size -= PAGE_SIZE;
+		phys_addr += PAGE_SIZE;
+	}
+}
+
+void iounmap_page_range_hook(phys_addr_t phys_addr, size_t size)
+{
+	if (!static_branch_unlikely(&ioremap_guard_key))
+		return;
+
+	VM_BUG_ON(phys_addr & ~PAGE_MASK || size & ~PAGE_MASK);
+
+	while (size) {
+		struct arm_smccc_res res;
+
+		arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID,
+				  phys_addr, &res);
+		if (res.a0 != SMCCC_RET_SUCCESS) {
+			pr_warn_ratelimited("Failed to unregister %llx\n",
+					    phys_addr);
+			return;
+		}
+
+		size -= PAGE_SIZE;
+		phys_addr += PAGE_SIZE;
+	}
+}
 
 static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
 				      pgprot_t prot, void *caller)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 14/16] arm64: Enroll into KVM's MMIO guard if required
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (12 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 13/16] arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-15 16:31 ` [PATCH 15/16] arm64: Add a helper to retrieve the PTE of a fixmap Marc Zyngier
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

Should a guest desire to enroll into the MMIO guard, allow it to
do so with a command-line option.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  3 ++
 arch/arm64/include/asm/hypervisor.h           |  1 +
 arch/arm64/kernel/setup.c                     |  6 +++
 arch/arm64/mm/ioremap.c                       | 38 +++++++++++++++++++
 4 files changed, 48 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb22006f713..a398585bed90 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2062,6 +2062,9 @@
 			1 - Bypass the IOMMU for DMA.
 			unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
 
+	ioremap_guard	[ARM64] enable the KVM MMIO guard functionality
+			if available.
+
 	io7=		[HW] IO7 for Marvel-based Alpha systems
 			See comment before marvel_specify_io7 in
 			arch/alpha/kernel/core_marvel.c.
diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index 8e77f411903f..b130c7b82eaa 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -7,5 +7,6 @@
 void kvm_init_hyp_services(void);
 bool kvm_arm_hyp_service_available(u32 func_id);
 void kvm_arm_init_hyp_services(void);
+void kvm_init_ioremap_services(void);
 
 #endif
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index be5f85b0a24d..c325647f675f 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -49,6 +49,7 @@
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
 #include <asm/efi.h>
+#include <asm/hypervisor.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/mmu_context.h>
 
@@ -445,3 +446,8 @@ static int __init register_arm64_panic_block(void)
 	return 0;
 }
 device_initcall(register_arm64_panic_block);
+
+void kvm_arm_init_hyp_services(void)
+{
+	kvm_init_ioremap_services();
+}
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index 0801fd92f0e3..d82b63bcc554 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -23,6 +23,44 @@
 
 static DEFINE_STATIC_KEY_FALSE(ioremap_guard_key);
 
+static bool ioremap_guard;
+static int __init ioremap_guard_setup(char *str)
+{
+	ioremap_guard = true;
+
+	return 0;
+}
+early_param("ioremap_guard", ioremap_guard_setup);
+
+void kvm_init_ioremap_services(void)
+{
+	struct arm_smccc_res res;
+
+	if (!ioremap_guard)
+		return;
+
+	/* We need all the functions to be implemented */
+	if (!kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO) ||
+	    !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL) ||
+	    !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP) ||
+	    !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP))
+		return;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID,
+			     &res);
+	if (res.a0 != PAGE_SIZE)
+		return;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID,
+			     &res);
+	if (res.a0 == SMCCC_RET_SUCCESS) {
+		static_branch_enable(&ioremap_guard_key);
+		pr_info("Using KVM MMIO guard for ioremap\n");
+	} else {
+		pr_warn("KVM MMIO guard registration failed (%ld)\n", res.a0);
+	}
+}
+
 void ioremap_page_range_hook(unsigned long addr, unsigned long end,
 			     phys_addr_t phys_addr, pgprot_t prot)
 {
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 15/16] arm64: Add a helper to retrieve the PTE of a fixmap
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (13 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 14/16] arm64: Enroll into KVM's MMIO guard if required Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-20 11:16   ` Quentin Perret
  2021-07-15 16:31 ` [PATCH 16/16] arm64: Register earlycon fixmap with the MMIO guard Marc Zyngier
  2021-07-21 21:42 ` [PATCH 00/16] KVM: arm64: MMIO guard PV services Andrew Jones
  16 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

In order to transfer the early mapping state into KVM's MMIO
guard infrastucture, provide a small helper that will retrieve
the associated PTE.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/fixmap.h |  2 ++
 arch/arm64/mm/mmu.c             | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 4335800201c9..1aae625b944f 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -105,6 +105,8 @@ void __init early_fixmap_init(void);
 
 extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
 
+extern pte_t *__get_fixmap_pte(enum fixed_addresses idx);
+
 #include <asm-generic/fixmap.h>
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d74586508448..f1b7abd04025 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1286,6 +1286,21 @@ void __set_fixmap(enum fixed_addresses idx,
 	}
 }
 
+pte_t *__get_fixmap_pte(enum fixed_addresses idx)
+{
+	unsigned long 	addr = __fix_to_virt(idx);
+	pte_t *ptep;
+
+	BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
+
+	ptep = fixmap_pte(addr);
+
+	if (!pte_valid(*ptep))
+		return NULL;
+
+	return ptep;
+}
+
 void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 {
 	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 16/16] arm64: Register earlycon fixmap with the MMIO guard
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (14 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 15/16] arm64: Add a helper to retrieve the PTE of a fixmap Marc Zyngier
@ 2021-07-15 16:31 ` Marc Zyngier
  2021-07-21 21:42 ` [PATCH 00/16] KVM: arm64: MMIO guard PV services Andrew Jones
  16 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-15 16:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: will, qperret, dbrazdil, Srivatsa Vaddagiri,
	Shanker R Donthineni, James Morse, Suzuki K Poulose,
	Alexandru Elisei, kernel-team

On initialising the MMIO guard infrastructure, register the
earlycon mapping if present.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/mm/ioremap.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index d82b63bcc554..a27b58e03c93 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -32,6 +32,18 @@ static int __init ioremap_guard_setup(char *str)
 }
 early_param("ioremap_guard", ioremap_guard_setup);
 
+static void fixup_fixmap(void)
+{
+	unsigned long addr = __fix_to_virt(FIX_EARLYCON_MEM_BASE);
+	pte_t *ptep = __get_fixmap_pte(FIX_EARLYCON_MEM_BASE);
+
+	if (!ptep)
+		return;
+
+	ioremap_page_range_hook(addr, addr + PAGE_SIZE, __pte_to_phys(*ptep),
+				__pgprot(pte_val(*ptep) & PTE_ATTRINDX_MASK));
+}
+
 void kvm_init_ioremap_services(void)
 {
 	struct arm_smccc_res res;
@@ -55,6 +67,7 @@ void kvm_init_ioremap_services(void)
 			     &res);
 	if (res.a0 == SMCCC_RET_SUCCESS) {
 		static_branch_enable(&ioremap_guard_key);
+		fixup_fixmap();
 		pr_info("Using KVM MMIO guard for ioremap\n");
 	} else {
 		pr_warn("KVM MMIO guard registration failed (%ld)\n", res.a0);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid
  2021-07-15 16:31 ` [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid Marc Zyngier
@ 2021-07-19 17:18   ` Quentin Perret
  2021-07-20  8:04     ` Marc Zyngier
  2021-07-27 18:10   ` Will Deacon
  1 sibling, 1 reply; 62+ messages in thread
From: Quentin Perret @ 2021-07-19 17:18 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thursday 15 Jul 2021 at 17:31:45 (+0100), Marc Zyngier wrote:
> Make sure we don't issue CMOs when mapping something that
> is not a memory address in the S2 page tables.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 05321f4165e3..a5874ebd0354 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -619,12 +619,16 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  	}
>  
>  	/* Perform CMOs before installation of the guest stage-2 PTE */
> -	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> -		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
> -						granule);
> -
> -	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> -		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> +	if (kvm_phys_is_valid(phys)) {
> +		if (mm_ops->dcache_clean_inval_poc &&
> +		    stage2_pte_cacheable(pgt, new))
> +			mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new,
> +								      mm_ops),
> +						       granule);
> +		if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> +			mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops),
> +						 granule);
> +	}

Hrmpf so this makes me realize we have a problem here, not really caused
by your patch though.

Specifically, calling kvm_pgtable_stage2_set_owner() can lead to
overriding valid mappings with invalid mappings, which is effectively an
unmap operation. In this case we should issue CMOs when unmapping a
cacheable page to ensure it is clean to the PoC, like the
kvm_pgtable_stage2_unmap() does.

Note that you patch is already an improvement over the current state of
things, because calling stage2_pte_cacheable(pgt, new),
kvm_pte_follow(new, mm_ops) and friends is bogus when 'new' is invalid
...

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid
  2021-07-19 17:18   ` Quentin Perret
@ 2021-07-20  8:04     ` Marc Zyngier
  0 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-20  8:04 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Mon, 19 Jul 2021 18:18:02 +0100,
Quentin Perret <qperret@google.com> wrote:
> 
> On Thursday 15 Jul 2021 at 17:31:45 (+0100), Marc Zyngier wrote:
> > Make sure we don't issue CMOs when mapping something that
> > is not a memory address in the S2 page tables.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 05321f4165e3..a5874ebd0354 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -619,12 +619,16 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >  	}
> >  
> >  	/* Perform CMOs before installation of the guest stage-2 PTE */
> > -	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> > -		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
> > -						granule);
> > -
> > -	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> > -		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> > +	if (kvm_phys_is_valid(phys)) {
> > +		if (mm_ops->dcache_clean_inval_poc &&
> > +		    stage2_pte_cacheable(pgt, new))
> > +			mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new,
> > +								      mm_ops),
> > +						       granule);
> > +		if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> > +			mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops),
> > +						 granule);
> > +	}
> 
> Hrmpf so this makes me realize we have a problem here, not really caused
> by your patch though.
> 
> Specifically, calling kvm_pgtable_stage2_set_owner() can lead to
> overriding valid mappings with invalid mappings, which is effectively an
> unmap operation. In this case we should issue CMOs when unmapping a
> cacheable page to ensure it is clean to the PoC, like the
> kvm_pgtable_stage2_unmap() does.

You only need that if you to have a non-cacheable mapping to the same
page. Otherwise, you will be fine.

For pages that are moved from host-EL1 to EL2, we're good (I don't
think we ever have a non-cachable mapping at EL2). However, once we
start moving pages to guests, we'll need something.

> Note that you patch is already an improvement over the current state of
> things, because calling stage2_pte_cacheable(pgt, new),
> kvm_pte_follow(new, mm_ops) and friends is bogus when 'new' is invalid

Yeah, I had it mentally earmarked as a potential stable candidate. We
may have to do a bit better in the light of the above though.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate
  2021-07-15 16:31 ` [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate Marc Zyngier
@ 2021-07-20 10:09   ` Quentin Perret
  2021-07-20 10:21     ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Quentin Perret @ 2021-07-20 10:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  		.arg		= &map_data,
>  	};
>  
> -	if (owner_id > KVM_MAX_OWNER_ID)
> +	if (!annotation || (annotation & PTE_VALID))
>  		return -EINVAL;

Why do you consider annotation==0 invalid? The assumption so far has
been that the owner_id for the host is 0, so annotating a range with 0s
should be a valid operation -- this will be required when e.g.
transferring ownership of a page back to the host.

>  
>  	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate
  2021-07-20 10:09   ` Quentin Perret
@ 2021-07-20 10:21     ` Marc Zyngier
  2021-07-20 10:38       ` Quentin Perret
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-20 10:21 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 20 Jul 2021 11:09:21 +0100,
Quentin Perret <qperret@google.com> wrote:
> 
> On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> >  		.arg		= &map_data,
> >  	};
> >  
> > -	if (owner_id > KVM_MAX_OWNER_ID)
> > +	if (!annotation || (annotation & PTE_VALID))
> >  		return -EINVAL;
> 
> Why do you consider annotation==0 invalid? The assumption so far has
> been that the owner_id for the host is 0, so annotating a range with 0s
> should be a valid operation -- this will be required when e.g.
> transferring ownership of a page back to the host.

How do you then distinguish it from an empty entry that doesn't map to
anything at all?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate
  2021-07-20 10:21     ` Marc Zyngier
@ 2021-07-20 10:38       ` Quentin Perret
  2021-07-20 11:20         ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Quentin Perret @ 2021-07-20 10:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> On Tue, 20 Jul 2021 11:09:21 +0100,
> Quentin Perret <qperret@google.com> wrote:
> > 
> > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > >  		.arg		= &map_data,
> > >  	};
> > >  
> > > -	if (owner_id > KVM_MAX_OWNER_ID)
> > > +	if (!annotation || (annotation & PTE_VALID))
> > >  		return -EINVAL;
> > 
> > Why do you consider annotation==0 invalid? The assumption so far has
> > been that the owner_id for the host is 0, so annotating a range with 0s
> > should be a valid operation -- this will be required when e.g.
> > transferring ownership of a page back to the host.
> 
> How do you then distinguish it from an empty entry that doesn't map to
> anything at all?

You don't, but that's beauty of it :)

The host starts with a PGD full of zeroes, which in terms of ownership
means that it owns the entire (I)PA space. And it loses ownership of a
page only when we explicitly annotate it with an owner id != 0.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-15 16:31 ` [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure Marc Zyngier
@ 2021-07-20 11:13   ` Quentin Perret
  2021-07-20 13:15     ` Marc Zyngier
  2021-07-27 18:11   ` Will Deacon
  1 sibling, 1 reply; 62+ messages in thread
From: Quentin Perret @ 2021-07-20 11:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> +struct s2_walk_data {
> +	kvm_pte_t	pteval;
> +	u32		level;
> +};
> +
> +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +		     enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> +	struct s2_walk_data *data = arg;
> +
> +	data->level = level;
> +	data->pteval = *ptep;
> +	return 0;
> +}
> +
> +/* Assumes mmu_lock taken */
> +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> +{
> +	struct s2_walk_data data;
> +	struct kvm_pgtable_walker walker = {
> +		.cb             = s2_walker,
> +		.flags          = KVM_PGTABLE_WALK_LEAF,
> +		.arg            = &data,
> +	};
> +
> +	kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> +			 PAGE_SIZE, &walker);
> +
> +	/* Must be a PAGE_SIZE mapping with our annotation */
> +	return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> +		data.pteval == MMIO_NOTE);

Nit: you could do this check in the walker directly and check the return
value of kvm_pgtable_walk() instead. That would allow to get rid of
struct s2_walk_data.

Also, though the compiler might be able to optimize, maybe simplify the
level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?

Thanks,
Quentin

> +}

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/16] arm64: Add a helper to retrieve the PTE of a fixmap
  2021-07-15 16:31 ` [PATCH 15/16] arm64: Add a helper to retrieve the PTE of a fixmap Marc Zyngier
@ 2021-07-20 11:16   ` Quentin Perret
  0 siblings, 0 replies; 62+ messages in thread
From: Quentin Perret @ 2021-07-20 11:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thursday 15 Jul 2021 at 17:31:58 (+0100), Marc Zyngier wrote:
> In order to transfer the early mapping state into KVM's MMIO
> guard infrastucture, provide a small helper that will retrieve
> the associated PTE.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/fixmap.h |  2 ++
>  arch/arm64/mm/mmu.c             | 15 +++++++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index 4335800201c9..1aae625b944f 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -105,6 +105,8 @@ void __init early_fixmap_init(void);
>  
>  extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
>  
> +extern pte_t *__get_fixmap_pte(enum fixed_addresses idx);
> +
>  #include <asm-generic/fixmap.h>
>  
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index d74586508448..f1b7abd04025 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1286,6 +1286,21 @@ void __set_fixmap(enum fixed_addresses idx,
>  	}
>  }
>  
> +pte_t *__get_fixmap_pte(enum fixed_addresses idx)
> +{
> +	unsigned long 	addr = __fix_to_virt(idx);

Nit: odd spacing here.

> +	pte_t *ptep;
> +
> +	BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
> +
> +	ptep = fixmap_pte(addr);
> +
> +	if (!pte_valid(*ptep))
> +		return NULL;
> +
> +	return ptep;
> +}
> +
>  void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  {
>  	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate
  2021-07-20 10:38       ` Quentin Perret
@ 2021-07-20 11:20         ` Marc Zyngier
  2021-07-20 11:36           ` Quentin Perret
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-20 11:20 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 20 Jul 2021 11:38:17 +0100,
Quentin Perret <qperret@google.com> wrote:
> 
> On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> > On Tue, 20 Jul 2021 11:09:21 +0100,
> > Quentin Perret <qperret@google.com> wrote:
> > > 
> > > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > >  		.arg		= &map_data,
> > > >  	};
> > > >  
> > > > -	if (owner_id > KVM_MAX_OWNER_ID)
> > > > +	if (!annotation || (annotation & PTE_VALID))
> > > >  		return -EINVAL;
> > > 
> > > Why do you consider annotation==0 invalid? The assumption so far has
> > > been that the owner_id for the host is 0, so annotating a range with 0s
> > > should be a valid operation -- this will be required when e.g.
> > > transferring ownership of a page back to the host.
> > 
> > How do you then distinguish it from an empty entry that doesn't map to
> > anything at all?
> 
> You don't, but that's beauty of it :)
> 
> The host starts with a PGD full of zeroes, which in terms of ownership
> means that it owns the entire (I)PA space. And it loses ownership of a
> page only when we explicitly annotate it with an owner id != 0.

Right. But this scheme doesn't apply to the guests, does it? Don't we
need something that is non-null to preserve the table refcounting?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate
  2021-07-20 11:20         ` Marc Zyngier
@ 2021-07-20 11:36           ` Quentin Perret
  2021-07-20 13:13             ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Quentin Perret @ 2021-07-20 11:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tuesday 20 Jul 2021 at 12:20:58 (+0100), Marc Zyngier wrote:
> On Tue, 20 Jul 2021 11:38:17 +0100,
> Quentin Perret <qperret@google.com> wrote:
> > 
> > On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> > > On Tue, 20 Jul 2021 11:09:21 +0100,
> > > Quentin Perret <qperret@google.com> wrote:
> > > > 
> > > > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > > >  		.arg		= &map_data,
> > > > >  	};
> > > > >  
> > > > > -	if (owner_id > KVM_MAX_OWNER_ID)
> > > > > +	if (!annotation || (annotation & PTE_VALID))
> > > > >  		return -EINVAL;
> > > > 
> > > > Why do you consider annotation==0 invalid? The assumption so far has
> > > > been that the owner_id for the host is 0, so annotating a range with 0s
> > > > should be a valid operation -- this will be required when e.g.
> > > > transferring ownership of a page back to the host.
> > > 
> > > How do you then distinguish it from an empty entry that doesn't map to
> > > anything at all?
> > 
> > You don't, but that's beauty of it :)
> > 
> > The host starts with a PGD full of zeroes, which in terms of ownership
> > means that it owns the entire (I)PA space. And it loses ownership of a
> > page only when we explicitly annotate it with an owner id != 0.
> 
> Right. But this scheme doesn't apply to the guests, does it?

Right, the meaning of a NULL PTE in guests will clearly be something
different, but I guess the interpretation of what invalid mappings mean
is up to the caller.

> Don't we
> need something that is non-null to preserve the table refcounting?

Sure, but do we care? If the table entry gets zeroed we're then
basically using an 'invalid block' mapping to annotate the entire block
range with '0', whatever that means. For guests it won't mean much, but
for the host that would mean sole ownership of the entire range.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate
  2021-07-20 11:36           ` Quentin Perret
@ 2021-07-20 13:13             ` Marc Zyngier
  0 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-20 13:13 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 20 Jul 2021 12:36:21 +0100,
Quentin Perret <qperret@google.com> wrote:
> 
> On Tuesday 20 Jul 2021 at 12:20:58 (+0100), Marc Zyngier wrote:
> > On Tue, 20 Jul 2021 11:38:17 +0100,
> > Quentin Perret <qperret@google.com> wrote:
> > > 
> > > On Tuesday 20 Jul 2021 at 11:21:17 (+0100), Marc Zyngier wrote:
> > > > On Tue, 20 Jul 2021 11:09:21 +0100,
> > > > Quentin Perret <qperret@google.com> wrote:
> > > > > 
> > > > > On Thursday 15 Jul 2021 at 17:31:46 (+0100), Marc Zyngier wrote:
> > > > > > @@ -815,7 +807,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > > > >  		.arg		= &map_data,
> > > > > >  	};
> > > > > >  
> > > > > > -	if (owner_id > KVM_MAX_OWNER_ID)
> > > > > > +	if (!annotation || (annotation & PTE_VALID))
> > > > > >  		return -EINVAL;
> > > > > 
> > > > > Why do you consider annotation==0 invalid? The assumption so far has
> > > > > been that the owner_id for the host is 0, so annotating a range with 0s
> > > > > should be a valid operation -- this will be required when e.g.
> > > > > transferring ownership of a page back to the host.
> > > > 
> > > > How do you then distinguish it from an empty entry that doesn't map to
> > > > anything at all?
> > > 
> > > You don't, but that's beauty of it :)
> > > 
> > > The host starts with a PGD full of zeroes, which in terms of ownership
> > > means that it owns the entire (I)PA space. And it loses ownership of a
> > > page only when we explicitly annotate it with an owner id != 0.
> > 
> > Right. But this scheme doesn't apply to the guests, does it?
> 
> Right, the meaning of a NULL PTE in guests will clearly be something
> different, but I guess the interpretation of what invalid mappings mean
> is up to the caller.
> 
> > Don't we
> > need something that is non-null to preserve the table refcounting?
> 
> Sure, but do we care? If the table entry gets zeroed we're then
> basically using an 'invalid block' mapping to annotate the entire block
> range with '0', whatever that means. For guests it won't mean much, but
> for the host that would mean sole ownership of the entire range.

I see. You let the refcount drop to 0, unmap the table and let
transfer the 0 annotation one level up, covering the whole block.

I guess I'll revert back to allowing 0, but I'd like to make sure we
don't do that for guests unless we actually tear down the address
space (checking for KVM_PGTABLE_S2_IDMAP should work).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-20 11:13   ` Quentin Perret
@ 2021-07-20 13:15     ` Marc Zyngier
  2021-07-20 15:49       ` Quentin Perret
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-20 13:15 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 20 Jul 2021 12:13:20 +0100,
Quentin Perret <qperret@google.com> wrote:
> 
> On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> > +struct s2_walk_data {
> > +	kvm_pte_t	pteval;
> > +	u32		level;
> > +};
> > +
> > +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +		     enum kvm_pgtable_walk_flags flag, void * const arg)
> > +{
> > +	struct s2_walk_data *data = arg;
> > +
> > +	data->level = level;
> > +	data->pteval = *ptep;
> > +	return 0;
> > +}
> > +
> > +/* Assumes mmu_lock taken */
> > +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > +{
> > +	struct s2_walk_data data;
> > +	struct kvm_pgtable_walker walker = {
> > +		.cb             = s2_walker,
> > +		.flags          = KVM_PGTABLE_WALK_LEAF,
> > +		.arg            = &data,
> > +	};
> > +
> > +	kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> > +			 PAGE_SIZE, &walker);
> > +
> > +	/* Must be a PAGE_SIZE mapping with our annotation */
> > +	return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> > +		data.pteval == MMIO_NOTE);
> 
> Nit: you could do this check in the walker directly and check the return
> value of kvm_pgtable_walk() instead. That would allow to get rid of
> struct s2_walk_data.
> 
> Also, though the compiler might be able to optimize, maybe simplify the
> level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?

Yup, all good points. I guess I could do the same in my other series
that parses the userspace PT to extract the level.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-20 13:15     ` Marc Zyngier
@ 2021-07-20 15:49       ` Quentin Perret
  2021-07-22 18:04         ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Quentin Perret @ 2021-07-20 15:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tuesday 20 Jul 2021 at 14:15:56 (+0100), Marc Zyngier wrote:
> On Tue, 20 Jul 2021 12:13:20 +0100,
> Quentin Perret <qperret@google.com> wrote:
> > 
> > On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> > > +struct s2_walk_data {
> > > +	kvm_pte_t	pteval;
> > > +	u32		level;
> > > +};
> > > +
> > > +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > > +		     enum kvm_pgtable_walk_flags flag, void * const arg)
> > > +{
> > > +	struct s2_walk_data *data = arg;
> > > +
> > > +	data->level = level;
> > > +	data->pteval = *ptep;
> > > +	return 0;
> > > +}
> > > +
> > > +/* Assumes mmu_lock taken */
> > > +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > > +{
> > > +	struct s2_walk_data data;
> > > +	struct kvm_pgtable_walker walker = {
> > > +		.cb             = s2_walker,
> > > +		.flags          = KVM_PGTABLE_WALK_LEAF,
> > > +		.arg            = &data,
> > > +	};
> > > +
> > > +	kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> > > +			 PAGE_SIZE, &walker);
> > > +
> > > +	/* Must be a PAGE_SIZE mapping with our annotation */
> > > +	return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> > > +		data.pteval == MMIO_NOTE);
> > 
> > Nit: you could do this check in the walker directly and check the return
> > value of kvm_pgtable_walk() instead. That would allow to get rid of
> > struct s2_walk_data.
> > 
> > Also, though the compiler might be able to optimize, maybe simplify the
> > level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?
> 
> Yup, all good points. I guess I could do the same in my other series
> that parses the userspace PT to extract the level.

Well, actually, let me take that back. I think something like you have
would be useful, but in pgtable.c directly and re-usable for stage-1 and
stage-2 walks. Maybe something like the below (totally untested)?

I could use such a walker in several places as well in the memory
ownership series:

 - following the idea of [1], I could remove the
   kvm_pgtable_stage2_find_range() function entirely;

 - [2] defines 2 custom walkers that do nothing but walk host stage-2
   and hyp stage-1 page-tables to check permissions and such --  they
   could be removed/re-implemented easily as well.

And you seem to need something similar here, so clearly there is a need.
WDYT?

Thanks,
Quentin

[1] https://lore.kernel.org/kvmarm/20210719104735.3681732-3-qperret@google.com/
[2] https://lore.kernel.org/kvmarm/20210719104735.3681732-14-qperret@google.com/

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index e0ae57dca827..bd6d26f27e1a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -357,6 +357,38 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
        return _kvm_pgtable_walk(&walk_data);
 }

+struct get_leaf_data {
+       kvm_pte_t *ptep;
+       u32 *level;
+};
+
+static int get_leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+                          enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+       struct get_leaf_data *data = arg;
+
+       *(data->ptep) = *ptep;
+       *(data->level) = level;
+
+       return 0;
+}
+
+int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr, kvm_pte_t *ptep,
+                        u32 *level)
+{
+       struct get_leaf_data data = {
+               .ptep = ptep,
+               .level = level,
+       };
+       struct kvm_pgtable_walker __get_leaf_walker = {
+               .cb             = get_leaf_walker,
+               .flags          = KVM_PGTABLE_WALK_LEAF,
+               .arg            = &data,
+       };
+
+       return kvm_pgtable_walk(pgt, addr, PAGE_SIZE, &__get_leaf_walker);
+}
+
 struct hyp_map_data {
        u64                             phys;
        kvm_pte_t                       attr;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature
  2021-07-15 16:31 ` [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature Marc Zyngier
@ 2021-07-21 21:17   ` Andrew Jones
  2021-07-23 13:30     ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Andrew Jones @ 2021-07-21 21:17 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On Thu, Jul 15, 2021 at 05:31:53PM +0100, Marc Zyngier wrote:
> Document the hypercalls user for the MMIO guard infrastructure.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  Documentation/virt/kvm/arm/index.rst      |  1 +
>  Documentation/virt/kvm/arm/mmio-guard.rst | 73 +++++++++++++++++++++++
>  2 files changed, 74 insertions(+)
>  create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst
> 
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index 78a9b670aafe..e77a0ee2e2d4 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -11,3 +11,4 @@ ARM
>     psci
>     pvtime
>     ptp_kvm
> +   mmio-guard
> diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst b/Documentation/virt/kvm/arm/mmio-guard.rst
> new file mode 100644
> index 000000000000..a5563a3e12cc
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/mmio-guard.rst
> @@ -0,0 +1,73 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +KVM MMIO guard
> +==============
> +
> +KVM implements device emulation by handling translation faults to any
> +IPA range that is not contained a memory slot. Such translation fault
                                  ^ in                ^ a

> +is in most cases passed on to userspace (or in rare cases to the host
> +kernel) with the address, size and possibly data of the access for
> +emulation.
> +
> +Should the guest exit with an address that is not one that corresponds
> +to an emulatable device, userspace may take measures that are not the
> +most graceful as far as the guest is concerned (such as terminating it
> +or delivering a fatal exception).
> +
> +There is also an element of trust: by forwarding the request to
> +userspace, the kernel asumes that the guest trusts userspace to do the

assumes
  
> +right thing.
> +
> +The KVM MMIO guard offers a way to mitigate this last point: a guest
> +can request that only certainly regions of the IPA space are valid as

certain

> +MMIO. Only these regions will be handled as an MMIO, and any other
> +will result in an exception being delivered to the guest.
> +
> +This relies on a set of hypercalls defined in the KVM-specific range,
> +using the HVC64 calling convention.
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
> +
> +    ==============    ========    ================================
> +    Function ID:      (uint32)    0xC6000002
> +    Arguments:        none
> +    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
> +                      (uint64)    Protection Granule (PG) size in
> +		                  bytes (r0)
> +    ==============    ========    ================================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
> +
> +    ==============    ========    ==============================
> +    Function ID:      (uint32)    0xC6000003
> +    Arguments:        none
> +    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
> +                                  RET_SUCCESS(0) (r0)
> +    ==============    ========    ==============================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
> +
> +    ==============    ========    ======================================
> +    Function ID:      (uint32)    0xC6000004
> +    Arguments:        (uint64)    The base of the PG-sized IPA range
> +                                  that is allowed to be accessed as
> +				  MMIO. Must aligned to the PG size (r1)

align

> +                      (uint64)    Index in the MAIR_EL1 register
> +		                  providing the memory attribute that
> +				  is used by the guest (r2)
> +    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
> +                                  RET_SUCCESS(0) (r0)
> +    ==============    ========    ======================================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
> +
> +    ==============    ========    ======================================
> +    Function ID:      (uint32)    0xC6000004

copy+paste error, should be 0xC6000005

> +    Arguments:        (uint64)    The base of the PG-sized IPA range
> +                                  that is forbidden to be accessed as

is now forbidden

or

was allowed

or just drop that part of the sentence because its covered by the "and
have been previously mapped" part. Something like

PG-sized IPA range aligned to the PG size which has been previously mapped
(r1)

> +				  MMIO. Must aligned to the PG size

align

> +				  and have been previously mapped (r1)
> +    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
> +                                  RET_SUCCESS(0) (r0)
> +    ==============    ========    ======================================
> -- 
> 2.30.2
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 

Thanks,
drew


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 00/16] KVM: arm64: MMIO guard PV services
  2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
                   ` (15 preceding siblings ...)
  2021-07-15 16:31 ` [PATCH 16/16] arm64: Register earlycon fixmap with the MMIO guard Marc Zyngier
@ 2021-07-21 21:42 ` Andrew Jones
  2021-07-22 10:00   ` Marc Zyngier
  16 siblings, 1 reply; 62+ messages in thread
From: Andrew Jones @ 2021-07-21 21:42 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> KVM/arm64 currently considers that any memory access outside of a
> memslot is a MMIO access. This so far has served us very well, but
> obviously relies on the guest trusting the host, and especially
> userspace to do the right thing.
> 
> As we keep on hacking away at pKVM, it becomes obvious that this trust
> model is not really fit for a confidential computing environment, and
> that the guest would require some guarantees that emulation only
> occurs on portions of the address space that have clearly been
> identified for this purpose.

This trust model is hard for me to reason about. userspace is trusted to
control the life cycle of the VM, to prepare the memslots for the VM,
and [presumably] identify what MMIO ranges are valid, yet it's not
trusted to handle invalid MMIO accesses. I'd like to learn more about
this model and the userspace involved.

> 
> This series aims at providing the two sides of the above coin:
> 
> - a set of PV services (collectively called 'MMIO guard' -- better
>   name required!) where the guest can flag portion of its address
>   space that it considers as MMIO, with map/unmap semantics. Any
>   attempt to access a MMIO range outside of these regions will result
>   in an external abort being injected.
> 
> - a set of hooks into the ioremap code allowing a Linux guest to tell
>   KVM about things it want to consider as MMIO. I definitely hate this
>   part of the series, as it feels clumsy and brittle.
> 
> For now, the enrolment in this scheme is controlled by a guest kernel
> command-line parameters, but it is expected that KVM will enforce this
> for protected VMs.
> 
> Note that this crucially misses a save/restore interface for non
> protected VMs, and I currently don't have a good solution for
> that. Ideas welcome.
> 
> I also plan to use this series as a base for some other purposes,
> namely to trick the guest in telling us how it maps things like
> prefetchable BARs (see the discussion at [1]). That part is not
> implemented yet, but there is already some provision to pass the MAIR
> index across.
> 
> Patches on top of 5.14-rc1, branch pushed at the usual location.
> 
> [1] 20210429162906.32742-1-sdonthineni@nvidia.com

The fun never stops.

Thanks,
drew


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 00/16] KVM: arm64: MMIO guard PV services
  2021-07-21 21:42 ` [PATCH 00/16] KVM: arm64: MMIO guard PV services Andrew Jones
@ 2021-07-22 10:00   ` Marc Zyngier
  2021-07-22 13:25     ` Andrew Jones
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-22 10:00 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On Wed, 21 Jul 2021 22:42:43 +0100,
Andrew Jones <drjones@redhat.com> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> > KVM/arm64 currently considers that any memory access outside of a
> > memslot is a MMIO access. This so far has served us very well, but
> > obviously relies on the guest trusting the host, and especially
> > userspace to do the right thing.
> > 
> > As we keep on hacking away at pKVM, it becomes obvious that this trust
> > model is not really fit for a confidential computing environment, and
> > that the guest would require some guarantees that emulation only
> > occurs on portions of the address space that have clearly been
> > identified for this purpose.
> 
> This trust model is hard for me to reason about. userspace is trusted to
> control the life cycle of the VM, to prepare the memslots for the VM,
> and [presumably] identify what MMIO ranges are valid, yet it's not
> trusted to handle invalid MMIO accesses. I'd like to learn more about
> this model and the userspace involved.

Imagine the following scenario:

On top of the normal memory described as memslots (which pKVM will
ensure that userspace cannot access), a malicious userspace describes
to the guest another memory region in a firmware table and does not
back it with a memslot.

The hypervisor cannot validate this firmware description (imagine
doing ACPI and DT parsing at EL2...), so the guest starts using this
"memory" for something, and data slowly trickles all the way to EL0.
Not what you wanted.

To ensure that this doesn't happen, we reverse the problem: userspace
(and ultimately the EL1 kernel) doesn't get involved on a translation
fault outside of a memslot *unless* the guest has explicitly asked for
that page to be handled as a MMIO. With that, we have a full
description of the IPA space contained in the S2 page tables:

- memory described via a memslot,
- directly mapped device (GICv2, for exmaple),
- MMIO exposed for emulation

and anything else is an invalid access that results in an abort.

Does this make sense to you?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 00/16] KVM: arm64: MMIO guard PV services
  2021-07-22 10:00   ` Marc Zyngier
@ 2021-07-22 13:25     ` Andrew Jones
  2021-07-22 15:30       ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Andrew Jones @ 2021-07-22 13:25 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On Thu, Jul 22, 2021 at 11:00:26AM +0100, Marc Zyngier wrote:
> On Wed, 21 Jul 2021 22:42:43 +0100,
> Andrew Jones <drjones@redhat.com> wrote:
> > 
> > On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> > > KVM/arm64 currently considers that any memory access outside of a
> > > memslot is a MMIO access. This so far has served us very well, but
> > > obviously relies on the guest trusting the host, and especially
> > > userspace to do the right thing.
> > > 
> > > As we keep on hacking away at pKVM, it becomes obvious that this trust
> > > model is not really fit for a confidential computing environment, and
> > > that the guest would require some guarantees that emulation only
> > > occurs on portions of the address space that have clearly been
> > > identified for this purpose.
> > 
> > This trust model is hard for me to reason about. userspace is trusted to
> > control the life cycle of the VM, to prepare the memslots for the VM,
> > and [presumably] identify what MMIO ranges are valid, yet it's not
> > trusted to handle invalid MMIO accesses. I'd like to learn more about
> > this model and the userspace involved.
> 
> Imagine the following scenario:
> 
> On top of the normal memory described as memslots (which pKVM will
> ensure that userspace cannot access),

Ah, I didn't know that part.

> a malicious userspace describes
> to the guest another memory region in a firmware table and does not
> back it with a memslot.
> 
> The hypervisor cannot validate this firmware description (imagine
> doing ACPI and DT parsing at EL2...), so the guest starts using this
> "memory" for something, and data slowly trickles all the way to EL0.
> Not what you wanted.

Yes, I see that now, in light of the above.

> 
> To ensure that this doesn't happen, we reverse the problem: userspace
> (and ultimately the EL1 kernel) doesn't get involved on a translation
> fault outside of a memslot *unless* the guest has explicitly asked for
> that page to be handled as a MMIO. With that, we have a full
> description of the IPA space contained in the S2 page tables:
> 
> - memory described via a memslot,
> - directly mapped device (GICv2, for exmaple),
> - MMIO exposed for emulation
> 
> and anything else is an invalid access that results in an abort.
> 
> Does this make sense to you?

Now I understand better, but if we're worried about malicious userspaces,
then how do we protect the guest from "bad" MMIO devices that have been
described to it? The guest can request access to those using this new
mechanism.

Thanks,
drew


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 00/16] KVM: arm64: MMIO guard PV services
  2021-07-22 13:25     ` Andrew Jones
@ 2021-07-22 15:30       ` Marc Zyngier
  0 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-22 15:30 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On Thu, 22 Jul 2021 14:25:15 +0100,
Andrew Jones <drjones@redhat.com> wrote:
> 
> On Thu, Jul 22, 2021 at 11:00:26AM +0100, Marc Zyngier wrote:
> > On Wed, 21 Jul 2021 22:42:43 +0100,
> > Andrew Jones <drjones@redhat.com> wrote:
> > > 
> > > On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> > > > KVM/arm64 currently considers that any memory access outside of a
> > > > memslot is a MMIO access. This so far has served us very well, but
> > > > obviously relies on the guest trusting the host, and especially
> > > > userspace to do the right thing.
> > > > 
> > > > As we keep on hacking away at pKVM, it becomes obvious that this trust
> > > > model is not really fit for a confidential computing environment, and
> > > > that the guest would require some guarantees that emulation only
> > > > occurs on portions of the address space that have clearly been
> > > > identified for this purpose.
> > > 
> > > This trust model is hard for me to reason about. userspace is trusted to
> > > control the life cycle of the VM, to prepare the memslots for the VM,
> > > and [presumably] identify what MMIO ranges are valid, yet it's not
> > > trusted to handle invalid MMIO accesses. I'd like to learn more about
> > > this model and the userspace involved.
> > 
> > Imagine the following scenario:
> > 
> > On top of the normal memory described as memslots (which pKVM will
> > ensure that userspace cannot access),
> 
> Ah, I didn't know that part.

Yeah, that's the crucial bit. By default, pKVM guests do not share any
memory with anyone, so the memslots are made inaccessible from both
the VMM and the host kernel. The guest has to explicitly change the
state of the memory it wants to share back with the host for things
like IO.

> 
> > a malicious userspace describes
> > to the guest another memory region in a firmware table and does not
> > back it with a memslot.
> > 
> > The hypervisor cannot validate this firmware description (imagine
> > doing ACPI and DT parsing at EL2...), so the guest starts using this
> > "memory" for something, and data slowly trickles all the way to EL0.
> > Not what you wanted.
> 
> Yes, I see that now, in light of the above.
> 
> > 
> > To ensure that this doesn't happen, we reverse the problem: userspace
> > (and ultimately the EL1 kernel) doesn't get involved on a translation
> > fault outside of a memslot *unless* the guest has explicitly asked for
> > that page to be handled as a MMIO. With that, we have a full
> > description of the IPA space contained in the S2 page tables:
> > 
> > - memory described via a memslot,
> > - directly mapped device (GICv2, for exmaple),
> > - MMIO exposed for emulation
> > 
> > and anything else is an invalid access that results in an abort.
> > 
> > Does this make sense to you?
> 
> Now I understand better, but if we're worried about malicious userspaces,
> then how do we protect the guest from "bad" MMIO devices that have been
> described to it? The guest can request access to those using this new
> mechanism.

We don't try to do anything about a malicious IO device. Any IO should
be considered as malicious, and you don't want to give it anything in
clear-text if it is supposed to be secret.

Eventually, you'd probably want directly assigned devices that can
attest to the guest that they are what they pretend to be, but that's
a long way away. For now, we only want to enable virtio with a reduced
level of trust (bounce buffering via shared pages for DMA, and reduced
MMIO exposure).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-20 15:49       ` Quentin Perret
@ 2021-07-22 18:04         ` Marc Zyngier
  2021-07-23 10:16           ` Quentin Perret
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-22 18:04 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 20 Jul 2021 16:49:45 +0100,
Quentin Perret <qperret@google.com> wrote:
> 
> On Tuesday 20 Jul 2021 at 14:15:56 (+0100), Marc Zyngier wrote:
> > On Tue, 20 Jul 2021 12:13:20 +0100,
> > Quentin Perret <qperret@google.com> wrote:
> > > 
> > > On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> > > > +struct s2_walk_data {
> > > > +	kvm_pte_t	pteval;
> > > > +	u32		level;
> > > > +};
> > > > +
> > > > +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > > > +		     enum kvm_pgtable_walk_flags flag, void * const arg)
> > > > +{
> > > > +	struct s2_walk_data *data = arg;
> > > > +
> > > > +	data->level = level;
> > > > +	data->pteval = *ptep;
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/* Assumes mmu_lock taken */
> > > > +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > > > +{
> > > > +	struct s2_walk_data data;
> > > > +	struct kvm_pgtable_walker walker = {
> > > > +		.cb             = s2_walker,
> > > > +		.flags          = KVM_PGTABLE_WALK_LEAF,
> > > > +		.arg            = &data,
> > > > +	};
> > > > +
> > > > +	kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> > > > +			 PAGE_SIZE, &walker);
> > > > +
> > > > +	/* Must be a PAGE_SIZE mapping with our annotation */
> > > > +	return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> > > > +		data.pteval == MMIO_NOTE);
> > > 
> > > Nit: you could do this check in the walker directly and check the return
> > > value of kvm_pgtable_walk() instead. That would allow to get rid of
> > > struct s2_walk_data.
> > > 
> > > Also, though the compiler might be able to optimize, maybe simplify the
> > > level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?
> > 
> > Yup, all good points. I guess I could do the same in my other series
> > that parses the userspace PT to extract the level.
> 
> Well, actually, let me take that back. I think something like you have
> would be useful, but in pgtable.c directly and re-usable for stage-1 and
> stage-2 walks. Maybe something like the below (totally untested)?
> 
> I could use such a walker in several places as well in the memory
> ownership series:
> 
>  - following the idea of [1], I could remove the
>    kvm_pgtable_stage2_find_range() function entirely;
> 
>  - [2] defines 2 custom walkers that do nothing but walk host stage-2
>    and hyp stage-1 page-tables to check permissions and such --  they
>    could be removed/re-implemented easily as well.
> 
> And you seem to need something similar here, so clearly there is a need.
> WDYT?

So FWIW, I've now pushed out an updated series for the THP changes[1],
and you will find a similar patch at the base of the branch. Please
have a look and let me know what you think!

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/mmu/mapping-levels

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-22 18:04         ` Marc Zyngier
@ 2021-07-23 10:16           ` Quentin Perret
  0 siblings, 0 replies; 62+ messages in thread
From: Quentin Perret @ 2021-07-23 10:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, will, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thursday 22 Jul 2021 at 19:04:59 (+0100), Marc Zyngier wrote:
> So FWIW, I've now pushed out an updated series for the THP changes[1],
> and you will find a similar patch at the base of the branch. Please
> have a look and let me know what you think!

I Like the look of it! I'll pull this patch in my series and rebase on
top -- that should introduce three new users or so, and allow a few nice
cleanups.

Thanks!
Quentin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature
  2021-07-21 21:17   ` Andrew Jones
@ 2021-07-23 13:30     ` Marc Zyngier
  2021-07-23 13:38       ` Andrew Jones
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-23 13:30 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On 2021-07-21 22:17, Andrew Jones wrote:
> On Thu, Jul 15, 2021 at 05:31:53PM +0100, Marc Zyngier wrote:
>> Document the hypercalls user for the MMIO guard infrastructure.
>> 
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>  Documentation/virt/kvm/arm/index.rst      |  1 +
>>  Documentation/virt/kvm/arm/mmio-guard.rst | 73 
>> +++++++++++++++++++++++
>>  2 files changed, 74 insertions(+)
>>  create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst
>> 
>> diff --git a/Documentation/virt/kvm/arm/index.rst 
>> b/Documentation/virt/kvm/arm/index.rst
>> index 78a9b670aafe..e77a0ee2e2d4 100644
>> --- a/Documentation/virt/kvm/arm/index.rst
>> +++ b/Documentation/virt/kvm/arm/index.rst
>> @@ -11,3 +11,4 @@ ARM
>>     psci
>>     pvtime
>>     ptp_kvm
>> +   mmio-guard
>> diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst 
>> b/Documentation/virt/kvm/arm/mmio-guard.rst
>> new file mode 100644
>> index 000000000000..a5563a3e12cc
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/mmio-guard.rst
>> @@ -0,0 +1,73 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============
>> +KVM MMIO guard
>> +==============
>> +
>> +KVM implements device emulation by handling translation faults to any
>> +IPA range that is not contained a memory slot. Such translation fault
>                                   ^ in                ^ a
> 
>> +is in most cases passed on to userspace (or in rare cases to the host
>> +kernel) with the address, size and possibly data of the access for
>> +emulation.
>> +
>> +Should the guest exit with an address that is not one that 
>> corresponds
>> +to an emulatable device, userspace may take measures that are not the
>> +most graceful as far as the guest is concerned (such as terminating 
>> it
>> +or delivering a fatal exception).
>> +
>> +There is also an element of trust: by forwarding the request to
>> +userspace, the kernel asumes that the guest trusts userspace to do 
>> the
> 
> assumes
> 
>> +right thing.
>> +
>> +The KVM MMIO guard offers a way to mitigate this last point: a guest
>> +can request that only certainly regions of the IPA space are valid as
> 
> certain

Thanks, all corrections applied.

> 
>> +MMIO. Only these regions will be handled as an MMIO, and any other
>> +will result in an exception being delivered to the guest.
>> +
>> +This relies on a set of hypercalls defined in the KVM-specific range,
>> +using the HVC64 calling convention.
>> +
>> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
>> +
>> +    ==============    ========    ================================
>> +    Function ID:      (uint32)    0xC6000002
>> +    Arguments:        none
>> +    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
>> +                      (uint64)    Protection Granule (PG) size in
>> +		                  bytes (r0)
>> +    ==============    ========    ================================
>> +
>> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
>> +
>> +    ==============    ========    ==============================
>> +    Function ID:      (uint32)    0xC6000003
>> +    Arguments:        none
>> +    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
>> +                                  RET_SUCCESS(0) (r0)
>> +    ==============    ========    ==============================
>> +
>> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
>> +
>> +    ==============    ========    
>> ======================================
>> +    Function ID:      (uint32)    0xC6000004
>> +    Arguments:        (uint64)    The base of the PG-sized IPA range
>> +                                  that is allowed to be accessed as
>> +				  MMIO. Must aligned to the PG size (r1)
> 
> align

Hmmm. Ugly mix of tab and spaces. I have no idea what the norm
is here, so I'll just put spaces. I'm sure someone will let me
know if I'm wrong! ;-)

> 
>> +                      (uint64)    Index in the MAIR_EL1 register
>> +		                  providing the memory attribute that
>> +				  is used by the guest (r2)
>> +    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
>> +                                  RET_SUCCESS(0) (r0)
>> +    ==============    ========    
>> ======================================
>> +
>> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
>> +
>> +    ==============    ========    
>> ======================================
>> +    Function ID:      (uint32)    0xC6000004
> 
> copy+paste error, should be 0xC6000005

Gah, well cpotted.

> 
>> +    Arguments:        (uint64)    The base of the PG-sized IPA range
>> +                                  that is forbidden to be accessed as
> 
> is now forbidden
> 
> or
> 
> was allowed
> 
> or just drop that part of the sentence because its covered by the "and
> have been previously mapped" part. Something like
> 
> PG-sized IPA range aligned to the PG size which has been previously 
> mapped
> (r1)

Picked the latter.

Thanks again,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature
  2021-07-23 13:30     ` Marc Zyngier
@ 2021-07-23 13:38       ` Andrew Jones
  2021-07-23 13:52         ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Andrew Jones @ 2021-07-23 13:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On Fri, Jul 23, 2021 at 02:30:13PM +0100, Marc Zyngier wrote:
...
> > > +
> > > +    ==============    ========
> > > ======================================
> > > +    Function ID:      (uint32)    0xC6000004
> > > +    Arguments:        (uint64)    The base of the PG-sized IPA range
> > > +                                  that is allowed to be accessed as
> > > +				  MMIO. Must aligned to the PG size (r1)
> > 
> > align
> 
> Hmmm. Ugly mix of tab and spaces. I have no idea what the norm
> is here, so I'll just put spaces. I'm sure someone will let me
> know if I'm wrong! ;-)

Actually, my comment wasn't regarding the alignment of the text. I was
commenting that we should change 'aligned' to 'align' in the text. (Sorry,
that was indeed ambiguous.) Hmm, it might be better to just add 'be', i.e.
'be aligned'.

I'm not sure what to do about the tab/space mixing, but keeping it
consistent is good enough for me.

Thanks,
drew


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature
  2021-07-23 13:38       ` Andrew Jones
@ 2021-07-23 13:52         ` Marc Zyngier
  0 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-23 13:52 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, kernel-team,
	Srivatsa Vaddagiri, Shanker R Donthineni, will

On 2021-07-23 14:38, Andrew Jones wrote:
> On Fri, Jul 23, 2021 at 02:30:13PM +0100, Marc Zyngier wrote:
> ...
>> > > +
>> > > +    ==============    ========
>> > > ======================================
>> > > +    Function ID:      (uint32)    0xC6000004
>> > > +    Arguments:        (uint64)    The base of the PG-sized IPA range
>> > > +                                  that is allowed to be accessed as
>> > > +				  MMIO. Must aligned to the PG size (r1)
>> >
>> > align
>> 
>> Hmmm. Ugly mix of tab and spaces. I have no idea what the norm
>> is here, so I'll just put spaces. I'm sure someone will let me
>> know if I'm wrong! ;-)
> 
> Actually, my comment wasn't regarding the alignment of the text. I was
> commenting that we should change 'aligned' to 'align' in the text. 
> (Sorry,
> that was indeed ambiguous.) Hmm, it might be better to just add 'be', 
> i.e.
> 'be aligned'.

*blink*. duh, of course.

> I'm not sure what to do about the tab/space mixing, but keeping it
> consistent is good enough for me.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags
  2021-07-15 16:31 ` [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags Marc Zyngier
@ 2021-07-27 18:10   ` Will Deacon
  2021-07-28  9:41     ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-27 18:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thu, Jul 15, 2021 at 05:31:44PM +0100, Marc Zyngier wrote:
> We currently deal with a set of booleans for VM features,
> while they could be better represented as set of flags
> contained in an unsigned long, similarily to what we are
> doing on the CPU side.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h | 12 +++++++-----
>  arch/arm64/kvm/arm.c              |  5 +++--
>  arch/arm64/kvm/mmio.c             |  3 ++-
>  3 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 41911585ae0c..4add6c27251f 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -122,7 +122,10 @@ struct kvm_arch {
>  	 * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
>  	 * supported.
>  	 */
> -	bool return_nisv_io_abort_to_user;
> +#define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
> +	/* Memory Tagging Extension enabled for the guest */
> +#define KVM_ARCH_FLAG_MTE_ENABLED			1
> +	unsigned long flags;

One downside of packing all these together is that updating 'flags' now
requires an atomic rmw sequence (i.e. set_bit()). Then again, that's
probably for the best anyway given that kvm_vm_ioctl_enable_cap() looks
like it doesn't hold any locks.

>  	/*
>  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
> @@ -133,9 +136,6 @@ struct kvm_arch {
>  
>  	u8 pfr0_csv2;
>  	u8 pfr0_csv3;
> -
> -	/* Memory Tagging Extension enabled for the guest */
> -	bool mte_enabled;
>  };
>  
>  struct kvm_vcpu_fault_info {
> @@ -777,7 +777,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
>  #define kvm_arm_vcpu_sve_finalized(vcpu) \
>  	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
>  
> -#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
> +#define kvm_has_mte(kvm)					\
> +	(system_supports_mte() &&				\
> +	 test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))

Not an issue with this patch, but I just noticed that the
system_supports_mte() check is redundant here as we only allow the flag to
be set if that's already the case.

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid
  2021-07-15 16:31 ` [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid Marc Zyngier
  2021-07-19 17:18   ` Quentin Perret
@ 2021-07-27 18:10   ` Will Deacon
  2021-07-28  9:45     ` Marc Zyngier
  1 sibling, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-27 18:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thu, Jul 15, 2021 at 05:31:45PM +0100, Marc Zyngier wrote:
> Make sure we don't issue CMOs when mapping something that
> is not a memory address in the S2 page tables.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 05321f4165e3..a5874ebd0354 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -619,12 +619,16 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  	}
>  
>  	/* Perform CMOs before installation of the guest stage-2 PTE */
> -	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> -		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
> -						granule);
> -
> -	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> -		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> +	if (kvm_phys_is_valid(phys)) {
> +		if (mm_ops->dcache_clean_inval_poc &&
> +		    stage2_pte_cacheable(pgt, new))
> +			mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new,
> +								      mm_ops),
> +						       granule);
> +		if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> +			mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops),
> +						 granule);
> +	}

Given that this check corresponds to checking the validity of 'new', I
wonder whether we'd be better off pushing the validity checks down into
stage2_pte_{cacheable,executable}()?

I.e. have stage2_pte_cacheable() return false if !kvm_pte_valid()

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-15 16:31 ` [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure Marc Zyngier
  2021-07-20 11:13   ` Quentin Perret
@ 2021-07-27 18:11   ` Will Deacon
  2021-07-28  9:57     ` Marc Zyngier
  2021-07-30 12:58     ` Quentin Perret
  1 sibling, 2 replies; 62+ messages in thread
From: Will Deacon @ 2021-07-27 18:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thu, Jul 15, 2021 at 05:31:47PM +0100, Marc Zyngier wrote:
> Introduce the infrastructure required to identify an IPA region
> that is expected to be used as an MMIO window.
> 
> This include mapping, unmapping and checking the regions. Nothing
> calls into it yet, so no expected functional change.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |   2 +
>  arch/arm64/include/asm/kvm_mmu.h  |   5 ++
>  arch/arm64/kvm/mmu.c              | 115 ++++++++++++++++++++++++++++++
>  3 files changed, 122 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4add6c27251f..914c1b7bb3ad 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -125,6 +125,8 @@ struct kvm_arch {
>  #define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
>  	/* Memory Tagging Extension enabled for the guest */
>  #define KVM_ARCH_FLAG_MTE_ENABLED			1
> +	/* Gues has bought into the MMIO guard extension */
> +#define KVM_ARCH_FLAG_MMIO_GUARD			2
>  	unsigned long flags;
>  
>  	/*
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index b52c5c4b9a3d..f6b8fc1671b3 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -170,6 +170,11 @@ phys_addr_t kvm_mmu_get_httbr(void);
>  phys_addr_t kvm_get_idmap_vector(void);
>  int kvm_mmu_init(u32 *hyp_va_bits);
>  
> +/* MMIO guard */
> +bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> +bool kvm_remove_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> +bool kvm_check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> +
>  static inline void *__kvm_vector_slot2addr(void *base,
>  					   enum arm64_hyp_spectre_vector slot)
>  {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 3155c9e778f0..638827c8842b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1120,6 +1120,121 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  		kvm_set_pfn_accessed(pte_pfn(pte));
>  }
>  
> +#define MMIO_NOTE	('M' << 24 | 'M' << 16 | 'I' << 8 | '0')

Although this made me smile, maybe we should carve up the bit space a bit
more carefully ;) Also, you know somebody clever will "fix" that typo to
'O'!

Quentin, as the other user of this stuff at the moment, how do you see the
annotation space being allocated? Feels like we should have some 'type'
bits which decide how to parse the rest of the entry.

> +
> +bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> +{
> +	struct kvm_mmu_memory_cache *memcache;
> +	struct kvm_memory_slot *memslot;
> +	int ret, idx;
> +
> +	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> +		return false;
> +
> +	/* Must be page-aligned */
> +	if (ipa & ~PAGE_MASK)
> +		return false;
> +
> +	/*
> +	 * The page cannot be in a memslot. At some point, this will
> +	 * have to deal with device mappings though.
> +	 */
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> +	memslot = gfn_to_memslot(vcpu->kvm, ipa >> PAGE_SHIFT);
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);

What does this memslot check achieve? A new memslot could be added after
you've checked, no?

> +/* Assumes mmu_lock taken */

You can use a lockdep assertion for that!

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling
  2021-07-15 16:31 ` [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling Marc Zyngier
@ 2021-07-27 18:11   ` Will Deacon
  2021-07-28 10:21     ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-27 18:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thu, Jul 15, 2021 at 05:31:48PM +0100, Marc Zyngier wrote:
> Plumb the MMIO checking code into the MMIO fault handling code.
> Nothing allows a region to be registered yet, so there should be
> no funtional change either.

Typo: functional

> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/mmio.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
> index 3dd38a151d2a..fd5747279d27 100644
> --- a/arch/arm64/kvm/mmio.c
> +++ b/arch/arm64/kvm/mmio.c
> @@ -6,6 +6,7 @@
>  
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <trace/events/kvm.h>
>  
>  #include "trace.h"
> @@ -130,6 +131,10 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  	int len;
>  	u8 data_buf[8];
>  
> +	/* Check failed? Return to the guest for debriefing... */
> +	if (!kvm_check_ioguard_page(vcpu, fault_ipa))
> +		return 1;
> +
>  	/*
>  	 * No valid syndrome? Ask userspace for help if it has
>  	 * volunteered to do so, and bail out otherwise.
> @@ -156,6 +161,11 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  	len = kvm_vcpu_dabt_get_as(vcpu);
>  	rt = kvm_vcpu_dabt_get_rd(vcpu);
>  
> +	/* If we cross a page boundary, check that too... */
> +	if (((fault_ipa + len - 1) & PAGE_MASK) != (fault_ipa & PAGE_MASK) &&
> +	    !kvm_check_ioguard_page(vcpu, fault_ipa + len - 1))
> +		return 1;
> +

I find this a little odd as the checks straddle the invalid syndrome check,
meaning that the relative priorities of KVM_ARCH_FLAG_MMIO_GUARD and
KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER are unclear.

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit
  2021-07-15 16:31 ` [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit Marc Zyngier
@ 2021-07-27 18:11   ` Will Deacon
  2021-07-28 10:38     ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-27 18:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thu, Jul 15, 2021 at 05:31:49PM +0100, Marc Zyngier wrote:
> As we now keep information in the S2PT, we must be careful not
> to keep it across a VM reboot, which could otherwise lead to
> interesting problems.
> 
> Make sure that the S2 is completely discarded on reset of
> a vcpu, and remove the flag that enforces the MMIO check.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/arm.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 97ab1512c44f..b0d2225190d2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1096,12 +1096,18 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>  	 * ensuring that the data side is always coherent. We still
>  	 * need to invalidate the I-cache though, as FWB does *not*
>  	 * imply CTR_EL0.DIC.
> +	 *
> +	 * If the MMIO guard was enabled, we pay the price of a full
> +	 * unmap to get back to a sane state (and clear the flag).
>  	 */
>  	if (vcpu->arch.has_run_once) {
> -		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
> +		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) ||
> +		    test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
>  			stage2_unmap_vm(vcpu->kvm);
>  		else
>  			icache_inval_all_pou();
> +
> +		clear_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);

What prevents this racing with another vCPU trying to set the bit?

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls
  2021-07-15 16:31 ` [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls Marc Zyngier
@ 2021-07-27 18:11   ` Will Deacon
  2021-07-28 10:47     ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-27 18:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thu, Jul 15, 2021 at 05:31:50PM +0100, Marc Zyngier wrote:
> Plumb in the hypercall interface to allow a guest to discover,
> enroll, map and unmap MMIO regions.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/hypercalls.c | 20 ++++++++++++++++++++
>  include/linux/arm-smccc.h   | 28 ++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 30da78f72b3b..a3deeb907fdd 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -5,6 +5,7 @@
>  #include <linux/kvm_host.h>
>  
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  
>  #include <kvm/arm_hypercalls.h>
>  #include <kvm/arm_psci.h>
> @@ -129,10 +130,29 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
>  		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
>  		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
> +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
> +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
> +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
> +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
>  		break;
>  	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
>  		kvm_ptp_get_time(vcpu, val);
>  		break;
> +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
> +		val[0] = PAGE_SIZE;
> +		break;

I get the nagging feeling that querying the stage-2 page-size outside of
MMIO guard is going to be useful once we start looking at memory sharing,
so perhaps rename this to something more generic?

> +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
> +		set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> +		val[0] = SMCCC_RET_SUCCESS;
> +		break;
> +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
> +		if (kvm_install_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> +			val[0] = SMCCC_RET_SUCCESS;
> +		break;
> +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
> +		if (kvm_remove_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> +			val[0] = SMCCC_RET_SUCCESS;
> +		break;

I think there's a slight discrepancy between MAP and UNMAP here in that
calling UNMAP on something that hasn't been mapped will fail, whereas
calling MAP on something that's already been mapped will succeed. I think
that might mean you can't reason about the final state of the page if two
vCPUs race to call these functions in some cases (and both succeed).

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls
  2021-07-15 16:31 ` [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls Marc Zyngier
@ 2021-07-27 18:12   ` Will Deacon
  2021-07-28 11:01     ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-27 18:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Thu, Jul 15, 2021 at 05:31:55PM +0100, Marc Zyngier wrote:
> Add a pair of hooks (ioremap_page_range_hook/iounmap_page_range_hook)
> that can be implemented by an architecture.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  include/linux/io.h |  3 +++
>  mm/ioremap.c       | 13 ++++++++++++-
>  mm/vmalloc.c       |  8 ++++++++
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/io.h b/include/linux/io.h
> index 9595151d800d..0ffc265f114c 100644
> --- a/include/linux/io.h
> +++ b/include/linux/io.h
> @@ -21,6 +21,9 @@ void __ioread32_copy(void *to, const void __iomem *from, size_t count);
>  void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
>  
>  #ifdef CONFIG_MMU
> +void ioremap_page_range_hook(unsigned long addr, unsigned long end,
> +			     phys_addr_t phys_addr, pgprot_t prot);
> +void iounmap_page_range_hook(phys_addr_t phys_addr, size_t size);
>  int ioremap_page_range(unsigned long addr, unsigned long end,
>  		       phys_addr_t phys_addr, pgprot_t prot);
>  #else

Can we avoid these hooks by instead not registering the regions proactively
in the guest and moving that logic to a fault handler which runs off the
back of the injected data abort? From there, we could check if the faulting
IPA is a memory address and register it as MMIO if not.

Dunno, you've spent more time than me thinking about this, but just
wondering if you'd had a crack at doing it that way, as it _seems_ simpler
to my naive brain.

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags
  2021-07-27 18:10   ` Will Deacon
@ 2021-07-28  9:41     ` Marc Zyngier
  2021-07-28 14:51       ` Steven Price
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-28  9:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 27 Jul 2021 19:10:27 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:44PM +0100, Marc Zyngier wrote:
> > We currently deal with a set of booleans for VM features,
> > while they could be better represented as set of flags
> > contained in an unsigned long, similarily to what we are
> > doing on the CPU side.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 12 +++++++-----
> >  arch/arm64/kvm/arm.c              |  5 +++--
> >  arch/arm64/kvm/mmio.c             |  3 ++-
> >  3 files changed, 12 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 41911585ae0c..4add6c27251f 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -122,7 +122,10 @@ struct kvm_arch {
> >  	 * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> >  	 * supported.
> >  	 */
> > -	bool return_nisv_io_abort_to_user;
> > +#define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
> > +	/* Memory Tagging Extension enabled for the guest */
> > +#define KVM_ARCH_FLAG_MTE_ENABLED			1
> > +	unsigned long flags;
> 
> One downside of packing all these together is that updating 'flags' now
> requires an atomic rmw sequence (i.e. set_bit()). Then again, that's
> probably for the best anyway given that kvm_vm_ioctl_enable_cap() looks
> like it doesn't hold any locks.

That, and these operations are supposed to be extremely rare anyway.

> 
> >  	/*
> >  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
> > @@ -133,9 +136,6 @@ struct kvm_arch {
> >  
> >  	u8 pfr0_csv2;
> >  	u8 pfr0_csv3;
> > -
> > -	/* Memory Tagging Extension enabled for the guest */
> > -	bool mte_enabled;
> >  };
> >  
> >  struct kvm_vcpu_fault_info {
> > @@ -777,7 +777,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
> >  #define kvm_arm_vcpu_sve_finalized(vcpu) \
> >  	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
> >  
> > -#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
> > +#define kvm_has_mte(kvm)					\
> > +	(system_supports_mte() &&				\
> > +	 test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))
> 
> Not an issue with this patch, but I just noticed that the
> system_supports_mte() check is redundant here as we only allow the flag to
> be set if that's already the case.

It allows us to save a memory access if system_supports_mte() is false
(it is eventually implemented as a static key). On the other hand,
there is so much inlining due to it being a non-final cap that we
probably lose on that too...

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid
  2021-07-27 18:10   ` Will Deacon
@ 2021-07-28  9:45     ` Marc Zyngier
  0 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-28  9:45 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 27 Jul 2021 19:10:45 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:45PM +0100, Marc Zyngier wrote:
> > Make sure we don't issue CMOs when mapping something that
> > is not a memory address in the S2 page tables.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 05321f4165e3..a5874ebd0354 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -619,12 +619,16 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >  	}
> >  
> >  	/* Perform CMOs before installation of the guest stage-2 PTE */
> > -	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> > -		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
> > -						granule);
> > -
> > -	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> > -		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> > +	if (kvm_phys_is_valid(phys)) {
> > +		if (mm_ops->dcache_clean_inval_poc &&
> > +		    stage2_pte_cacheable(pgt, new))
> > +			mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new,
> > +								      mm_ops),
> > +						       granule);
> > +		if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> > +			mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops),
> > +						 granule);
> > +	}
> 
> Given that this check corresponds to checking the validity of 'new', I
> wonder whether we'd be better off pushing the validity checks down into
> stage2_pte_{cacheable,executable}()?
> 
> I.e. have stage2_pte_cacheable() return false if !kvm_pte_valid()

That would work just as well. I'll update the patch.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-27 18:11   ` Will Deacon
@ 2021-07-28  9:57     ` Marc Zyngier
  2021-07-30 12:26       ` Will Deacon
  2021-07-30 12:58     ` Quentin Perret
  1 sibling, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-28  9:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 27 Jul 2021 19:11:08 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:47PM +0100, Marc Zyngier wrote:
> > Introduce the infrastructure required to identify an IPA region
> > that is expected to be used as an MMIO window.
> > 
> > This include mapping, unmapping and checking the regions. Nothing
> > calls into it yet, so no expected functional change.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |   2 +
> >  arch/arm64/include/asm/kvm_mmu.h  |   5 ++
> >  arch/arm64/kvm/mmu.c              | 115 ++++++++++++++++++++++++++++++
> >  3 files changed, 122 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 4add6c27251f..914c1b7bb3ad 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -125,6 +125,8 @@ struct kvm_arch {
> >  #define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
> >  	/* Memory Tagging Extension enabled for the guest */
> >  #define KVM_ARCH_FLAG_MTE_ENABLED			1
> > +	/* Gues has bought into the MMIO guard extension */
> > +#define KVM_ARCH_FLAG_MMIO_GUARD			2
> >  	unsigned long flags;
> >  
> >  	/*
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index b52c5c4b9a3d..f6b8fc1671b3 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -170,6 +170,11 @@ phys_addr_t kvm_mmu_get_httbr(void);
> >  phys_addr_t kvm_get_idmap_vector(void);
> >  int kvm_mmu_init(u32 *hyp_va_bits);
> >  
> > +/* MMIO guard */
> > +bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> > +bool kvm_remove_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> > +bool kvm_check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> > +
> >  static inline void *__kvm_vector_slot2addr(void *base,
> >  					   enum arm64_hyp_spectre_vector slot)
> >  {
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 3155c9e778f0..638827c8842b 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1120,6 +1120,121 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> >  		kvm_set_pfn_accessed(pte_pfn(pte));
> >  }
> >  
> > +#define MMIO_NOTE	('M' << 24 | 'M' << 16 | 'I' << 8 | '0')
> 
> Although this made me smile, maybe we should carve up the bit space a bit
> more carefully ;) Also, you know somebody clever will "fix" that typo to
> 'O'!

They'll get to keep the pieces when the whole thing breaks!

More seriously, happy to have a more elaborate allocation scheme. For
the purpose of this series, it really doesn't matter.

> Quentin, as the other user of this stuff at the moment, how do you see the
> annotation space being allocated? Feels like we should have some 'type'
> bits which decide how to parse the rest of the entry.
> 
> > +
> > +bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > +{
> > +	struct kvm_mmu_memory_cache *memcache;
> > +	struct kvm_memory_slot *memslot;
> > +	int ret, idx;
> > +
> > +	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> > +		return false;
> > +
> > +	/* Must be page-aligned */
> > +	if (ipa & ~PAGE_MASK)
> > +		return false;
> > +
> > +	/*
> > +	 * The page cannot be in a memslot. At some point, this will
> > +	 * have to deal with device mappings though.
> > +	 */
> > +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> > +	memslot = gfn_to_memslot(vcpu->kvm, ipa >> PAGE_SHIFT);
> > +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> 
> What does this memslot check achieve? A new memslot could be added after
> you've checked, no?

If you start allowing S2 annotations to coexist with potential memory
mappings, you're in for trouble. The faulting logic will happily
overwrite the annotation, and that's probably not what you want.

As for new (or moving) memslots, I guess they should be checked
against existing annotations.

> 
> > +/* Assumes mmu_lock taken */
> 
> You can use a lockdep assertion for that!

Sure.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling
  2021-07-27 18:11   ` Will Deacon
@ 2021-07-28 10:21     ` Marc Zyngier
  2021-07-30 12:38       ` Will Deacon
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-28 10:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 27 Jul 2021 19:11:21 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:48PM +0100, Marc Zyngier wrote:
> > Plumb the MMIO checking code into the MMIO fault handling code.
> > Nothing allows a region to be registered yet, so there should be
> > no funtional change either.
> 
> Typo: functional
> 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/mmio.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
> > index 3dd38a151d2a..fd5747279d27 100644
> > --- a/arch/arm64/kvm/mmio.c
> > +++ b/arch/arm64/kvm/mmio.c
> > @@ -6,6 +6,7 @@
> >  
> >  #include <linux/kvm_host.h>
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_mmu.h>
> >  #include <trace/events/kvm.h>
> >  
> >  #include "trace.h"
> > @@ -130,6 +131,10 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> >  	int len;
> >  	u8 data_buf[8];
> >  
> > +	/* Check failed? Return to the guest for debriefing... */
> > +	if (!kvm_check_ioguard_page(vcpu, fault_ipa))
> > +		return 1;
> > +
> >  	/*
> >  	 * No valid syndrome? Ask userspace for help if it has
> >  	 * volunteered to do so, and bail out otherwise.
> > @@ -156,6 +161,11 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> >  	len = kvm_vcpu_dabt_get_as(vcpu);
> >  	rt = kvm_vcpu_dabt_get_rd(vcpu);
> >  
> > +	/* If we cross a page boundary, check that too... */
> > +	if (((fault_ipa + len - 1) & PAGE_MASK) != (fault_ipa & PAGE_MASK) &&
> > +	    !kvm_check_ioguard_page(vcpu, fault_ipa + len - 1))
> > +		return 1;
> > +
> 
> I find this a little odd as the checks straddle the invalid syndrome check,
> meaning that the relative priorities of KVM_ARCH_FLAG_MMIO_GUARD and
> KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER are unclear.

Good point. And the combination of both flags on its own is odd. Maybe
KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER should be ignored or deemed
incompatible with the MMIO guard feature.

The lack of syndrome information means that we cannot really test for
the boundaries of the access (len is invalid), so I'd be tempted to
inject an abort in this case.

Thoughts?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit
  2021-07-27 18:11   ` Will Deacon
@ 2021-07-28 10:38     ` Marc Zyngier
  2021-07-30 12:50       ` Will Deacon
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-28 10:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 27 Jul 2021 19:11:33 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:49PM +0100, Marc Zyngier wrote:
> > As we now keep information in the S2PT, we must be careful not
> > to keep it across a VM reboot, which could otherwise lead to
> > interesting problems.
> > 
> > Make sure that the S2 is completely discarded on reset of
> > a vcpu, and remove the flag that enforces the MMIO check.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/arm.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 97ab1512c44f..b0d2225190d2 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1096,12 +1096,18 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
> >  	 * ensuring that the data side is always coherent. We still
> >  	 * need to invalidate the I-cache though, as FWB does *not*
> >  	 * imply CTR_EL0.DIC.
> > +	 *
> > +	 * If the MMIO guard was enabled, we pay the price of a full
> > +	 * unmap to get back to a sane state (and clear the flag).
> >  	 */
> >  	if (vcpu->arch.has_run_once) {
> > -		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
> > +		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) ||
> > +		    test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> >  			stage2_unmap_vm(vcpu->kvm);
> >  		else
> >  			icache_inval_all_pou();
> > +
> > +		clear_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> 
> What prevents this racing with another vCPU trying to set the bit?

Not much. We could take the kvm lock on both ends to serialize it, but
that's pretty ugly. And should we care? What is the semantic of
resetting a vcpu while another is still running?

If we want to support this sort of behaviour, then our tracking is
totally bogus, because it is VM-wide. And you don't even have to play
with that bit from another vcpu: all the information is lost at the
point where we unmap the S2 PTs.

Maybe an alternative is to move this to the halt/reboot PSCI handlers,
making it clearer what we expect?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls
  2021-07-27 18:11   ` Will Deacon
@ 2021-07-28 10:47     ` Marc Zyngier
  2021-07-30 13:11       ` Will Deacon
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-28 10:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 27 Jul 2021 19:11:46 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:50PM +0100, Marc Zyngier wrote:
> > Plumb in the hypercall interface to allow a guest to discover,
> > enroll, map and unmap MMIO regions.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/hypercalls.c | 20 ++++++++++++++++++++
> >  include/linux/arm-smccc.h   | 28 ++++++++++++++++++++++++++++
> >  2 files changed, 48 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 30da78f72b3b..a3deeb907fdd 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -5,6 +5,7 @@
> >  #include <linux/kvm_host.h>
> >  
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_mmu.h>
> >  
> >  #include <kvm/arm_hypercalls.h>
> >  #include <kvm/arm_psci.h>
> > @@ -129,10 +130,29 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >  	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> >  		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> >  		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
> > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
> > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
> > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
> > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
> >  		break;
> >  	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> >  		kvm_ptp_get_time(vcpu, val);
> >  		break;
> > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
> > +		val[0] = PAGE_SIZE;
> > +		break;
> 
> I get the nagging feeling that querying the stage-2 page-size outside of
> MMIO guard is going to be useful once we start looking at memory sharing,
> so perhaps rename this to something more generic?

At this stage, why not follow the architecture and simply expose it as
ID_AA64MMFR0_EL1.TGran{4,64,16}_2? That's exactly what it is for, and
we already check for this in KVM itself.

> 
> > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
> > +		set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> > +		val[0] = SMCCC_RET_SUCCESS;
> > +		break;
> > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
> > +		if (kvm_install_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> > +			val[0] = SMCCC_RET_SUCCESS;
> > +		break;
> > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
> > +		if (kvm_remove_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> > +			val[0] = SMCCC_RET_SUCCESS;
> > +		break;
> 
> I think there's a slight discrepancy between MAP and UNMAP here in that
> calling UNMAP on something that hasn't been mapped will fail, whereas
> calling MAP on something that's already been mapped will succeed. I think
> that might mean you can't reason about the final state of the page if two
> vCPUs race to call these functions in some cases (and both succeed).

I'm not sure that's the expected behaviour for ioremap(), for example
(you can ioremap two portions of the same page successfully).

I guess MAP could return something indicating that the page is already
mapped, but I wouldn't want to return a hard failure in this case.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls
  2021-07-27 18:12   ` Will Deacon
@ 2021-07-28 11:01     ` Marc Zyngier
  2021-07-30 14:07       ` Will Deacon
  0 siblings, 1 reply; 62+ messages in thread
From: Marc Zyngier @ 2021-07-28 11:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tue, 27 Jul 2021 19:12:04 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:55PM +0100, Marc Zyngier wrote:
> > Add a pair of hooks (ioremap_page_range_hook/iounmap_page_range_hook)
> > that can be implemented by an architecture.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  include/linux/io.h |  3 +++
> >  mm/ioremap.c       | 13 ++++++++++++-
> >  mm/vmalloc.c       |  8 ++++++++
> >  3 files changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/io.h b/include/linux/io.h
> > index 9595151d800d..0ffc265f114c 100644
> > --- a/include/linux/io.h
> > +++ b/include/linux/io.h
> > @@ -21,6 +21,9 @@ void __ioread32_copy(void *to, const void __iomem *from, size_t count);
> >  void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
> >  
> >  #ifdef CONFIG_MMU
> > +void ioremap_page_range_hook(unsigned long addr, unsigned long end,
> > +			     phys_addr_t phys_addr, pgprot_t prot);
> > +void iounmap_page_range_hook(phys_addr_t phys_addr, size_t size);
> >  int ioremap_page_range(unsigned long addr, unsigned long end,
> >  		       phys_addr_t phys_addr, pgprot_t prot);
> >  #else
> 
> Can we avoid these hooks by instead not registering the regions proactively
> in the guest and moving that logic to a fault handler which runs off the
> back of the injected data abort? From there, we could check if the faulting
> IPA is a memory address and register it as MMIO if not.
> 
> Dunno, you've spent more time than me thinking about this, but just
> wondering if you'd had a crack at doing it that way, as it _seems_ simpler
> to my naive brain.

I thought about it, but couldn't work out whether it was always
possible for the guest to handle these faults (first access in an
interrupt context, for example?).

Also, this changes the semantics of the protection this is supposed to
offer: any access out of the RAM space will generate an abort, and the
fault handler will grant MMIO forwarding for this page. Stray accesses
that would normally be properly handled as fatal would now succeed and
be forwarded to userspace, even if there was no emulated devices
there.

For this to work, we'd need to work out whether there is any existing
device mapping that actually points to this page. And whether it
actually is supposed to be forwarded to userspace. Do we have a rmap
for device mappings?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags
  2021-07-28  9:41     ` Marc Zyngier
@ 2021-07-28 14:51       ` Steven Price
  0 siblings, 0 replies; 62+ messages in thread
From: Steven Price @ 2021-07-28 14:51 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On 28/07/2021 10:41, Marc Zyngier wrote:
> On Tue, 27 Jul 2021 19:10:27 +0100,
> Will Deacon <will@kernel.org> wrote:
>>
>> On Thu, Jul 15, 2021 at 05:31:44PM +0100, Marc Zyngier wrote:
>>> We currently deal with a set of booleans for VM features,
>>> while they could be better represented as set of flags
>>> contained in an unsigned long, similarily to what we are
>>> doing on the CPU side.
>>>
>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>> ---
>>>  arch/arm64/include/asm/kvm_host.h | 12 +++++++-----
>>>  arch/arm64/kvm/arm.c              |  5 +++--
>>>  arch/arm64/kvm/mmio.c             |  3 ++-
>>>  3 files changed, 12 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 41911585ae0c..4add6c27251f 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -122,7 +122,10 @@ struct kvm_arch {
>>>  	 * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
>>>  	 * supported.
>>>  	 */
>>> -	bool return_nisv_io_abort_to_user;
>>> +#define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
>>> +	/* Memory Tagging Extension enabled for the guest */
>>> +#define KVM_ARCH_FLAG_MTE_ENABLED			1
>>> +	unsigned long flags;
>>
>> One downside of packing all these together is that updating 'flags' now
>> requires an atomic rmw sequence (i.e. set_bit()). Then again, that's
>> probably for the best anyway given that kvm_vm_ioctl_enable_cap() looks
>> like it doesn't hold any locks.
> 
> That, and these operations are supposed to be extremely rare anyway.
> 
>>
>>>  	/*
>>>  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
>>> @@ -133,9 +136,6 @@ struct kvm_arch {
>>>  
>>>  	u8 pfr0_csv2;
>>>  	u8 pfr0_csv3;
>>> -
>>> -	/* Memory Tagging Extension enabled for the guest */
>>> -	bool mte_enabled;
>>>  };
>>>  
>>>  struct kvm_vcpu_fault_info {
>>> @@ -777,7 +777,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
>>>  #define kvm_arm_vcpu_sve_finalized(vcpu) \
>>>  	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
>>>  
>>> -#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
>>> +#define kvm_has_mte(kvm)					\
>>> +	(system_supports_mte() &&				\
>>> +	 test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))
>>
>> Not an issue with this patch, but I just noticed that the
>> system_supports_mte() check is redundant here as we only allow the flag to
>> be set if that's already the case.
> 
> It allows us to save a memory access if system_supports_mte() is false
> (it is eventually implemented as a static key). On the other hand,
> there is so much inlining due to it being a non-final cap that we
> probably lose on that too...

My original logic was that system_supports_mte() checks
IS_ENABLED(CONFIG_ARM64_MTE) - so this enables the code guarded with
kvm_has_mte() to be compiled out if CONFIG_ARM64_MTE is disabled.

Indeed it turns at we currently rely on this (with CONFIG_ARM64_MTE
disabled):

aarch64-linux-gnu-ld: arch/arm64/kvm/mmu.o: in function `sanitise_mte_tags':
/home/stepri01/work/linux/arch/arm64/kvm/mmu.c:887: undefined reference to `mte_clear_page_tags'
aarch64-linux-gnu-ld: arch/arm64/kvm/guest.o: in function `kvm_vm_ioctl_mte_copy_tags':
/home/stepri01/work/linux/arch/arm64/kvm/guest.c:1066: undefined reference to `mte_copy_tags_to_user'
aarch64-linux-gnu-ld: /home/stepri01/work/linux/arch/arm64/kvm/guest.c:1074: undefined reference to `mte_copy_tags_from_user'

Obviously we could pull just the IS_ENABLED() into kvm_has_mte() instead.

Steve

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-28  9:57     ` Marc Zyngier
@ 2021-07-30 12:26       ` Will Deacon
  2021-07-30 13:04         ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-30 12:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Wed, Jul 28, 2021 at 10:57:30AM +0100, Marc Zyngier wrote:
> On Tue, 27 Jul 2021 19:11:08 +0100,
> Will Deacon <will@kernel.org> wrote:
> > On Thu, Jul 15, 2021 at 05:31:47PM +0100, Marc Zyngier wrote:
> > > +bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > > +{
> > > +	struct kvm_mmu_memory_cache *memcache;
> > > +	struct kvm_memory_slot *memslot;
> > > +	int ret, idx;
> > > +
> > > +	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> > > +		return false;
> > > +
> > > +	/* Must be page-aligned */
> > > +	if (ipa & ~PAGE_MASK)
> > > +		return false;
> > > +
> > > +	/*
> > > +	 * The page cannot be in a memslot. At some point, this will
> > > +	 * have to deal with device mappings though.
> > > +	 */
> > > +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > +	memslot = gfn_to_memslot(vcpu->kvm, ipa >> PAGE_SHIFT);
> > > +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > 
> > What does this memslot check achieve? A new memslot could be added after
> > you've checked, no?
> 
> If you start allowing S2 annotations to coexist with potential memory
> mappings, you're in for trouble. The faulting logic will happily
> overwrite the annotation, and that's probably not what you want.

I don't disagree, but the check above appears to be racy.

> As for new (or moving) memslots, I guess they should be checked
> against existing annotations.

Something like that, but the devil is in the details as it will need to
synchronize with this check somehow.

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling
  2021-07-28 10:21     ` Marc Zyngier
@ 2021-07-30 12:38       ` Will Deacon
  0 siblings, 0 replies; 62+ messages in thread
From: Will Deacon @ 2021-07-30 12:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Wed, Jul 28, 2021 at 11:21:52AM +0100, Marc Zyngier wrote:
> On Tue, 27 Jul 2021 19:11:21 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > On Thu, Jul 15, 2021 at 05:31:48PM +0100, Marc Zyngier wrote:
> > > Plumb the MMIO checking code into the MMIO fault handling code.
> > > Nothing allows a region to be registered yet, so there should be
> > > no funtional change either.
> > 
> > Typo: functional
> > 
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/kvm/mmio.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
> > > index 3dd38a151d2a..fd5747279d27 100644
> > > --- a/arch/arm64/kvm/mmio.c
> > > +++ b/arch/arm64/kvm/mmio.c
> > > @@ -6,6 +6,7 @@
> > >  
> > >  #include <linux/kvm_host.h>
> > >  #include <asm/kvm_emulate.h>
> > > +#include <asm/kvm_mmu.h>
> > >  #include <trace/events/kvm.h>
> > >  
> > >  #include "trace.h"
> > > @@ -130,6 +131,10 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> > >  	int len;
> > >  	u8 data_buf[8];
> > >  
> > > +	/* Check failed? Return to the guest for debriefing... */
> > > +	if (!kvm_check_ioguard_page(vcpu, fault_ipa))
> > > +		return 1;
> > > +
> > >  	/*
> > >  	 * No valid syndrome? Ask userspace for help if it has
> > >  	 * volunteered to do so, and bail out otherwise.
> > > @@ -156,6 +161,11 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> > >  	len = kvm_vcpu_dabt_get_as(vcpu);
> > >  	rt = kvm_vcpu_dabt_get_rd(vcpu);
> > >  
> > > +	/* If we cross a page boundary, check that too... */
> > > +	if (((fault_ipa + len - 1) & PAGE_MASK) != (fault_ipa & PAGE_MASK) &&
> > > +	    !kvm_check_ioguard_page(vcpu, fault_ipa + len - 1))
> > > +		return 1;
> > > +
> > 
> > I find this a little odd as the checks straddle the invalid syndrome check,
> > meaning that the relative priorities of KVM_ARCH_FLAG_MMIO_GUARD and
> > KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER are unclear.
> 
> Good point. And the combination of both flags on its own is odd. Maybe
> KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER should be ignored or deemed
> incompatible with the MMIO guard feature.
> 
> The lack of syndrome information means that we cannot really test for
> the boundaries of the access (len is invalid), so I'd be tempted to
> inject an abort in this case.
> 
> Thoughts?

I agree. Probably worth rejecting both flags anyway so the VMM knows what
it's getting, but injecting an abort into the guest if we don't have
sufficient syndrom information to triage it safely feels like the right
thing to do.

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit
  2021-07-28 10:38     ` Marc Zyngier
@ 2021-07-30 12:50       ` Will Deacon
  0 siblings, 0 replies; 62+ messages in thread
From: Will Deacon @ 2021-07-30 12:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Wed, Jul 28, 2021 at 11:38:35AM +0100, Marc Zyngier wrote:
> On Tue, 27 Jul 2021 19:11:33 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > On Thu, Jul 15, 2021 at 05:31:49PM +0100, Marc Zyngier wrote:
> > > As we now keep information in the S2PT, we must be careful not
> > > to keep it across a VM reboot, which could otherwise lead to
> > > interesting problems.
> > > 
> > > Make sure that the S2 is completely discarded on reset of
> > > a vcpu, and remove the flag that enforces the MMIO check.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/kvm/arm.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 97ab1512c44f..b0d2225190d2 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -1096,12 +1096,18 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
> > >  	 * ensuring that the data side is always coherent. We still
> > >  	 * need to invalidate the I-cache though, as FWB does *not*
> > >  	 * imply CTR_EL0.DIC.
> > > +	 *
> > > +	 * If the MMIO guard was enabled, we pay the price of a full
> > > +	 * unmap to get back to a sane state (and clear the flag).
> > >  	 */
> > >  	if (vcpu->arch.has_run_once) {
> > > -		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
> > > +		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) ||
> > > +		    test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> > >  			stage2_unmap_vm(vcpu->kvm);
> > >  		else
> > >  			icache_inval_all_pou();
> > > +
> > > +		clear_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> > 
> > What prevents this racing with another vCPU trying to set the bit?
> 
> Not much. We could take the kvm lock on both ends to serialize it, but
> that's pretty ugly. And should we care? What is the semantic of
> resetting a vcpu while another is still running?

It's definitely weird, but given that this is an attack vector I don't think
we can rule out attackers trying whacky stuff like this (although maybe
we end up forbidding vcpu reset in pKVM -- dunno).

> If we want to support this sort of behaviour, then our tracking is
> totally bogus, because it is VM-wide. And you don't even have to play
> with that bit from another vcpu: all the information is lost at the
> point where we unmap the S2 PTs.
> 
> Maybe an alternative is to move this to the halt/reboot PSCI handlers,
> making it clearer what we expect?

I think that's probably worth looking at. The race is quite hard to reason
about otherwise, so if clearing the bit can be done on the teardown path
in single-threaded context then I think that's better. It looks like
kvm_prepare_system_event() has all the synchronisation we need there.

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-27 18:11   ` Will Deacon
  2021-07-28  9:57     ` Marc Zyngier
@ 2021-07-30 12:58     ` Quentin Perret
  1 sibling, 0 replies; 62+ messages in thread
From: Quentin Perret @ 2021-07-30 12:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, kvm, linux-kernel,
	dbrazdil, Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Tuesday 27 Jul 2021 at 19:11:08 (+0100), Will Deacon wrote:
> On Thu, Jul 15, 2021 at 05:31:47PM +0100, Marc Zyngier wrote:
> > Introduce the infrastructure required to identify an IPA region
> > that is expected to be used as an MMIO window.
> > 
> > This include mapping, unmapping and checking the regions. Nothing
> > calls into it yet, so no expected functional change.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |   2 +
> >  arch/arm64/include/asm/kvm_mmu.h  |   5 ++
> >  arch/arm64/kvm/mmu.c              | 115 ++++++++++++++++++++++++++++++
> >  3 files changed, 122 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 4add6c27251f..914c1b7bb3ad 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -125,6 +125,8 @@ struct kvm_arch {
> >  #define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
> >  	/* Memory Tagging Extension enabled for the guest */
> >  #define KVM_ARCH_FLAG_MTE_ENABLED			1
> > +	/* Gues has bought into the MMIO guard extension */
> > +#define KVM_ARCH_FLAG_MMIO_GUARD			2
> >  	unsigned long flags;
> >  
> >  	/*
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index b52c5c4b9a3d..f6b8fc1671b3 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -170,6 +170,11 @@ phys_addr_t kvm_mmu_get_httbr(void);
> >  phys_addr_t kvm_get_idmap_vector(void);
> >  int kvm_mmu_init(u32 *hyp_va_bits);
> >  
> > +/* MMIO guard */
> > +bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> > +bool kvm_remove_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> > +bool kvm_check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa);
> > +
> >  static inline void *__kvm_vector_slot2addr(void *base,
> >  					   enum arm64_hyp_spectre_vector slot)
> >  {
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 3155c9e778f0..638827c8842b 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1120,6 +1120,121 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> >  		kvm_set_pfn_accessed(pte_pfn(pte));
> >  }
> >  
> > +#define MMIO_NOTE	('M' << 24 | 'M' << 16 | 'I' << 8 | '0')
> 
> Although this made me smile, maybe we should carve up the bit space a bit
> more carefully ;) Also, you know somebody clever will "fix" that typo to
> 'O'!
> 
> Quentin, as the other user of this stuff at the moment, how do you see the
> annotation space being allocated? Feels like we should have some 'type'
> bits which decide how to parse the rest of the entry.

Yes, that's probably worth doing. I've been thinking about using fancy
data structures with bitfields and such, but none of that really seems
to bit a simpler approach. So how about we make the annotate() function
accept two arguments instead of one: an 'enum kvm_pte_inval_type type'
and 'u64 payload', and then we provide a static inline helper in
pgtable.h to unpack an invalid PTE into type/payload members? I'd guess
7 bits for the type should be more than enough and the payload can use
the rest.

Thoughts?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure
  2021-07-30 12:26       ` Will Deacon
@ 2021-07-30 13:04         ` Marc Zyngier
  0 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-07-30 13:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Fri, 30 Jul 2021 13:26:59 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Wed, Jul 28, 2021 at 10:57:30AM +0100, Marc Zyngier wrote:
> > On Tue, 27 Jul 2021 19:11:08 +0100,
> > Will Deacon <will@kernel.org> wrote:
> > > On Thu, Jul 15, 2021 at 05:31:47PM +0100, Marc Zyngier wrote:
> > > > +bool kvm_install_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > > > +{
> > > > +	struct kvm_mmu_memory_cache *memcache;
> > > > +	struct kvm_memory_slot *memslot;
> > > > +	int ret, idx;
> > > > +
> > > > +	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> > > > +		return false;
> > > > +
> > > > +	/* Must be page-aligned */
> > > > +	if (ipa & ~PAGE_MASK)
> > > > +		return false;
> > > > +
> > > > +	/*
> > > > +	 * The page cannot be in a memslot. At some point, this will
> > > > +	 * have to deal with device mappings though.
> > > > +	 */
> > > > +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > > +	memslot = gfn_to_memslot(vcpu->kvm, ipa >> PAGE_SHIFT);
> > > > +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > > 
> > > What does this memslot check achieve? A new memslot could be added after
> > > you've checked, no?
> > 
> > If you start allowing S2 annotations to coexist with potential memory
> > mappings, you're in for trouble. The faulting logic will happily
> > overwrite the annotation, and that's probably not what you want.
> 
> I don't disagree, but the check above appears to be racy.

Yup, the srcu_read_unlock() should be moved at the end of this
function. It's rather silly as it is currently written...

> 
> > As for new (or moving) memslots, I guess they should be checked
> > against existing annotations.
> 
> Something like that, but the devil is in the details as it will need to
> synchronize with this check somehow.

The SRCU read lock should protect us against memslots being removed
whilst we're accessing it. In a way, this is no different from taking
a page fault.

For new memslots, it is a lot less clear. There are multiple levels of
locking, more or less documented... It feels like slots_arch_lock is
the right tool for this job, but I need to page all that stuff in...

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls
  2021-07-28 10:47     ` Marc Zyngier
@ 2021-07-30 13:11       ` Will Deacon
  2021-08-01 11:20         ` Marc Zyngier
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-07-30 13:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Wed, Jul 28, 2021 at 11:47:20AM +0100, Marc Zyngier wrote:
> On Tue, 27 Jul 2021 19:11:46 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > On Thu, Jul 15, 2021 at 05:31:50PM +0100, Marc Zyngier wrote:
> > > Plumb in the hypercall interface to allow a guest to discover,
> > > enroll, map and unmap MMIO regions.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/kvm/hypercalls.c | 20 ++++++++++++++++++++
> > >  include/linux/arm-smccc.h   | 28 ++++++++++++++++++++++++++++
> > >  2 files changed, 48 insertions(+)
> > > 
> > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > > index 30da78f72b3b..a3deeb907fdd 100644
> > > --- a/arch/arm64/kvm/hypercalls.c
> > > +++ b/arch/arm64/kvm/hypercalls.c
> > > @@ -5,6 +5,7 @@
> > >  #include <linux/kvm_host.h>
> > >  
> > >  #include <asm/kvm_emulate.h>
> > > +#include <asm/kvm_mmu.h>
> > >  
> > >  #include <kvm/arm_hypercalls.h>
> > >  #include <kvm/arm_psci.h>
> > > @@ -129,10 +130,29 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > >  	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > >  		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> > >  		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
> > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
> > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
> > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
> > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
> > >  		break;
> > >  	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > >  		kvm_ptp_get_time(vcpu, val);
> > >  		break;
> > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
> > > +		val[0] = PAGE_SIZE;
> > > +		break;
> > 
> > I get the nagging feeling that querying the stage-2 page-size outside of
> > MMIO guard is going to be useful once we start looking at memory sharing,
> > so perhaps rename this to something more generic?
> 
> At this stage, why not follow the architecture and simply expose it as
> ID_AA64MMFR0_EL1.TGran{4,64,16}_2? That's exactly what it is for, and
> we already check for this in KVM itself.

Nice, I hadn't thought of that. On reflection, though, I don't agree that
it's "exactly what it is for" -- the ID register talks about the supported
stage-2 page-sizes, whereas we want to advertise the one page size that
we're currently using. In other words, it's important that we only ever
populate one of the fields and I wonder if that could bite us in future
somehow?

Up to you, you've definitely got a better feel for this than me.

> > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
> > > +		set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> > > +		val[0] = SMCCC_RET_SUCCESS;
> > > +		break;
> > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
> > > +		if (kvm_install_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> > > +			val[0] = SMCCC_RET_SUCCESS;
> > > +		break;
> > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
> > > +		if (kvm_remove_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> > > +			val[0] = SMCCC_RET_SUCCESS;
> > > +		break;
> > 
> > I think there's a slight discrepancy between MAP and UNMAP here in that
> > calling UNMAP on something that hasn't been mapped will fail, whereas
> > calling MAP on something that's already been mapped will succeed. I think
> > that might mean you can't reason about the final state of the page if two
> > vCPUs race to call these functions in some cases (and both succeed).
> 
> I'm not sure that's the expected behaviour for ioremap(), for example
> (you can ioremap two portions of the same page successfully).

Hmm, good point. Does that mean we should be refcounting the stage-2?
Otherwise if we do something like:

	foo = ioremap(page, 0x100);
	bar = ioremap(page+0x100, 0x100);
	iounmap(foo);

then bar will break. Or did I miss something in the series?

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls
  2021-07-28 11:01     ` Marc Zyngier
@ 2021-07-30 14:07       ` Will Deacon
  0 siblings, 0 replies; 62+ messages in thread
From: Will Deacon @ 2021-07-30 14:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Wed, Jul 28, 2021 at 12:01:53PM +0100, Marc Zyngier wrote:
> On Tue, 27 Jul 2021 19:12:04 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > On Thu, Jul 15, 2021 at 05:31:55PM +0100, Marc Zyngier wrote:
> > > Add a pair of hooks (ioremap_page_range_hook/iounmap_page_range_hook)
> > > that can be implemented by an architecture.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  include/linux/io.h |  3 +++
> > >  mm/ioremap.c       | 13 ++++++++++++-
> > >  mm/vmalloc.c       |  8 ++++++++
> > >  3 files changed, 23 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/io.h b/include/linux/io.h
> > > index 9595151d800d..0ffc265f114c 100644
> > > --- a/include/linux/io.h
> > > +++ b/include/linux/io.h
> > > @@ -21,6 +21,9 @@ void __ioread32_copy(void *to, const void __iomem *from, size_t count);
> > >  void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
> > >  
> > >  #ifdef CONFIG_MMU
> > > +void ioremap_page_range_hook(unsigned long addr, unsigned long end,
> > > +			     phys_addr_t phys_addr, pgprot_t prot);
> > > +void iounmap_page_range_hook(phys_addr_t phys_addr, size_t size);
> > >  int ioremap_page_range(unsigned long addr, unsigned long end,
> > >  		       phys_addr_t phys_addr, pgprot_t prot);
> > >  #else
> > 
> > Can we avoid these hooks by instead not registering the regions proactively
> > in the guest and moving that logic to a fault handler which runs off the
> > back of the injected data abort? From there, we could check if the faulting
> > IPA is a memory address and register it as MMIO if not.
> > 
> > Dunno, you've spent more time than me thinking about this, but just
> > wondering if you'd had a crack at doing it that way, as it _seems_ simpler
> > to my naive brain.
> 
> I thought about it, but couldn't work out whether it was always
> possible for the guest to handle these faults (first access in an
> interrupt context, for example?).

If the check is a simple pfn_valid() I think it should be ok, but yes, we'd
definitely not want to do anything more involved given that this could run
in all sorts of horrible contexts.

> Also, this changes the semantics of the protection this is supposed to
> offer: any access out of the RAM space will generate an abort, and the
> fault handler will grant MMIO forwarding for this page. Stray accesses
> that would normally be properly handled as fatal would now succeed and
> be forwarded to userspace, even if there was no emulated devices
> there.

That's true, it would offer much weaker guarantees to the guest. It's more
like a guarantee that memory never traps to the VMM. It also then wouldn't
help with the write-combine fun. It would be simpler though, but with less
functionality.

> For this to work, we'd need to work out whether there is any existing
> device mapping that actually points to this page. And whether it
> actually is supposed to be forwarded to userspace. Do we have a rmap
> for device mappings?

I don't think this would be possible given your comments above.

So let's stick with the approach you've taken. It just feels like there
should be a way to do this without introducing new hooks into the core
code. If it wasn't for pci_remap_iospace(), we could simply hook our
definition of __ioremap_caller(). Another avenue to explore would be
looking at the IO resource instead; I see x86 already uses
IORES_MAP_ENCRYPTED and IORES_MAP_SYSTEM_RAM to drive pgprot...

Will

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls
  2021-07-30 13:11       ` Will Deacon
@ 2021-08-01 11:20         ` Marc Zyngier
  0 siblings, 0 replies; 62+ messages in thread
From: Marc Zyngier @ 2021-08-01 11:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, qperret, dbrazdil,
	Srivatsa Vaddagiri, Shanker R Donthineni, James Morse,
	Suzuki K Poulose, Alexandru Elisei, kernel-team

On Fri, 30 Jul 2021 14:11:03 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Wed, Jul 28, 2021 at 11:47:20AM +0100, Marc Zyngier wrote:
> > On Tue, 27 Jul 2021 19:11:46 +0100,
> > Will Deacon <will@kernel.org> wrote:
> > > 
> > > On Thu, Jul 15, 2021 at 05:31:50PM +0100, Marc Zyngier wrote:
> > > > Plumb in the hypercall interface to allow a guest to discover,
> > > > enroll, map and unmap MMIO regions.
> > > > 
> > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > ---
> > > >  arch/arm64/kvm/hypercalls.c | 20 ++++++++++++++++++++
> > > >  include/linux/arm-smccc.h   | 28 ++++++++++++++++++++++++++++
> > > >  2 files changed, 48 insertions(+)
> > > > 
> > > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > > > index 30da78f72b3b..a3deeb907fdd 100644
> > > > --- a/arch/arm64/kvm/hypercalls.c
> > > > +++ b/arch/arm64/kvm/hypercalls.c
> > > > @@ -5,6 +5,7 @@
> > > >  #include <linux/kvm_host.h>
> > > >  
> > > >  #include <asm/kvm_emulate.h>
> > > > +#include <asm/kvm_mmu.h>
> > > >  
> > > >  #include <kvm/arm_hypercalls.h>
> > > >  #include <kvm/arm_psci.h>
> > > > @@ -129,10 +130,29 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > >  	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > > >  		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> > > >  		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
> > > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
> > > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
> > > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
> > > > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
> > > >  		break;
> > > >  	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > > >  		kvm_ptp_get_time(vcpu, val);
> > > >  		break;
> > > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
> > > > +		val[0] = PAGE_SIZE;
> > > > +		break;
> > > 
> > > I get the nagging feeling that querying the stage-2 page-size outside of
> > > MMIO guard is going to be useful once we start looking at memory sharing,
> > > so perhaps rename this to something more generic?
> > 
> > At this stage, why not follow the architecture and simply expose it as
> > ID_AA64MMFR0_EL1.TGran{4,64,16}_2? That's exactly what it is for, and
> > we already check for this in KVM itself.
> 
> Nice, I hadn't thought of that. On reflection, though, I don't agree that
> it's "exactly what it is for" -- the ID register talks about the supported
> stage-2 page-sizes, whereas we want to advertise the one page size that
> we're currently using. In other words, it's important that we only ever
> populate one of the fields and I wonder if that could bite us in future
> somehow?

Either that, or we expose all the page sizes >= to that of the host
(using the fact that larger page sizes are multiples of the base one),
and use the guest's page size to work out the granularity. Which is
what NV does already.

> Up to you, you've definitely got a better feel for this than me.

I'll have a look. The "one size" version is dead easy.

> 
> > > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
> > > > +		set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> > > > +		val[0] = SMCCC_RET_SUCCESS;
> > > > +		break;
> > > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
> > > > +		if (kvm_install_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> > > > +			val[0] = SMCCC_RET_SUCCESS;
> > > > +		break;
> > > > +	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
> > > > +		if (kvm_remove_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> > > > +			val[0] = SMCCC_RET_SUCCESS;
> > > > +		break;
> > > 
> > > I think there's a slight discrepancy between MAP and UNMAP here in that
> > > calling UNMAP on something that hasn't been mapped will fail, whereas
> > > calling MAP on something that's already been mapped will succeed. I think
> > > that might mean you can't reason about the final state of the page if two
> > > vCPUs race to call these functions in some cases (and both succeed).
> > 
> > I'm not sure that's the expected behaviour for ioremap(), for example
> > (you can ioremap two portions of the same page successfully).
> 
> Hmm, good point. Does that mean we should be refcounting the stage-2?
> Otherwise if we do something like:
> 
> 	foo = ioremap(page, 0x100);
> 	bar = ioremap(page+0x100, 0x100);
> 	iounmap(foo);
> 
> then bar will break. Or did I miss something in the series?

No, I don't think you have. But I don't think we should implement this
refcounting in the hypervisor side. We really should do it guest side.

I'll have a look.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2021-08-01 11:21 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-15 16:31 [PATCH 00/16] KVM: arm64: MMIO guard PV services Marc Zyngier
2021-07-15 16:31 ` [PATCH 01/16] KVM: arm64: Generalise VM features into a set of flags Marc Zyngier
2021-07-27 18:10   ` Will Deacon
2021-07-28  9:41     ` Marc Zyngier
2021-07-28 14:51       ` Steven Price
2021-07-15 16:31 ` [PATCH 02/16] KVM: arm64: Don't issue CMOs when the physical address is invalid Marc Zyngier
2021-07-19 17:18   ` Quentin Perret
2021-07-20  8:04     ` Marc Zyngier
2021-07-27 18:10   ` Will Deacon
2021-07-28  9:45     ` Marc Zyngier
2021-07-15 16:31 ` [PATCH 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate Marc Zyngier
2021-07-20 10:09   ` Quentin Perret
2021-07-20 10:21     ` Marc Zyngier
2021-07-20 10:38       ` Quentin Perret
2021-07-20 11:20         ` Marc Zyngier
2021-07-20 11:36           ` Quentin Perret
2021-07-20 13:13             ` Marc Zyngier
2021-07-15 16:31 ` [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure Marc Zyngier
2021-07-20 11:13   ` Quentin Perret
2021-07-20 13:15     ` Marc Zyngier
2021-07-20 15:49       ` Quentin Perret
2021-07-22 18:04         ` Marc Zyngier
2021-07-23 10:16           ` Quentin Perret
2021-07-27 18:11   ` Will Deacon
2021-07-28  9:57     ` Marc Zyngier
2021-07-30 12:26       ` Will Deacon
2021-07-30 13:04         ` Marc Zyngier
2021-07-30 12:58     ` Quentin Perret
2021-07-15 16:31 ` [PATCH 05/16] KVM: arm64: Plumb MMIO checking into the fault handling Marc Zyngier
2021-07-27 18:11   ` Will Deacon
2021-07-28 10:21     ` Marc Zyngier
2021-07-30 12:38       ` Will Deacon
2021-07-15 16:31 ` [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit Marc Zyngier
2021-07-27 18:11   ` Will Deacon
2021-07-28 10:38     ` Marc Zyngier
2021-07-30 12:50       ` Will Deacon
2021-07-15 16:31 ` [PATCH 07/16] KVM: arm64: Wire MMIO guard hypercalls Marc Zyngier
2021-07-27 18:11   ` Will Deacon
2021-07-28 10:47     ` Marc Zyngier
2021-07-30 13:11       ` Will Deacon
2021-08-01 11:20         ` Marc Zyngier
2021-07-15 16:31 ` [PATCH 08/16] KVM: arm64: Add tracepoint for failed MMIO guard check Marc Zyngier
2021-07-15 16:31 ` [PATCH 09/16] KVM: arm64: Advertise a capability for MMIO guard Marc Zyngier
2021-07-15 16:31 ` [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature Marc Zyngier
2021-07-21 21:17   ` Andrew Jones
2021-07-23 13:30     ` Marc Zyngier
2021-07-23 13:38       ` Andrew Jones
2021-07-23 13:52         ` Marc Zyngier
2021-07-15 16:31 ` [PATCH 11/16] firmware/smccc: Call arch-specific hook on discovering KVM services Marc Zyngier
2021-07-15 16:31 ` [PATCH 12/16] mm/ioremap: Add arch-specific callbacks on ioremap/iounmap calls Marc Zyngier
2021-07-27 18:12   ` Will Deacon
2021-07-28 11:01     ` Marc Zyngier
2021-07-30 14:07       ` Will Deacon
2021-07-15 16:31 ` [PATCH 13/16] arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard Marc Zyngier
2021-07-15 16:31 ` [PATCH 14/16] arm64: Enroll into KVM's MMIO guard if required Marc Zyngier
2021-07-15 16:31 ` [PATCH 15/16] arm64: Add a helper to retrieve the PTE of a fixmap Marc Zyngier
2021-07-20 11:16   ` Quentin Perret
2021-07-15 16:31 ` [PATCH 16/16] arm64: Register earlycon fixmap with the MMIO guard Marc Zyngier
2021-07-21 21:42 ` [PATCH 00/16] KVM: arm64: MMIO guard PV services Andrew Jones
2021-07-22 10:00   ` Marc Zyngier
2021-07-22 13:25     ` Andrew Jones
2021-07-22 15:30       ` Marc Zyngier

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox