LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all
@ 2021-08-12 18:14 Sean Christopherson
  2021-08-12 18:14 ` [PATCH v2 1/2] KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs Sean Christopherson
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Sean Christopherson @ 2021-08-12 18:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Patch 1 fixes a leak of root-1 shadow pages, patch 2 is a minor
optimization to the zap all flow that avoids re-reading and re-checking
the root-1 SPTEs after they've been zapped by "zap all" flows.

I'm still somewhat on the fence for patch 2, feel free to drop it.

v2:
 - Replaced magic number silliness with Paolo's much more clever suggestion.
 - Elaborated on the benefits of the optimization.
 - Add Ben's somewhat reluctant review for the optimization.

v1: https://lkml.kernel.org/r/20210812050717.3176478-1-seanjc@google.com

Sean Christopherson (2):
  KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
  KVM: x86/mmu: Don't step down in the TDP iterator when zapping all
    SPTEs

 arch/x86/kvm/mmu/tdp_mmu.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/2] KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
  2021-08-12 18:14 [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all Sean Christopherson
@ 2021-08-12 18:14 ` Sean Christopherson
  2021-08-12 18:14 ` [PATCH v2 2/2] KVM: x86/mmu: Don't step down in the TDP iterator " Sean Christopherson
  2021-08-13  7:36 ` [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2021-08-12 18:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
really zap all SPTEs in this case.  As is, zap_gfn_range() skips non-leaf
SPTEs whose range exceeds the range to be zapped.  If shadow_phys_bits is
not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
paging, the "zap all" flows will skip top-level SPTEs whose range extends
beyond shadow_phys_bits and leak their SPs when the VM is destroyed.

Use the current upper bound (based on host.MAXPHYADDR) to detect that the
caller wants to zap all SPTEs, e.g. instead of using the max theoretical
gfn, 1 << (52 - 12).  The more precise upper bound allows the TDP iterator
to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.

Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to
help future debuggers should KVM decide to leak SPTEs again.

The bug is most easily reproduced by running (and unloading!) KVM in a
VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.

  =============================================================================
  BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------
  Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
  CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack_lvl+0x45/0x59
   slab_err+0x95/0xc9
   __kmem_cache_shutdown.cold+0x3c/0x158
   kmem_cache_destroy+0x3d/0xf0
   kvm_mmu_module_exit+0xa/0x30 [kvm]
   kvm_arch_exit+0x5d/0x90 [kvm]
   kvm_exit+0x78/0x90 [kvm]
   vmx_exit+0x1a/0x50 [kvm_intel]
   __x64_sys_delete_module+0x13f/0x220
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 47ec9f968406..4a9d52283ec5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -43,6 +43,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 	if (!kvm->arch.tdp_mmu_enabled)
 		return;
 
+	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages));
 	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
 
 	/*
@@ -81,8 +82,6 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 			  bool shared)
 {
-	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
-
 	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
 
 	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
@@ -94,7 +93,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	list_del_rcu(&root->link);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
-	zap_gfn_range(kvm, root, 0, max_gfn, false, false, shared);
+	zap_gfn_range(kvm, root, 0, -1ull, false, false, shared);
 
 	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
@@ -739,8 +738,17 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 			  gfn_t start, gfn_t end, bool can_yield, bool flush,
 			  bool shared)
 {
+	gfn_t max_gfn_host = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
+	bool zap_all = (start == 0 && end >= max_gfn_host);
 	struct tdp_iter iter;
 
+	/*
+	 * Bound the walk at host.MAXPHYADDR, guest accesses beyond that will
+	 * hit a #PF(RSVD) and never get to an EPT Violation/Misconfig / #NPF,
+	 * and so KVM will never install a SPTE for such addresses.
+	 */
+	end = min(end, max_gfn_host);
+
 	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
 
 	rcu_read_lock();
@@ -759,9 +767,10 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		/*
 		 * If this is a non-last-level SPTE that covers a larger range
 		 * than should be zapped, continue, and zap the mappings at a
-		 * lower level.
+		 * lower level, except when zapping all SPTEs.
 		 */
-		if ((iter.gfn < start ||
+		if (!zap_all &&
+		    (iter.gfn < start ||
 		     iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;
@@ -803,12 +812,11 @@ bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
 
 void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 {
-	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
 	bool flush = false;
 	int i;
 
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
-		flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, max_gfn, flush);
+		flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, -1ull, flush);
 
 	if (flush)
 		kvm_flush_remote_tlbs(kvm);
@@ -846,7 +854,6 @@ static struct kvm_mmu_page *next_invalidated_root(struct kvm *kvm,
  */
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
 {
-	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
 	struct kvm_mmu_page *next_root;
 	struct kvm_mmu_page *root;
 	bool flush = false;
@@ -862,8 +869,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
 
 		rcu_read_unlock();
 
-		flush = zap_gfn_range(kvm, root, 0, max_gfn, true, flush,
-				      true);
+		flush = zap_gfn_range(kvm, root, 0, -1ull, true, flush, true);
 
 		/*
 		 * Put the reference acquired in
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs
  2021-08-12 18:14 [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all Sean Christopherson
  2021-08-12 18:14 ` [PATCH v2 1/2] KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs Sean Christopherson
@ 2021-08-12 18:14 ` Sean Christopherson
  2021-08-13  7:36 ` [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2021-08-12 18:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Set the min_level for the TDP iterator at the root level when zapping all
SPTEs to optimize the iterator's try_step_down().  Zapping a non-leaf
SPTE will recursively zap all its children, thus there is no need for the
iterator to attempt to step down.  This avoids rereading the top-level
SPTEs after they are zapped by causing try_step_down() to short-circuit.

In most cases, optimizing try_step_down() will be in the noise as the cost
of zapping SPTEs completely dominates the overall time.  The optimization
is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
is zapping in response to multiple memslot updates when userspace is adding
and removing read-only memslots for option ROMs.  In that case, the task
doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
for read and thus can be a noisy neighbor of sorts.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4a9d52283ec5..499dadeb45a3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -742,6 +742,12 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 	bool zap_all = (start == 0 && end >= max_gfn_host);
 	struct tdp_iter iter;
 
+	/*
+	 * No need to try to step down in the iterator when zapping all SPTEs,
+	 * zapping the top-level non-leaf SPTEs will recurse on their children.
+	 */
+	int min_level = zap_all ? root->role.level : PG_LEVEL_4K;
+
 	/*
 	 * Bound the walk at host.MAXPHYADDR, guest accesses beyond that will
 	 * hit a #PF(RSVD) and never get to an EPT Violation/Misconfig / #NPF,
@@ -753,7 +759,8 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	rcu_read_lock();
 
-	tdp_root_for_each_pte(iter, root, start, end) {
+	for_each_tdp_pte_min_level(iter, root->spt, root->role.level,
+				   min_level, start, end) {
 retry:
 		if (can_yield &&
 		    tdp_mmu_iter_cond_resched(kvm, &iter, flush, shared)) {
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all
  2021-08-12 18:14 [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all Sean Christopherson
  2021-08-12 18:14 ` [PATCH v2 1/2] KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs Sean Christopherson
  2021-08-12 18:14 ` [PATCH v2 2/2] KVM: x86/mmu: Don't step down in the TDP iterator " Sean Christopherson
@ 2021-08-13  7:36 ` Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2021-08-13  7:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On 12/08/21 20:14, Sean Christopherson wrote:
> Patch 1 fixes a leak of root-1 shadow pages, patch 2 is a minor
> optimization to the zap all flow that avoids re-reading and re-checking
> the root-1 SPTEs after they've been zapped by "zap all" flows.
> 
> I'm still somewhat on the fence for patch 2, feel free to drop it.
> 
> v2:
>   - Replaced magic number silliness with Paolo's much more clever suggestion.
>   - Elaborated on the benefits of the optimization.
>   - Add Ben's somewhat reluctant review for the optimization.
> 
> v1: https://lkml.kernel.org/r/20210812050717.3176478-1-seanjc@google.com
> 
> Sean Christopherson (2):
>    KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
>    KVM: x86/mmu: Don't step down in the TDP iterator when zapping all
>      SPTEs
> 
>   arch/x86/kvm/mmu/tdp_mmu.c | 35 ++++++++++++++++++++++++-----------
>   1 file changed, 24 insertions(+), 11 deletions(-)
> 

Queued, thanks.

Paolo


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-13  7:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-12 18:14 [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all Sean Christopherson
2021-08-12 18:14 ` [PATCH v2 1/2] KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs Sean Christopherson
2021-08-12 18:14 ` [PATCH v2 2/2] KVM: x86/mmu: Don't step down in the TDP iterator " Sean Christopherson
2021-08-13  7:36 ` [PATCH v2 0/2] KVM: x86/mmu: Fix a TDP MMU leak and optimize zap all Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).