LKML Archive on lore.kernel.org
* [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next()
@ 2021-08-12  4:36 Lai Jiangshan
  2021-08-12  4:36 ` [PATCH 2/2] KVM: X86: Remove the present check from for_each_shadow_entry* loop body Lai Jiangshan
  2021-08-12 16:03 ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Sean Christopherson
  0 siblings, 2 replies; 11+ messages in thread
From: Lai Jiangshan @ 2021-08-12  4:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

So far, the loop bodies already ensure the pte is present before calling
__shadow_walk_next().  But checking pte present in __shadow_walk_next()
is a more prudent way of programming, and the loop bodies will then no
longer need to check it.  It allows us to remove the
is_shadow_present_pte() checks from the loop bodies.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..c48ecb25d5f8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2231,7 +2231,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
 			       u64 spte)
 {
-	if (is_last_spte(spte, iterator->level)) {
+	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
 		iterator->level = 0;
 		return;
 	}
-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/2] KVM: X86: Remove the present check from for_each_shadow_entry* loop body
  2021-08-12  4:36 [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Lai Jiangshan
@ 2021-08-12  4:36 ` Lai Jiangshan
  2021-08-12 16:03 ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Sean Christopherson
  1 sibling, 0 replies; 11+ messages in thread
From: Lai Jiangshan @ 2021-08-12  4:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

__shadow_walk_next(), which drives the for_each_shadow_entry* loops, now
performs the present check itself and propagates the result to
shadow_walk_okay(), so the loop bodies no longer need their own check.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c48ecb25d5f8..42eebba6782e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3152,9 +3152,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
 	for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
 		sptep = iterator.sptep;
 		*spte = old_spte;
-
-		if (!is_shadow_present_pte(old_spte))
-			break;
 	}
 
 	return sptep;
@@ -3694,9 +3691,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
 		spte = mmu_spte_get_lockless(iterator.sptep);
 
 		sptes[leaf] = spte;
-
-		if (!is_shadow_present_pte(spte))
-			break;
 	}
 
 	return leaf;
@@ -3811,11 +3805,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
 	u64 spte;
 
 	walk_shadow_page_lockless_begin(vcpu);
-	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
 		clear_sp_write_flooding_count(iterator.sptep);
-		if (!is_shadow_present_pte(spte))
-			break;
-	}
 	walk_shadow_page_lockless_end(vcpu);
 }
 
-- 
2.19.1.6.gb485710b



* Re: [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next()
  2021-08-12  4:36 [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Lai Jiangshan
  2021-08-12  4:36 ` [PATCH 2/2] KVM: X86: Remove the present check from for_each_shadow_entry* loop body Lai Jiangshan
@ 2021-08-12 16:03 ` Sean Christopherson
  2021-08-13  3:16   ` [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
  2021-08-14  9:47   ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Lai Jiangshan
  1 sibling, 2 replies; 11+ messages in thread
From: Sean Christopherson @ 2021-08-12 16:03 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, kvm

On Thu, Aug 12, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> So far, the loop bodies already ensure the pte is present before calling
> __shadow_walk_next().  But checking pte present in __shadow_walk_next()
> is a more prudent way of programming, and the loop bodies will then no
> longer need to check it.  It allows us to remove the
> is_shadow_present_pte() checks from the loop bodies.

There needs to be more analysis in the changelog, as there are many more callers
to __shadow_walk_next() than the three that are modified in the next patch.  It
might even make sense to squash the two patches together, i.e. make it a "move"
instead of an "add + remove", and then explicitly explain why it's ok to add the
check for paths that do _not_ currently have a !is_shadow_present_pte() in the
loop body.

Specifically, FNAME(fetch) via shadow_walk_next() and __direct_map() via
for_each_shadow_entry() do not currently terminate their walks with a !PRESENT,
but they get away with it because they install present non-leaf SPTEs in the loop
itself.

The other argument for the audit (changelog+patch) of all users is that the next
patch misses FNAME(invlpg), e.g. 

@@ -977,7 +980,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
                        FNAME(update_pte)(vcpu, sp, sptep, &gpte);
                }

-               if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+               if (!sp->unsync_children)
                        break;
        }
        write_unlock(&vcpu->kvm->mmu_lock);

It would also be worthwhile to document via the changelog that terminating on
!is_shadow_present_pte() is 100% the correct behavior, as walking past a
!PRESENT SPTE would lead to attempting to read the next level SPTE from a
garbage iter->shadow_addr.

And for clarity and safety, I think it would be worth adding the patch below as
a prep patch to document and enforce that walking the non-leaf SPTEs when faulting
in a page should never terminate early.


From 1c202a7e82b1931e4eb37b23aa9d7108340a6cd2 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 12 Aug 2021 08:38:35 -0700
Subject: [PATCH] KVM: x86/mmu: Verify shadow walk doesn't terminate early in
 page faults

WARN and bail if the shadow walk for faulting in a SPTE terminates early,
i.e. doesn't reach the expected level because the walk encountered a
terminal SPTE.  The shadow walks for page faults are subtle in that they
install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
body, and consume the newly created non-leaf SPTE in the loop control,
e.g. __shadow_walk_next().  In other words, the walks guarantee that the
walk will stop if and only if the target level is reached by installing
non-leaf SPTEs to guarantee the walk remains valid.

Opportunistically use fault->goal_level instead of it.level in
FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
the target level.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c         | 3 +++
 arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..2a243ae1d64c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2992,6 +2992,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			account_huge_nx_page(vcpu->kvm, sp);
 	}

+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
 	ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
 			   fault->write, fault->goal_level, base_gfn, fault->pfn,
 			   fault->prefault, fault->map_writable);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f70afecbf3a2..3a8a7b2f9979 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -749,9 +749,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		}
 	}

+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
 	ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
-			   it.level, base_gfn, fault->pfn, fault->prefault,
-			   fault->map_writable);
+			   fault->goal_level, base_gfn, fault->pfn,
+			   fault->prefault, fault->map_writable);
 	if (ret == RET_PF_SPURIOUS)
 		return ret;

--
2.33.0.rc1.237.g0d66db33f3-goog


* [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()
  2021-08-12 16:03 ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Sean Christopherson
@ 2021-08-13  3:16   ` Lai Jiangshan
  2021-08-24  8:30     ` Lai Jiangshan
  2021-09-02 20:43     ` Sean Christopherson
  2021-08-14  9:47   ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Lai Jiangshan
  1 sibling, 2 replies; 11+ messages in thread
From: Lai Jiangshan @ 2021-08-13  3:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

So far, the loop bodies already ensure the PTE is present before calling
__shadow_walk_next():  Some loop bodies simply exit with a !PRESENT
directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
do not currently terminate their walks with a !PRESENT, but they get away
with it because they install present non-leaf SPTEs in the loop itself.

But checking pte present in __shadow_walk_next() is a more prudent way of
programming, and the loop bodies will no longer need to check it.  It allows
us to remove unneded is_shadow_present_pte() checks in the loop bodies.

Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
walking past a !PRESENT SPTE would lead to attempting to read the next
level SPTE from a garbage iter->shadow_addr.  Even the paths that do _not_
currently have a !is_shadow_present_pte() in the loop body are OK, since
they will install present non-leaf SPTEs and the additinal present check
is just a NOP.

The result of the check in __shadow_walk_next() is propagated to
shadow_walk_okay() for use in any for(;;) loop.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
Changed from V1:
	Merge the two patches
	Update changelog
	Remove !is_shadow_present_pte() in FNAME(invlpg)
 arch/x86/kvm/mmu/mmu.c         | 13 ++-----------
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..42eebba6782e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2231,7 +2231,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
 			       u64 spte)
 {
-	if (is_last_spte(spte, iterator->level)) {
+	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
 		iterator->level = 0;
 		return;
 	}
@@ -3152,9 +3152,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
 	for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
 		sptep = iterator.sptep;
 		*spte = old_spte;
-
-		if (!is_shadow_present_pte(old_spte))
-			break;
 	}
 
 	return sptep;
@@ -3694,9 +3691,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
 		spte = mmu_spte_get_lockless(iterator.sptep);
 
 		sptes[leaf] = spte;
-
-		if (!is_shadow_present_pte(spte))
-			break;
 	}
 
 	return leaf;
@@ -3811,11 +3805,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
 	u64 spte;
 
 	walk_shadow_page_lockless_begin(vcpu);
-	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
 		clear_sp_write_flooding_count(iterator.sptep);
-		if (!is_shadow_present_pte(spte))
-			break;
-	}
 	walk_shadow_page_lockless_end(vcpu);
 }
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f70afecbf3a2..13138b03cc69 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -977,7 +977,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 			FNAME(update_pte)(vcpu, sp, sptep, &gpte);
 		}
 
-		if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+		if (!sp->unsync_children)
 			break;
 	}
 	write_unlock(&vcpu->kvm->mmu_lock);
-- 
2.19.1.6.gb485710b



* Re: [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next()
  2021-08-12 16:03 ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Sean Christopherson
  2021-08-13  3:16   ` [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
@ 2021-08-14  9:47   ` Lai Jiangshan
  1 sibling, 0 replies; 11+ messages in thread
From: Lai Jiangshan @ 2021-08-14  9:47 UTC (permalink / raw)
  To: Sean Christopherson, Lai Jiangshan
  Cc: linux-kernel, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, kvm



On 2021/8/13 00:03, Sean Christopherson wrote:

> 
> And for clarity and safety, I think it would be worth adding the patch below as
> a prep patch to document and enforce that walking the non-leaf SPTEs when faulting
> in a page should never terminate early.

I've sent a V2 patch updated as you suggested.
The patch you provided below doesn't need to be updated.  It and my V2 patch
do not depend on each other, so I didn't resend your patch together with mine.

For your patch

Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>

And it can be queued earlier.

> 
> 
>  From 1c202a7e82b1931e4eb37b23aa9d7108340a6cd2 Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 12 Aug 2021 08:38:35 -0700
> Subject: [PATCH] KVM: x86/mmu: Verify shadow walk doesn't terminate early in
>   page faults
> 
> WARN and bail if the shadow walk for faulting in a SPTE terminates early,
> i.e. doesn't reach the expected level because the walk encountered a
> terminal SPTE.  The shadow walks for page faults are subtle in that they
> install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
> body, and consume the newly created non-leaf SPTE in the loop control,
> e.g. __shadow_walk_next().  In other words, the walks guarantee that the
> walk will stop if and only if the target level is reached by installing
> non-leaf SPTEs to guarantee the walk remains valid.
> 
> Opportunistically use fault->goal_level instead of it.level in
> FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
> the target level.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c         | 3 +++
>   arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
>   2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..2a243ae1d64c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2992,6 +2992,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>   			account_huge_nx_page(vcpu->kvm, sp);
>   	}
> 
> +	if (WARN_ON_ONCE(it.level != fault->goal_level))
> +		return -EFAULT;
> +
>   	ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
>   			   fault->write, fault->goal_level, base_gfn, fault->pfn,
>   			   fault->prefault, fault->map_writable);
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index f70afecbf3a2..3a8a7b2f9979 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -749,9 +749,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>   		}
>   	}
> 
> +	if (WARN_ON_ONCE(it.level != fault->goal_level))
> +		return -EFAULT;
> +
>   	ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
> -			   it.level, base_gfn, fault->pfn, fault->prefault,
> -			   fault->map_writable);
> +			   fault->goal_level, base_gfn, fault->pfn,
> +			   fault->prefault, fault->map_writable);
>   	if (ret == RET_PF_SPURIOUS)
>   		return ret;
> 
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
> 


* Re: [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()
  2021-08-13  3:16   ` [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
@ 2021-08-24  8:30     ` Lai Jiangshan
  2021-09-02 20:43     ` Sean Christopherson
  1 sibling, 0 replies; 11+ messages in thread
From: Lai Jiangshan @ 2021-08-24  8:30 UTC (permalink / raw)
  To: LKML
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	H. Peter Anvin, kvm

Hello, Paolo

Could you review it, please?

Thanks
Lai

On Fri, Aug 13, 2021 at 9:01 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> So far, the loop bodies already ensure the PTE is present before calling
> __shadow_walk_next():  Some loop bodies simply exit with a !PRESENT
> directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
> do not currently terminate their walks with a !PRESENT, but they get away
> with it because they install present non-leaf SPTEs in the loop itself.
>
> But checking pte present in __shadow_walk_next() is a more prudent way of
> programming, and the loop bodies will no longer need to check it.  It allows
> us to remove unneded is_shadow_present_pte() checks in the loop bodies.
>
> Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
> walking past a !PRESENT SPTE would lead to attempting to read the next
> level SPTE from a garbage iter->shadow_addr.  Even the paths that do _not_
> currently have a !is_shadow_present_pte() in the loop body are OK, since
> they will install present non-leaf SPTEs and the additinal present check
> is just a NOP.
>
> The result of the check in __shadow_walk_next() is propagated to
> shadow_walk_okay() for use in any for(;;) loop.
>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
> Changed from V1:
>         Merge the two patches
>         Update changelog
>         Remove !is_shadow_present_pte() in FNAME(invlpg)
>  arch/x86/kvm/mmu/mmu.c         | 13 ++-----------
>  arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
>  2 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..42eebba6782e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2231,7 +2231,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
>  static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
>                                u64 spte)
>  {
> -       if (is_last_spte(spte, iterator->level)) {
> +       if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
>                 iterator->level = 0;
>                 return;
>         }
> @@ -3152,9 +3152,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
>         for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
>                 sptep = iterator.sptep;
>                 *spte = old_spte;
> -
> -               if (!is_shadow_present_pte(old_spte))
> -                       break;
>         }
>
>         return sptep;
> @@ -3694,9 +3691,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
>                 spte = mmu_spte_get_lockless(iterator.sptep);
>
>                 sptes[leaf] = spte;
> -
> -               if (!is_shadow_present_pte(spte))
> -                       break;
>         }
>
>         return leaf;
> @@ -3811,11 +3805,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
>         u64 spte;
>
>         walk_shadow_page_lockless_begin(vcpu);
> -       for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
> +       for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
>                 clear_sp_write_flooding_count(iterator.sptep);
> -               if (!is_shadow_present_pte(spte))
> -                       break;
> -       }
>         walk_shadow_page_lockless_end(vcpu);
>  }
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index f70afecbf3a2..13138b03cc69 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -977,7 +977,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>                         FNAME(update_pte)(vcpu, sp, sptep, &gpte);
>                 }
>
> -               if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
> +               if (!sp->unsync_children)
>                         break;
>         }
>         write_unlock(&vcpu->kvm->mmu_lock);
> --
> 2.19.1.6.gb485710b
>


* Re: [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()
  2021-08-13  3:16   ` [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
  2021-08-24  8:30     ` Lai Jiangshan
@ 2021-09-02 20:43     ` Sean Christopherson
  2021-09-06 12:25       ` [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults Lai Jiangshan
  1 sibling, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2021-09-02 20:43 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Lai Jiangshan, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, kvm

On Fri, Aug 13, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> So far, the loop bodies already ensure the PTE is present before calling
> __shadow_walk_next():  Some loop bodies simply exit with a !PRESENT
> directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
> do not currently terminate their walks with a !PRESENT, but they get away
> with it because they install present non-leaf SPTEs in the loop itself.
> 
> But checking pte present in __shadow_walk_next() is a more prudent way of
> programing and loop bodies will not need to always check it. It allows us
> removing unneded is_shadow_present_pte() in the loop bodies.
           ^^^^^^^
	   unneeded

> 
> Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
> walking past a !PRESENT SPTE would lead to attempting to read a the next
> level SPTE from a garbage iter->shadow_addr.  Even some paths that do _not_
> currently have a !is_shadow_present_pte() in the loop body is Ok since
> they will install present non-leaf SPTEs and the additinal present check
                                                   ^^^^^^^^^
						   additional
> is just an NOP.
> 
> The checking result in __shadow_walk_next() will be propagated to
> shadow_walk_okay() for being used in any for(;;) loop.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---

Nits aside,

Reviewed-by: Sean Christopherson <seanjc@google.com>


* [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults
  2021-09-02 20:43     ` Sean Christopherson
@ 2021-09-06 12:25       ` Lai Jiangshan
  2021-09-06 12:25         ` [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
  2021-09-24  9:33         ` [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults Paolo Bonzini
  0 siblings, 2 replies; 11+ messages in thread
From: Lai Jiangshan @ 2021-09-06 12:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Sean Christopherson, Lai Jiangshan, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Sean Christopherson <seanjc@google.com>

WARN and bail if the shadow walk for faulting in a SPTE terminates early,
i.e. doesn't reach the expected level because the walk encountered a
terminal SPTE.  The shadow walks for page faults are subtle in that they
install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
body, and consume the newly created non-leaf SPTE in the loop control,
e.g. __shadow_walk_next().  In other words, the walks guarantee that the
walk will stop if and only if the target level is reached by installing
non-leaf SPTEs to guarantee the walk remains valid.

Opportunistically use fault->goal_level instead of it.level in
FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
the target level.

Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c         | 3 +++
 arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4853c033e6ce..538be037549d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3006,6 +3006,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			account_huge_nx_page(vcpu->kvm, sp);
 	}
 
+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
 	ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
 			   fault->write, fault->goal_level, base_gfn, fault->pfn,
 			   fault->prefault, fault->map_writable);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 50ade6450ace..4d559d2d4d66 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -749,9 +749,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		}
 	}
 
+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
 	ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
-			   it.level, base_gfn, fault->pfn, fault->prefault,
-			   fault->map_writable);
+			   fault->goal_level, base_gfn, fault->pfn,
+			   fault->prefault, fault->map_writable);
 	if (ret == RET_PF_SPURIOUS)
 		return ret;
 
-- 
2.19.1.6.gb485710b



* [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()
  2021-09-06 12:25       ` [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults Lai Jiangshan
@ 2021-09-06 12:25         ` Lai Jiangshan
  2021-09-24  9:56           ` Paolo Bonzini
  2021-09-24  9:33         ` [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults Paolo Bonzini
  1 sibling, 1 reply; 11+ messages in thread
From: Lai Jiangshan @ 2021-09-06 12:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

So far, the loop bodies already ensure the PTE is present before calling
__shadow_walk_next():  Some loop bodies simply exit with a !PRESENT
directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
do not currently terminate their walks with a !PRESENT, but they get away
with it because they install present non-leaf SPTEs in the loop itself.

But checking pte present in __shadow_walk_next() is a more prudent way of
programming, and the loop bodies will no longer need to check it.  It allows
us to remove unneeded is_shadow_present_pte() checks in the loop bodies.

Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
walking past a !PRESENT SPTE would lead to attempting to read the next
level SPTE from a garbage iter->shadow_addr.  Even the paths that do _not_
currently have a !is_shadow_present_pte() in the loop body are OK, since
they will install present non-leaf SPTEs and the additional present check
is just a NOP.

The result of the check in __shadow_walk_next() is propagated to
shadow_walk_okay() for use in any for(;;) loop.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
Changed from V2:
	Fix typo in the changelog reported by Sean
	Add Reviewed-by from Sean
Changed from V1:
	Merge the two patches
	Update changelog
	Remove !is_shadow_present_pte() in FNAME(invlpg)
 arch/x86/kvm/mmu/mmu.c         | 13 ++-----------
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 538be037549d..26f6bd238a77 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2223,7 +2223,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
 			       u64 spte)
 {
-	if (is_last_spte(spte, iterator->level)) {
+	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
 		iterator->level = 0;
 		return;
 	}
@@ -3159,9 +3159,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
 	for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
 		sptep = iterator.sptep;
 		*spte = old_spte;
-
-		if (!is_shadow_present_pte(old_spte))
-			break;
 	}
 
 	return sptep;
@@ -3721,9 +3718,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
 		spte = mmu_spte_get_lockless(iterator.sptep);
 
 		sptes[leaf] = spte;
-
-		if (!is_shadow_present_pte(spte))
-			break;
 	}
 
 	return leaf;
@@ -3838,11 +3832,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
 	u64 spte;
 
 	walk_shadow_page_lockless_begin(vcpu);
-	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
 		clear_sp_write_flooding_count(iterator.sptep);
-		if (!is_shadow_present_pte(spte))
-			break;
-	}
 	walk_shadow_page_lockless_end(vcpu);
 }
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4d559d2d4d66..72f358613786 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -982,7 +982,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 			FNAME(update_pte)(vcpu, sp, sptep, &gpte);
 		}
 
-		if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+		if (!sp->unsync_children)
 			break;
 	}
 	write_unlock(&vcpu->kvm->mmu_lock);
-- 
2.19.1.6.gb485710b



* Re: [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults
  2021-09-06 12:25       ` [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults Lai Jiangshan
  2021-09-06 12:25         ` [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
@ 2021-09-24  9:33         ` Paolo Bonzini
  1 sibling, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-09-24  9:33 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Sean Christopherson, Lai Jiangshan, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, kvm

On 06/09/21 14:25, Lai Jiangshan wrote:
> Opportunistically use fault->goal_level instead of it.level in
> FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
> the target level.

This argument will go away when mmu_set_spte starts fishing out the 
level from the kvm_mmu_page role, but it makes a good point for now.

Paolo



* Re: [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()
  2021-09-06 12:25         ` [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
@ 2021-09-24  9:56           ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-09-24  9:56 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, kvm

On 06/09/21 14:25, Lai Jiangshan wrote:
> But checking pte present in __shadow_walk_next() is a more prudent way of
> programming, and loop bodies will then not need to check it every time. It
> allows us to remove the unneeded is_shadow_present_pte() checks from the
> loop bodies.
> 
> Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
> walking past a !PRESENT SPTE would lead to attempting to read the next
> level SPTE from a garbage iter->shadow_addr.  Even the paths that do _not_
> currently have a !is_shadow_present_pte() check in the loop body are OK,
> since they install present non-leaf SPTEs and the additional present check
> is just a NOP.
> 
> The result of the check in __shadow_walk_next() is propagated to
> shadow_walk_okay() and therefore takes effect in every for(;;) loop.
> 
> Reviewed-by: Sean Christopherson<seanjc@google.com>
> Signed-off-by: Lai Jiangshan<laijs@linux.alibaba.com>
> ---
> Changed from V2:
> 	Fix typo in the changelog reported by Sean
> 	Add Reviewed-by from Sean
> Changed from V1:
> 	Merge the two patches
> 	Update changelog
> 	Remove !is_shadow_present_pte() in FNAME(invlpg)
>   arch/x86/kvm/mmu/mmu.c         | 13 ++-----------
>   arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
>   2 files changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 538be037549d..26f6bd238a77 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2223,7 +2223,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
>   static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
>   			       u64 spte)
>   {
> -	if (is_last_spte(spte, iterator->level)) {
> +	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
>   		iterator->level = 0;
>   		return;
>   	}
> @@ -3159,9 +3159,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
>   	for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
>   		sptep = iterator.sptep;
>   		*spte = old_spte;
> -
> -		if (!is_shadow_present_pte(old_spte))
> -			break;
>   	}
>   
>   	return sptep;
> @@ -3721,9 +3718,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
>   		spte = mmu_spte_get_lockless(iterator.sptep);
>   
>   		sptes[leaf] = spte;
> -
> -		if (!is_shadow_present_pte(spte))
> -			break;
>   	}
>   
>   	return leaf;
> @@ -3838,11 +3832,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
>   	u64 spte;
>   
>   	walk_shadow_page_lockless_begin(vcpu);
> -	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
> +	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
>   		clear_sp_write_flooding_count(iterator.sptep);
> -		if (!is_shadow_present_pte(spte))
> -			break;
> -	}
>   	walk_shadow_page_lockless_end(vcpu);
>   }
>   
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 4d559d2d4d66..72f358613786 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -982,7 +982,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>   			FNAME(update_pte)(vcpu, sp, sptep, &gpte);
>   		}
>   
> -		if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
> +		if (!sp->unsync_children)

Queued both, thanks.

Paolo



end of thread, other threads:[~2021-09-24  9:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-12  4:36 [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Lai Jiangshan
2021-08-12  4:36 ` [PATCH 2/2] KVM: X86: Remove the present check from for_each_shadow_entry* loop body Lai Jiangshan
2021-08-12 16:03 ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Sean Christopherson
2021-08-13  3:16   ` [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
2021-08-24  8:30     ` Lai Jiangshan
2021-09-02 20:43     ` Sean Christopherson
2021-09-06 12:25       ` [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults Lai Jiangshan
2021-09-06 12:25         ` [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next() Lai Jiangshan
2021-09-24  9:56           ` Paolo Bonzini
2021-09-24  9:33         ` [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults Paolo Bonzini
2021-08-14  9:47   ` [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next() Lai Jiangshan
