From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Lai Jiangshan <laijs@linux.alibaba.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Avi Kivity <avi@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH 2/7] KVM: X86: Synchronize the shadow pagetable before link it
Date: Thu, 2 Sep 2021 23:40:58 +0000
Message-ID: <YTFhCt87vzo4xDrc@google.com>
In-Reply-To: <20210824075524.3354-3-jiangshanlai@gmail.com>

On Tue, Aug 24, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> If gpte is changed from non-present to present, the guest doesn't need
> to flush tlb per SDM.  So the host must synchronize sp before
> linking it.  Otherwise the guest might use a wrong mapping.
> 
> For example: the guest first changes a level-1 pagetable, and then
> links its parent to a new place where the original gpte is non-present.
> Finally the guest can access the remapped area without flushing
> the tlb.  The guest's behavior should be allowed per SDM, but the host
> kvm mmu makes it wrong.

Ah, are you saying, given:

VA_x = PML4_A -> PDP_B -> PD_C -> PT_D

the guest can modify PT_D, then link it with

VA_y = PML4_A -> PDP_B -> PD_E -> PT_D

and access it via VA_y without flushing, and so KVM must sync PT_D.  Is that
correct?

> Fixes: 4731d4c7a077 ("KVM: MMU: out of sync shadow core")
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---

...

> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 50ade6450ace..48c7fe1b2d50 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -664,7 +664,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
>   * emulate this operation, return 1 to indicate this case.
>   */
>  static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> -			 struct guest_walker *gw)
> +			 struct guest_walker *gw, unsigned long mmu_seq)
>  {
>  	struct kvm_mmu_page *sp = NULL;
>  	struct kvm_shadow_walk_iterator it;
> @@ -678,6 +678,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  	top_level = vcpu->arch.mmu->root_level;
>  	if (top_level == PT32E_ROOT_LEVEL)
>  		top_level = PT32_ROOT_LEVEL;
> +
> +again:
>  	/*
>  	 * Verify that the top-level gpte is still there.  Since the page
>  	 * is a root page, it is either write protected (and cannot be
> @@ -713,8 +715,28 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  		if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
>  			goto out_gpte_changed;
>  
> -		if (sp)
> +		if (sp) {
> +			/*
> +			 * We must synchronize the pagetable before linking it
> +			 * because the guest doesn't need to flush tlb when
> +			 * gpte is changed from non-present to present.
> +			 * Otherwise, the guest may use the wrong mapping.
> +			 *
> +			 * For PG_LEVEL_4K, kvm_mmu_get_page() has already
> +			 * synchronized it transiently via kvm_sync_page().
> +			 *
> +			 * For higher level pagetable, we synchronize it
> +			 * via slower mmu_sync_children().  If it once
> +			 * released the mmu_lock, we need to restart from
> +			 * the root since we don't have reference to @sp.
> +			 */
> +			if (sp->unsync_children && !mmu_sync_children(vcpu, sp, false)) {

I don't like dropping mmu_lock in the page fault path.  I agree that it's not
all that different than grabbing various things in kvm_mmu_do_page_fault() long
before acquiring mmu_lock, but (a) I'm not 100% convinced we don't have a latent
bug hiding somewhere in there :-), and (b) there's a possibility, however small,
that there's something in FNAME(fetch) that we're missing.  Case in point, this
technically needs to do make_mmu_pages_available().

And I believe kvm_mmu_get_page() already tries to handle this case by requesting
KVM_REQ_MMU_SYNC if it uses a sp with unsync_children; it just doesn't handle SMP
interaction, e.g. it can link a sp that's immediately available to other vCPUs
before the sync.

Rather than force the sync here, what about kicking all vCPUs and retrying the
page fault?  The only gross part is that kvm_mmu_get_page() can now fail :-(

---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/mmu/mmu.c          | 9 +++++++--
 arch/x86/kvm/mmu/paging_tmpl.h  | 4 ++++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09b256db394a..332b9fb3454c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -57,7 +57,8 @@
 #define KVM_REQ_MIGRATE_TIMER		KVM_ARCH_REQ(0)
 #define KVM_REQ_REPORT_TPR_ACCESS	KVM_ARCH_REQ(1)
 #define KVM_REQ_TRIPLE_FAULT		KVM_ARCH_REQ(2)
-#define KVM_REQ_MMU_SYNC		KVM_ARCH_REQ(3)
+#define KVM_REQ_MMU_SYNC \
+	KVM_ARCH_REQ_FLAGS(3, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_CLOCK_UPDATE		KVM_ARCH_REQ(4)
 #define KVM_REQ_LOAD_MMU_PGD		KVM_ARCH_REQ(5)
 #define KVM_REQ_EVENT			KVM_ARCH_REQ(6)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4853c033e6ce..03293cd3c7ae 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2143,8 +2143,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}

-		if (sp->unsync_children)
-			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+		if (sp->unsync_children) {
+			kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_MMU_SYNC);
+			return NULL;
+		}

 		__clear_sp_write_flooding_count(sp);

@@ -2999,6 +3001,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

 		sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr,
 				      it.level - 1, true, ACC_ALL);
+		BUG_ON(!sp);

 		link_shadow_page(vcpu, it.sptep, sp);
 		if (fault->is_tdp && fault->huge_page_disallowed &&
@@ -3383,6 +3386,8 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 	struct kvm_mmu_page *sp;

 	sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
+	BUG_ON(!sp);
+
 	++sp->root_count;

 	return __pa(sp->spt);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 50ade6450ace..f573d45e2c6f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -704,6 +704,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			access = gw->pt_access[it.level - 2];
 			sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
 					      it.level-1, false, access);
+			if (!sp)
+				return RET_PF_RETRY;
 		}

 		/*
@@ -742,6 +744,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		if (!is_shadow_present_pte(*it.sptep)) {
 			sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
 					      it.level - 1, true, direct_access);
+			BUG_ON(!sp);
+
 			link_shadow_page(vcpu, it.sptep, sp);
 			if (fault->huge_page_disallowed &&
 			    fault->req_level >= it.level)
--
