From: <gregkh@suse.de>
To: jeremy@goop.org, ak@muc.de, akpm@linux-foundation.org,
	chrisw@sous-sol.org, gregkh@suse.de, jeremy@xensource.com,
	keir@xensource.com, linux-kernel@vger.kernel.org,
	stable@kernel.org
Cc: <stable@kernel.org>, <stable-commits@vger.kernel.org>
Subject: patch xen-handle-lazy-cr3-on-unpin.patch queued to -stable tree
Date: Tue, 13 Nov 2007 14:50:37 -0800
Message-ID: <20071113225049.2776414543E5@imap.suse.de>
In-Reply-To: <20071012211148.208637000@goop.org>


This is a note to let you know that we have just queued up the patch titled

     Subject: xen: deal with stale cr3 values when unpinning pagetables

to the 2.6.23-stable tree.  Its filename is

     xen-handle-lazy-cr3-on-unpin.patch

A git repo of this tree can be found at 
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From stable-bounces@linux.kernel.org Fri Oct 12 14:33:57 2007
From: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Fri, 12 Oct 2007 14:11:37 -0700
Subject: xen: deal with stale cr3 values when unpinning pagetables
To: LKML <linux-kernel@vger.kernel.org>
Cc: xen-devel@lists.xensource.com, virtualization@lists.osdl.org, Chris Wright <chrisw@sous-sol.org>, Andi Kleen <ak@muc.de>, Andrew Morton <akpm@linux-foundation.org>, Keir Fraser <keir@xensource.com>, Stable Kernel <stable@kernel.org>
Message-ID: <20071012211148.208637000@goop.org>
Content-Disposition: inline; filename=xen-handle-lazy-cr3-on-unpin.patch

From: Jeremy Fitzhardinge <jeremy@goop.org>

patch 9f79991d4186089e228274196413572cc000143b in mainline.

When a pagetable is no longer in use, it must be unpinned so that its
pages can be freed.  However, this is only possible if there are no
stray uses of the pagetable.  The code currently deals with all the
usual cases, but there's a rare case where a vcpu is changing cr3, but
is doing so lazily, and the change hasn't actually happened by the time
the pagetable is unpinned, even though it appears to have been completed.
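
The window described above can be modelled in user space. This is only a
minimal sketch, not kernel code: struct lazy_vcpu and the helpers
lazy_write_cr3(), lazy_flush() and effective_uses() are hypothetical names
invented for illustration. It assumes a lazy cr3 write records the new value
at once and the hypervisor-side value only catches up when the queued batch
is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one vcpu; not kernel code. */
struct lazy_vcpu {
	unsigned long logical_cr3;   /* what the guest last asked for */
	unsigned long effective_cr3; /* what the hypervisor has applied */
	bool switch_pending;         /* a cr3 switch is queued, not issued */
};

/* Record a cr3 switch in lazy mode: only the logical value changes. */
static void lazy_write_cr3(struct lazy_vcpu *v, unsigned long cr3)
{
	assert(v != NULL);
	v->logical_cr3 = cr3;
	v->switch_pending = true;
}

/* Issue the queued batch: the effective value catches up. */
static void lazy_flush(struct lazy_vcpu *v)
{
	assert(v != NULL);
	if (v->switch_pending) {
		v->effective_cr3 = v->logical_cr3;
		v->switch_pending = false;
	}
}

/* True while the hypervisor still has this pagetable loaded on the vcpu. */
static bool effective_uses(const struct lazy_vcpu *v, unsigned long pgd_phys)
{
	assert(v != NULL);
	return v->effective_cr3 == pgd_phys;
}
```

In this model, between lazy_write_cr3() and lazy_flush() the logical value
already names the new pagetable while effective_uses() still reports the old
one, which is the state in which an unpin of the old pagetable would go wrong.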

This change adds a second per-cpu cr3 variable - xen_current_cr3 -
which tracks the actual state of the vcpu cr3.  It is only updated once
the actual hypercall to set cr3 has been completed.  Other processors
wishing to unpin a pagetable can check other vcpus' xen_current_cr3
values to see if any cross-cpu IPIs are needed to clean things up.
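
The cross-cpu check can be sketched in plain C as follows. This is an
illustration under stated assumptions, not the patch itself: the cpu mask is
modelled as a single unsigned long bitmask, the per-cpu xen_current_cr3 values
as an array, and build_unpin_mask() and MODEL_NR_CPUS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8  /* hypothetical; must not exceed bits in unsigned long */

/*
 * Start from the "official" set of cpus using the pagetable and add any
 * cpu whose actual (hypervisor-side) cr3 still points at it.
 */
static unsigned long build_unpin_mask(unsigned long official_mask,
				      const unsigned long current_cr3[],
				      size_t ncpus,
				      unsigned long pgd_phys)
{
	unsigned long mask = official_mask;
	size_t cpu;

	assert(current_cr3 != NULL);
	assert(ncpus <= MODEL_NR_CPUS);

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (current_cr3[cpu] == pgd_phys)
			mask |= 1UL << cpu;
	}
	return mask;
}
```

A non-zero result corresponds to the case where IPIs are needed; every cpu in
the mask would be asked to drop its reference before the pagetable is unpinned.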

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/i386/xen/enlighten.c |   55 +++++++++++++++++++++++++++++++---------------
 arch/i386/xen/mmu.c       |   29 +++++++++++++++++++++---
 arch/i386/xen/xen-ops.h   |    1 
 3 files changed, 65 insertions(+), 20 deletions(-)

--- a/arch/i386/xen/enlighten.c
+++ b/arch/i386/xen/enlighten.c
@@ -56,7 +56,23 @@ DEFINE_PER_CPU(enum paravirt_lazy_mode, 
 
 DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
 DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
-DEFINE_PER_CPU(unsigned long, xen_cr3);
+
+/*
+ * Note about cr3 (pagetable base) values:
+ *
+ * xen_cr3 contains the current logical cr3 value; it contains the
+ * last set cr3.  This may not be the current effective cr3, because
+ * its update may be being lazily deferred.  However, a vcpu looking
+ * at its own cr3 can use this value knowing that everything will
+ * be self-consistent.
+ *
+ * xen_current_cr3 contains the actual vcpu cr3; it is set once the
+ * hypercall to set the vcpu cr3 is complete (so it may be a little
+ * out of date, but it will never be set early).  If one vcpu is
+ * looking at another vcpu's cr3 value, it should use this variable.
+ */
+DEFINE_PER_CPU(unsigned long, xen_cr3);	 /* cr3 stored as physaddr */
+DEFINE_PER_CPU(unsigned long, xen_current_cr3);	 /* actual vcpu cr3 */
 
 struct start_info *xen_start_info;
 EXPORT_SYMBOL_GPL(xen_start_info);
@@ -632,32 +648,36 @@ static unsigned long xen_read_cr3(void)
 	return x86_read_percpu(xen_cr3);
 }
 
+static void set_current_cr3(void *v)
+{
+	x86_write_percpu(xen_current_cr3, (unsigned long)v);
+}
+
 static void xen_write_cr3(unsigned long cr3)
 {
+	struct mmuext_op *op;
+	struct multicall_space mcs;
+	unsigned long mfn = pfn_to_mfn(PFN_DOWN(cr3));
+
 	BUG_ON(preemptible());
 
-	if (cr3 == x86_read_percpu(xen_cr3)) {
-		/* just a simple tlb flush */
-		xen_flush_tlb();
-		return;
-	}
+	mcs = xen_mc_entry(sizeof(*op));  /* disables interrupts */
 
+	/* Update while interrupts are disabled, so it's atomic with
+	   respect to IPIs */
 	x86_write_percpu(xen_cr3, cr3);
 
+	op = mcs.args;
+	op->cmd = MMUEXT_NEW_BASEPTR;
+	op->arg1.mfn = mfn;
 
-	{
-		struct mmuext_op *op;
-		struct multicall_space mcs = xen_mc_entry(sizeof(*op));
-		unsigned long mfn = pfn_to_mfn(PFN_DOWN(cr3));
-
-		op = mcs.args;
-		op->cmd = MMUEXT_NEW_BASEPTR;
-		op->arg1.mfn = mfn;
+	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
 
-		MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
+	/* Update xen_current_cr3 once the batch has actually
+	   been submitted. */
+	xen_mc_callback(set_current_cr3, (void *)cr3);
 
-		xen_mc_issue(PARAVIRT_LAZY_CPU);
-	}
+	xen_mc_issue(PARAVIRT_LAZY_CPU);  /* interrupts restored */
 }
 
 /* Early in boot, while setting up the initial pagetable, assume
@@ -1113,6 +1133,7 @@ asmlinkage void __init xen_start_kernel(
 	/* keep using Xen gdt for now; no urgent need to change it */
 
 	x86_write_percpu(xen_cr3, __pa(pgd));
+	x86_write_percpu(xen_current_cr3, __pa(pgd));
 
 #ifdef CONFIG_SMP
 	/* Don't do the full vcpu_info placement stuff until we have a
--- a/arch/i386/xen/mmu.c
+++ b/arch/i386/xen/mmu.c
@@ -515,20 +515,43 @@ static void drop_other_mm_ref(void *info
 
 	if (__get_cpu_var(cpu_tlbstate).active_mm == mm)
 		leave_mm(smp_processor_id());
+
+	/* If this cpu still has a stale cr3 reference, then make sure
+	   it has been flushed. */
+	if (x86_read_percpu(xen_current_cr3) == __pa(mm->pgd)) {
+		load_cr3(swapper_pg_dir);
+		arch_flush_lazy_cpu_mode();
+	}
 }
 
 static void drop_mm_ref(struct mm_struct *mm)
 {
+	cpumask_t mask;
+	unsigned cpu;
+
 	if (current->active_mm == mm) {
 		if (current->mm == mm)
 			load_cr3(swapper_pg_dir);
 		else
 			leave_mm(smp_processor_id());
+		arch_flush_lazy_cpu_mode();
+	}
+
+	/* Get the "official" set of cpus referring to our pagetable. */
+	mask = mm->cpu_vm_mask;
+
+	/* It's possible that a vcpu may have a stale reference to our
+	   cr3, because it's in lazy mode and hasn't yet flushed its
+	   set of pending hypercalls.  In this case, we can look at
+	   its actual current cr3 value, and force it to flush if
+	   needed. */
+	for_each_online_cpu(cpu) {
+		if (per_cpu(xen_current_cr3, cpu) == __pa(mm->pgd))
+			cpu_set(cpu, mask);
 	}
 
-	if (!cpus_empty(mm->cpu_vm_mask))
-		xen_smp_call_function_mask(mm->cpu_vm_mask, drop_other_mm_ref,
-					   mm, 1);
+	if (!cpus_empty(mask))
+		xen_smp_call_function_mask(mask, drop_other_mm_ref, mm, 1);
 }
 #else
 static void drop_mm_ref(struct mm_struct *mm)
--- a/arch/i386/xen/xen-ops.h
+++ b/arch/i386/xen/xen-ops.h
@@ -11,6 +11,7 @@ void xen_copy_trap_info(struct trap_info
 
 DECLARE_PER_CPU(struct vcpu_info *, xen_vcpu);
 DECLARE_PER_CPU(unsigned long, xen_cr3);
+DECLARE_PER_CPU(unsigned long, xen_current_cr3);
 
 extern struct start_info *xen_start_info;
 extern struct shared_info *HYPERVISOR_shared_info;


Patches currently in stable-queue which might be from jeremy@goop.org are

queue-2.6.23/xen-handle-lazy-cr3-on-unpin.patch
queue-2.6.23/xen-multicall-callbacks.patch
queue-2.6.23/xen-fix-register_vcpu_info.patch
queue-2.6.23/xen-xfs-unmap.patch
