LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Paul Mackerras <paulus@ozlabs.org>
Subject: [PATCH 4.14 61/62] KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
Date: Mon, 14 May 2018 08:49:17 +0200	[thread overview]
Message-ID: <20180514064819.879286275@linuxfoundation.org> (raw)
In-Reply-To: <20180514064816.436958006@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Mackerras <paulus@ozlabs.org>

commit c3856aeb29402e94ad9b3879030165cc6a4fdc56 upstream.

This fixes several bugs in the radix page fault handler relating to
the way large pages in the memory backing the guest were handled.
First, the check for large pages only checked for explicit huge pages
and missed transparent huge pages.  Then the check that the addresses
(host virtual vs. guest physical) had appropriate alignment was
wrong, meaning that the code never put a large page in the partition
scoped radix tree; it was always demoted to a small page.

Fixing this exposed bugs in kvmppc_create_pte().  We were never
invalidating a 2MB PTE, which meant that if a page was initially
faulted in without write permission and the guest then attempted
to store to it, we would never update the PTE to have write permission.
If we find a valid 2MB PTE in the PMD, we need to clear it and
do a TLB invalidation before installing either the new 2MB PTE or
a pointer to a page table page.

This also corrects an assumption that get_user_pages_fast would set
the _PAGE_DIRTY bit if we are writing, which is not true.  Instead we
mark the page dirty explicitly with set_page_dirty_lock().  This
also means we don't need the dirty bit set on the host PTE when
providing write access on a read fault.

[paulus@ozlabs.org - use mark_pages_dirty instead of
 kvmppc_update_dirty_map]

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   72 +++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 26 deletions(-)

--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -19,6 +19,9 @@
 #include <asm/pgalloc.h>
 #include <asm/pte-walk.h>
 
+static void mark_pages_dirty(struct kvm *kvm, struct kvm_memory_slot *memslot,
+			     unsigned long gfn, unsigned int order);
+
 /*
  * Supported radix tree geometry.
  * Like p9, we support either 5 or 9 bits at the first (lowest) level,
@@ -195,6 +198,12 @@ static void kvmppc_pte_free(pte_t *ptep)
 	kmem_cache_free(kvm_pte_cache, ptep);
 }
 
+/* Like pmd_huge() and pmd_large(), but works regardless of config options */
+static inline int pmd_is_leaf(pmd_t pmd)
+{
+	return !!(pmd_val(pmd) & _PAGE_PTE);
+}
+
 static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 			     unsigned int level, unsigned long mmu_seq)
 {
@@ -219,7 +228,7 @@ static int kvmppc_create_pte(struct kvm
 	else
 		new_pmd = pmd_alloc_one(kvm->mm, gpa);
 
-	if (level == 0 && !(pmd && pmd_present(*pmd)))
+	if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
 		new_ptep = kvmppc_pte_alloc();
 
 	/* Check if we might have been invalidated; let the guest retry if so */
@@ -244,12 +253,30 @@ static int kvmppc_create_pte(struct kvm
 		new_pmd = NULL;
 	}
 	pmd = pmd_offset(pud, gpa);
-	if (pmd_large(*pmd)) {
-		/* Someone else has instantiated a large page here; retry */
-		ret = -EAGAIN;
-		goto out_unlock;
-	}
-	if (level == 1 && !pmd_none(*pmd)) {
+	if (pmd_is_leaf(*pmd)) {
+		unsigned long lgpa = gpa & PMD_MASK;
+
+		/*
+		 * If we raced with another CPU which has just put
+		 * a 2MB pte in after we saw a pte page, try again.
+		 */
+		if (level == 0 && !new_ptep) {
+			ret = -EAGAIN;
+			goto out_unlock;
+		}
+		/* Valid 2MB page here already, remove it */
+		old = kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd),
+					      ~0UL, 0, lgpa, PMD_SHIFT);
+		kvmppc_radix_tlbie_page(kvm, lgpa, PMD_SHIFT);
+		if (old & _PAGE_DIRTY) {
+			unsigned long gfn = lgpa >> PAGE_SHIFT;
+			struct kvm_memory_slot *memslot;
+			memslot = gfn_to_memslot(kvm, gfn);
+			if (memslot)
+				mark_pages_dirty(kvm, memslot, gfn,
+						 PMD_SHIFT - PAGE_SHIFT);
+		}
+	} else if (level == 1 && !pmd_none(*pmd)) {
 		/*
 		 * There's a page table page here, but we wanted
 		 * to install a large page.  Tell the caller and let
@@ -412,28 +439,24 @@ int kvmppc_book3s_radix_page_fault(struc
 	} else {
 		page = pages[0];
 		pfn = page_to_pfn(page);
-		if (PageHuge(page)) {
-			page = compound_head(page);
-			pte_size <<= compound_order(page);
+		if (PageCompound(page)) {
+			pte_size <<= compound_order(compound_head(page));
 			/* See if we can insert a 2MB large-page PTE here */
 			if (pte_size >= PMD_SIZE &&
-			    (gpa & PMD_MASK & PAGE_MASK) ==
-			    (hva & PMD_MASK & PAGE_MASK)) {
+			    (gpa & (PMD_SIZE - PAGE_SIZE)) ==
+			    (hva & (PMD_SIZE - PAGE_SIZE))) {
 				level = 1;
 				pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
 			}
 		}
 		/* See if we can provide write access */
 		if (writing) {
-			/*
-			 * We assume gup_fast has set dirty on the host PTE.
-			 */
 			pgflags |= _PAGE_WRITE;
 		} else {
 			local_irq_save(flags);
 			ptep = find_current_mm_pte(current->mm->pgd,
 						   hva, NULL, NULL);
-			if (ptep && pte_write(*ptep) && pte_dirty(*ptep))
+			if (ptep && pte_write(*ptep))
 				pgflags |= _PAGE_WRITE;
 			local_irq_restore(flags);
 		}
@@ -459,18 +482,15 @@ int kvmppc_book3s_radix_page_fault(struc
 		pte = pfn_pte(pfn, __pgprot(pgflags));
 		ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
 	}
-	if (ret == 0 || ret == -EAGAIN)
-		ret = RESUME_GUEST;
 
 	if (page) {
-		/*
-		 * We drop pages[0] here, not page because page might
-		 * have been set to the head page of a compound, but
-		 * we have to drop the reference on the correct tail
-		 * page to match the get inside gup()
-		 */
-		put_page(pages[0]);
+		if (!ret && (pgflags & _PAGE_WRITE))
+			set_page_dirty_lock(page);
+		put_page(page);
 	}
+
+	if (ret == 0 || ret == -EAGAIN)
+		ret = RESUME_GUEST;
 	return ret;
 }
 
@@ -676,7 +696,7 @@ void kvmppc_free_radix(struct kvm *kvm)
 				continue;
 			pmd = pmd_offset(pud, 0);
 			for (im = 0; im < PTRS_PER_PMD; ++im, ++pmd) {
-				if (pmd_huge(*pmd)) {
+				if (pmd_is_leaf(*pmd)) {
 					pmd_clear(pmd);
 					continue;
 				}

  parent reply	other threads:[~2018-05-14  6:49 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-14  6:48 [PATCH 4.14 00/62] 4.14.41-stable review Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 01/62] ipvs: fix rtnl_lock lockups caused by start_sync_thread Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 02/62] netfilter: ebtables: dont attempt to allocate 0-sized compat array Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 03/62] kcm: Call strp_stop before strp_done in kcm_attach Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 04/62] crypto: af_alg - fix possible uninit-value in alg_bind() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 05/62] netlink: fix uninit-value in netlink_sendmsg Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 06/62] net: fix rtnh_ok() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 07/62] net: initialize skb->peeked when cloning Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 08/62] net: fix uninit-value in __hw_addr_add_ex() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 09/62] dccp: initialize ireq->ir_mark Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 10/62] ipv4: fix uninit-value in ip_route_output_key_hash_rcu() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 11/62] soreuseport: initialise timewait reuseport field Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 12/62] inetpeer: fix uninit-value in inet_getpeer Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 13/62] memcg: fix per_node_info cleanup Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 14/62] perf: Remove superfluous allocation error check Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 15/62] tcp: fix TCP_REPAIR_QUEUE bound checking Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 16/62] bdi: wake up concurrent wb_shutdown() callers Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 17/62] bdi: Fix oops in wb_workfn() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 18/62] KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 19/62] KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 20/62] KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 21/62] arm64: Add work around for Arm Cortex-A55 Erratum 1024718 Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 22/62] compat: fix 4-byte infoleak via uninitialized struct field Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 23/62] gpioib: do not free unrequested descriptors Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 24/62] gpio: fix aspeed_gpio unmask irq Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 25/62] gpio: fix error path in lineevent_create Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 26/62] rfkill: gpio: fix memory leak in probe error path Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 27/62] libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 28/62] dm integrity: use kvfree for kvmallocd memory Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 29/62] tracing: Fix regex_match_front() to not over compare the test string Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 30/62] z3fold: fix reclaim lock-ups Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 31/62] mm: sections are not offlined during memory hotremove Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 32/62] mm, oom: fix concurrent munlock and oom reaper unmap, v3 Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 33/62] ceph: fix rsize/wsize capping in ceph_direct_read_write() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 34/62] can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 35/62] can: hi311x: Acquire SPI lock on ->do_get_berr_counter Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 36/62] can: hi311x: Work around TX complete interrupt erratum Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 37/62] drm/vc4: Fix scaling of uni-planar formats Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 38/62] drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 39/62] drm/nouveau: Fix deadlock in nv50_mstm_register_connector() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 40/62] drm/atomic: Clean old_state/new_state in drm_atomic_state_default_clear() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 41/62] drm/atomic: Clean private obj " Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 42/62] net: atm: Fix potential Spectre v1 Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 43/62] atm: zatm: " Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 44/62] PCI / PM: Always check PME wakeup capability for runtime wakeup support Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 45/62] PCI / PM: Check device_may_wakeup() in pci_enable_wake() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 46/62] cpufreq: schedutil: Avoid using invalid next_freq Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 47/62] Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174" Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 48/62] Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 49/62] Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets Greg Kroah-Hartman
2018-05-15 17:07   ` Brian Norris
2018-05-15 19:42     ` Guenter Roeck
2018-05-16  7:57       ` gregkh
2018-05-14  6:49 ` [PATCH 4.14 50/62] thermal: exynos: Reading temperature makes sense only when TMU is turned on Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 51/62] thermal: exynos: Propagate error value from tmu_read() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 52/62] nvme: add quirk to force medium priority for SQ creation Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 53/62] smb3: directory sync should not return an error Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 54/62] sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 55/62] tracing/uprobe_event: Fix strncpy corner case Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 56/62] perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 57/62] perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 58/62] perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 59/62] perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[] Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 60/62] perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() Greg Kroah-Hartman
2018-05-14  6:49 ` Greg Kroah-Hartman [this message]
2018-05-14  6:49 ` [PATCH 4.14 62/62] KVM: x86: remove APIC Timer periodic/oneshot spikes Greg Kroah-Hartman
2018-05-14  8:28 ` [PATCH 4.14 00/62] 4.14.41-stable review Nathan Chancellor
2018-05-14 12:45 ` kernelci.org bot
2018-05-14 16:21 ` Guenter Roeck
2018-05-14 22:02 ` Shuah Khan
2018-05-15  5:36 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180514064819.879286275@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulus@ozlabs.org \
    --cc=stable@vger.kernel.org \
    --subject='Re: [PATCH 4.14 61/62] KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).