LKML Archive on lore.kernel.org
From: Christoph Lameter <clameter@sgi.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Robin Holt <holt@sgi.com>, Avi Kivity <avi@qumranet.com>,
Izik Eidus <izike@qumranet.com>, Nick Piggin <npiggin@suse.de>,
kvm-devel@lists.sourceforge.net,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
steiner@sgi.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, daniel.blueman@quadrics.com,
Hugh Dickins <hugh@veritas.com>
Subject: Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
Date: Tue, 29 Jan 2008 13:53:05 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0801291343530.26824@schroedinger.engr.sgi.com>
In-Reply-To: <20080129213604.GW7233@v2.random>
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
> > We invalidate the range *after* populating it? Isn't it okay to establish
> > references while populate_range() runs?
>
> It's not ok because that function can very well overwrite existing and
> present ptes (it's actually the nonlinear common case fast path for
> db). With your code the sptes created between invalidate_range and
> populate_range will keep pointing forever to the old physical page
> instead of the newly populated one.
It seems, though, that mmap_sem is taken for writing for regular vmas, and
that will hold off new mappings.
> I'm also asking myself if it's an smp race not to call
> mmu_notifier(invalidate_page) between ptep_clear_flush and set_pte_at
> in install_file_pte. Probably not because the guest VM running in a
> different thread would need to serialize outside the install_file_pte
> code with the task running install_file_pte, if it wants to be sure to
> write either all its data to the old or the new page. Certainly doing
> the invalidate_page inside the PT lock was obviously safe but I hope
> this is safe and this can accommodate your needs too.
But that would mean doing two invalidates on one pte: one range invalidate
and one page invalidate.
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1639,8 +1639,6 @@ gotten:
> > > /*
> > > * Re-check the pte - we dropped the lock
> > > */
> > > - mmu_notifier(invalidate_range, mm, address,
> > > - address + PAGE_SIZE - 1, 0);
> > > page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> > > if (likely(pte_same(*page_table, orig_pte))) {
> > > if (old_page) {
> >
> > What we did is to invalidate the page (?!) before taking the pte lock. In
> > the lock we replace the pte to point to another page. This means that we
> > need to clear stale information. So we zap it before. If another reference
> > is established after taking the spinlock then the pte contents have
> > changed and the critical section fails.
> >
> > Before the critical section starts we have gotten an extra refcount on the
> > original page so the page cannot vanish from under us.
>
> The problem is the missing invalidate_page/range _after_
> ptep_clear_flush. If a spte is built between invalidate_range and
> pte_offset_map_lock, it will remain pointing to the old page
> forever. Nothing will be called to invalidate that stale spte built
> between invalidate_page/range and ptep_clear_flush. This is why for
> the last few days I kept saying the mmu notifiers have to be invoked
> _after_ ptep_clear_flush and never before (remember the export
> notifier?). No idea how you can deal with this in your code, certainly
> for KVM sptes that's backwards and unworkable ordering of operation
> (exactly as backwards as doing the tlb flush before pte_clear in
> ptep_clear_flush; think of the spte as a tlb: you can't flush the tlb
> before clearing/updating the pte or it's smp unsafe).
Hmmm... So we could only do an invalidate_page here? Drop the strange
invalidate_range()?
>
> > > @@ -1676,6 +1674,8 @@ gotten:
> > > page_cache_release(old_page);
> > > unlock:
> > > pte_unmap_unlock(page_table, ptl);
> > > + mmu_notifier(invalidate_range, mm, address,
> > > + address + PAGE_SIZE - 1, 0);
> > > if (dirty_page) {
> > > if (vma->vm_file)
> > > file_update_time(vma->vm_file);
> >
> > Now we invalidate the page after the transaction is complete. This means
> > external ptes can persist while we change the pte? Possibly even dirty the
> > page?
>
> Yes, and the only reason this can be safe is for the reason explained
> at the top of the email, if the other cpu wants to serialize to be
> sure to write in the "new" page, it has to serialize with the
> page-fault but to serialize it has to wait for the page fault to return
> (example: we're not going to call futex code until the page fault
> returns).
Serialize how? mmap_sem?