From: Andrea Arcangeli <andrea@qumranet.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Robin Holt <holt@sgi.com>, Avi Kivity <avi@qumranet.com>,
	Izik Eidus <izike@qumranet.com>, Nick Piggin <npiggin@suse.de>,
	kvm-devel@lists.sourceforge.net,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	steiner@sgi.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, daniel.blueman@quadrics.com,
	Hugh Dickins <hugh@veritas.com>
Subject: Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
Date: Tue, 29 Jan 2008 23:02:12 +0100
Message-ID: <20080129220212.GX7233@v2.random>
In-Reply-To: <Pine.LNX.4.64.0801291327330.26649@schroedinger.engr.sgi.com>

On Tue, Jan 29, 2008 at 01:35:58PM -0800, Christoph Lameter wrote:
> On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
> 
> > > It seems to be okay to invalidate range if you hold mmap_sem writably. In 
> > > that case no additional faults can happen that would create new ptes.
> > 
> > In that place the mmap_sem is taken but in readonly mode. I never rely
> > on the mmap_sem in the mmu notifier methods. Not invoking the notifier
> 
> Well it seems that we have to rely on mmap_sem otherwise concurrent faults 
> can occur. The mmap_sem seems to be acquired for write there.
                                                         ^^^^^
> 
>               if (!has_write_lock) {
>                         up_read(&mm->mmap_sem);
>                         down_write(&mm->mmap_sem);
>                         has_write_lock = 1;
>                         goto retry;
>                 }


hmm, "there" where? When I said it was taken in readonly mode I meant
for the quoted code (it would be at the top if it wasn't cut), so I
quote below again:

> > +   mmu_notifier(invalidate_range, mm, address,
> > +                           address + PAGE_SIZE - 1, 0);
> >     page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> >     if (likely(pte_same(*page_table, orig_pte))) {
> >             if (old_page) {

The "there" for me was do_wp_page.

Even for the code you quoted in fremap.c, has_write_lock is set to 1
_only_ the very first time you call sys_remap_file_pages on a VMA.
Only the transition of the VMA from linear to nonlinear requires the
mmap_sem in write mode. So you can be sure that 99% of the time the
fremap code is populating (overwriting) already present ptes with the
mmap_sem held only in readonly mode, like do_wp_page. It would be
unnecessary to populate the nonlinear range with the mmap_sem in write
mode. Only the "vma" mangling requires the mmap_sem in write mode; the
pte modifications only require the PT lock + mmap_sem in read mode.

Effectively the first invocation of populate_range runs with the
mmap_sem in write mode. I wonder why, there seems to be no good reason
for that. I guess it's a bit that should be optimized, by calling
downgrade_write before calling populate_range even the first time
the vma switches from linear to nonlinear (after the vma has been
fully updated to the new status). But for sure all later invocations
run populate_range with the semaphore readonly, like the rest of the
VM does when instantiating ptes in the page faults.
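
To make the locking argument above concrete, here is a toy userspace
model in C. It is not kernel code and does not take any real lock: it
only tracks which mmap_sem mode the populate step would run in. Every
toy_* name is made up for illustration, and the downgrade flag models
the suggested (not existing) optimization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mode in which the (modelled) mmap_sem is held. */
enum toy_sem_mode { TOY_READ, TOY_WRITE };

/* Hypothetical stand-in for a VMA: only tracks linear vs nonlinear. */
struct toy_vma {
	bool nonlinear;
};

/*
 * Models one sys_remap_file_pages call and returns the semaphore mode
 * in which the populate step would run.
 */
enum toy_sem_mode toy_remap(struct toy_vma *vma, bool downgrade_before_populate)
{
	enum toy_sem_mode mode = TOY_READ;	/* models down_read() */

	assert(vma != NULL);

	if (!vma->nonlinear) {
		/* models up_read() + down_write() + retry */
		mode = TOY_WRITE;
		/* the "vma" mangling is the only part needing write mode */
		vma->nonlinear = true;
		/* models the suggested downgrade_write() before populating */
		if (downgrade_before_populate)
			mode = TOY_READ;
	}

	/* populate_range would run here, in this mode */
	return mode;
}
```

In this model only the first call on a linear VMA without the
downgrade reports write mode; every later call, and the first call
with the downgrade, populates in read mode.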

> > before releasing the PT lock adds quite some uncertainty on the smp
> > safety of the spte invalidates, because the pte may be unmapped and
> > remapped by a minor fault before invalidate_range is invoked, but I
> > didn't figure out a kernel crashing race yet thanks to the pin we take
> > through get_user_pages (and only thanks to it). The requirement is
> > that invalidate_range is invoked after the last ptep_clear_flush or it
> > leaks pins that's why I had to move it at the end.
>  
> So "pins" means a reference count right? I still do not get why you 

Yes.

> have refcount problems. You take a refcount when you export the page 
> through KVM and then drop the refcount in invalidate page right?

Yes.

> So you walk through the KVM ptes and drop the refcount for each spte you 
> encounter?

Yes.

All pins are gone by the time invalidate_page/range returns. But there
is no critical section between invalidate_page and the _later_
ptep_clear_flush. So get_user_pages is free to run and take the PT
lock before the ptep_clear_flush, find the linux pte still
instantiated, and create a new spte, before ptep_clear_flush runs.

Think of why the tlb flushes are being called at the end of
ptep_clear_flush. The mmu notifier invalidate has to be called after
for the exact same reason.
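
The ordering problem can be replayed deterministically with a toy
userspace model in C. This is only a sketch under simplifying
assumptions (one page, one spte, one racing fault that lands between
the two teardown steps); it is not kernel code and all toy_* names are
invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for a single page. */
struct toy_page {
	bool linux_pte_present;	/* primary (linux) pte */
	bool spte_present;	/* secondary pte, e.g. a KVM spte */
	int pins;		/* refcounts held on behalf of sptes */
};

/* Models the get_user_pages path: it can only build a new spte
 * while the linux pte is still instantiated. */
static void toy_secondary_fault(struct toy_page *p)
{
	if (p->linux_pte_present && !p->spte_present) {
		p->spte_present = true;
		p->pins++;
	}
}

/* Models invalidate_page: drops the spte and its pin. */
static void toy_invalidate_page(struct toy_page *p)
{
	if (p->spte_present) {
		assert(p->pins > 0);
		p->spte_present = false;
		p->pins--;
	}
}

/* Models ptep_clear_flush: the linux pte goes away. */
static void toy_ptep_clear_flush(struct toy_page *p)
{
	p->linux_pte_present = false;
}

/* Tears the page down with a racing secondary fault in the middle.
 * Returns the number of pins still held afterwards. */
int toy_teardown(bool notify_first)
{
	struct toy_page p = {
		.linux_pte_present = true,
		.spte_present = true,
		.pins = 1,
	};

	if (notify_first) {
		toy_invalidate_page(&p);
		toy_secondary_fault(&p);	/* finds the linux pte, re-pins */
		toy_ptep_clear_flush(&p);
	} else {
		toy_ptep_clear_flush(&p);
		toy_secondary_fault(&p);	/* linux pte gone, nothing to map */
		toy_invalidate_page(&p);
	}
	return p.pins;
}
```

With the notifier called first, the model ends with one pin still held
after the linux pte is gone; with the notifier called after
ptep_clear_flush, it ends with zero.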

Perhaps somebody else should explain this. I started pointing out this
smp race the moment I saw the backwards ordering being proposed in
export-notifier-v1, sorry if I'm not clear enough.
