LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Christoph Lameter <clameter@sgi.com>,
	Avi Kivity <avi@qumranet.com>, Izik Eidus <izike@qumranet.com>,
	Andrew Morton <akpm@osdl.org>, Nick Piggin <npiggin@suse.de>,
	kvm-devel@lists.sourceforge.net,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	steiner@sgi.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, daniel.blueman@quadrics.com, holt@sgi.com,
	Hugh Dickins <hugh@veritas.com>
Subject: Re: [kvm-devel] [PATCH] export notifier #1
Date: Wed, 23 Jan 2008 06:32:30 -0600	[thread overview]
Message-ID: <20080123123230.GH26420@sgi.com> (raw)
In-Reply-To: <20080123114136.GE15848@v2.random>

Christoph, Maybe you can clear one thing up.  Was this proposal an
addition to or replacement of Andrea's?  I assumed an addition.  I am
going to try to restrict my responses to ones appropriate for that
assumption.

> The remote instance is like a secondary TLB what you're doing in your
> code is as backwards as flushing the TLB _before_ clearing the PTE! If
> you want to call the secondary tlb flush outside locks we can argue
> about that, but I think you should do that _after_ clearing the linux
> pte IMHO. Otherwise you can as well move the tlb_flush_page before
> clearing the pte and you'll run in the same amount of smp races for
> the master MMU too.

I can agree to doing the flush after as long as I get a flag from the
mmu notifier saying this flush will be repeated later without locks
held.  That would be fine with me.  If we don't have that then the
export_notifiers would need to be expanded to cover all cases of pte
clearing.  Baring one of the two, I would argue we have an unworkable
solution for XPMEM.  This is because of allocations which is touched
upon later.

> The ordering you're implementing is backwards and unnatural, you can
> try to serialize it with explicit locking to block the "remote-tlb
> refills" through page bitflags (something not doable with the core
> master tlb because the refills are done by the hardware with the
> master tlb), but it'll remain unnatural and backwards IMHO.

Given one of the two compromises above, I believe this will then work
for XPMEM.  The callouts were placed as they are now to prevent the
mmu_notifier callouts from having to do work.

> > > > - anon_vma/inode and pte locks are held during callbacks.
> > > 
> > > In a previous email I asked what's wrong in offloading the event, and
> > 
> > We have internally discussed the possibility of offloading the event but 
> > that wont work with the existing callback since we would have to 
> > perform atomic allocation and there may be thousands of external 
> > references to a page.

As an example of thousands, we currently have one customer job that
has 16880 processors all with the same physical page faulted into their
address space.  The way XPMEM is currently structured, there is fan-out of
that PFN information so we would not need to queue up that many messages,
but it would still be considerable.  Our upcoming version of the hardware
will potentially make this fanout worse because we are going to allow
even more fine-grained divisions of the machine to help with memory
error containment.

> With KVM it doesn't work that way. Anyway you must be keeping a
> "secondary" count if you know when it's time to call
> __free_page/put_page, so why don't you use the main page_count instead?

We have a counter associated with a pfn that indicates when the pfn is no
longer referenced by other partitions.  This counter triggers changing of
memory protections so any subsequent access to this page will result in
a memory error on the remote partition (this should be an illegal case).

> And how do they know when they can restart adding references if infact
> the VM _never_ calls into SetPageExported? (perhaps you forgot
> something in your patch to set PageExported again to notify the
> external reference that it can "de-freeze" and to restart adding
> references ;)

I assumed Christoph intended this to be part of our memory protection
changing phase.  Once we have raised memory protections for the page,
clear the bit.  When we lower memory protections, set the bit.

> The thing is, we can add notifiers to my patch to fit your needs, but
> those will be _different_ notifiers and they really should be after
> the linux pte updates... I think fixing your code so it'll work with
> the sleeping-notifiers getting the "page" instead of a virtual address
> called _after_ clearing the main linux pte, is the way to go. Then
> hopefully won't have to find a way to enable the PageExported bitflag
> anymore in the linux VM and it may remain always-on for the exported
> pages etc.... it makes life a whole lot easier for you too IMHO.

As I said before, I was under the assumption that Christoph was proposing
this as an addition to your notifiers, not a replacement.

Thanks,
Robin

  reply	other threads:[~2008-01-23 12:32 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-13 16:24 [PATCH] mmu notifiers #v2 Andrea Arcangeli
2008-01-13 21:11 ` Benjamin Herrenschmidt
2008-01-14 20:02 ` Christoph Lameter
2008-01-15  4:28   ` Benjamin Herrenschmidt
2008-01-15 12:44   ` Andrea Arcangeli
2008-01-15 20:18     ` Benjamin Herrenschmidt
2008-01-16  1:06       ` Andrea Arcangeli
2008-01-16  9:01 ` Brice Goglin
2008-01-16 10:19   ` Andrea Arcangeli
2008-01-16 17:42 ` Rik van Riel
2008-01-16 17:48   ` Izik Eidus
2008-01-17 16:23     ` Andrea Arcangeli
2008-01-17 18:21       ` Izik Eidus
2008-01-17 19:32         ` Andrea Arcangeli
2008-01-21 12:52           ` [PATCH] mmu notifiers #v3 Andrea Arcangeli
2008-01-22  2:21             ` Rik van Riel
2008-01-22 14:12             ` [kvm-devel] " Avi Kivity
2008-01-22 14:43               ` Andrea Arcangeli
2008-01-22 20:08                 ` [kvm-devel] [PATCH] mmu notifiers #v4 Andrea Arcangeli
2008-01-22 20:34                   ` [kvm-devel] [PATCH] export notifier #1 Christoph Lameter
2008-01-22 22:31                     ` Andrea Arcangeli
2008-01-22 22:53                       ` Christoph Lameter
2008-01-23 10:27                         ` Avi Kivity
2008-01-23 10:52                           ` Robin Holt
2008-01-23 12:04                             ` Andrea Arcangeli
2008-01-23 12:34                               ` Robin Holt
2008-01-23 19:48                               ` Christoph Lameter
2008-01-23 19:58                                 ` Robin Holt
2008-01-23 19:47                             ` Christoph Lameter
2008-01-24  5:56                               ` Avi Kivity
2008-01-24 12:26                                 ` Andrea Arcangeli
2008-01-24 12:34                                   ` Avi Kivity
2008-01-23 11:41                         ` Andrea Arcangeli
2008-01-23 12:32                           ` Robin Holt [this message]
2008-01-23 17:33                             ` Andrea Arcangeli
2008-01-23 20:27                               ` Christoph Lameter
2008-01-24 15:42                                 ` Andrea Arcangeli
2008-01-24 20:07                                   ` Christoph Lameter
2008-01-25  6:35                                     ` Avi Kivity
2008-01-23 20:18                           ` Christoph Lameter
2008-01-24 14:34                             ` Andrea Arcangeli
2008-01-24 14:41                               ` Andrea Arcangeli
2008-01-24 15:15                               ` Avi Kivity
2008-01-24 15:18                                 ` Avi Kivity
2008-01-24 20:01                               ` Christoph Lameter
2008-01-22 23:36                     ` Benjamin Herrenschmidt
2008-01-23  0:40                       ` Christoph Lameter
2008-01-23  1:21                         ` Robin Holt
2008-01-23 12:51                     ` Gerd Hoffmann
2008-01-23 13:19                       ` Robin Holt
2008-01-23 14:12                         ` Gerd Hoffmann
2008-01-23 14:18                           ` Robin Holt
2008-01-23 14:35                             ` Gerd Hoffmann
2008-01-23 15:48                               ` Robin Holt
2008-01-23 14:17                         ` Avi Kivity
2008-01-24  4:03                           ` Benjamin Herrenschmidt
2008-01-23 15:41                       ` Andrea Arcangeli
2008-01-23 17:47                         ` Gerd Hoffmann
2008-01-24  6:01                           ` Avi Kivity
2008-01-24  6:45                           ` Jeremy Fitzhardinge
2008-01-23 20:40                         ` Christoph Lameter
2008-01-24  2:00                   ` Enhance mmu notifiers to accomplish a lockless implementation (incomplete) Robin Holt
2008-01-24  4:05                     ` Robin Holt
2008-01-22 19:28             ` [PATCH] mmu notifiers #v3 Peter Zijlstra
2008-01-22 20:31               ` Christoph Lameter
2008-01-22 20:31               ` Andrea Arcangeli
2008-01-22 22:10                 ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080123123230.GH26420@sgi.com \
    --to=holt@sgi.com \
    --cc=akpm@osdl.org \
    --cc=andrea@qumranet.com \
    --cc=avi@qumranet.com \
    --cc=benh@kernel.crashing.org \
    --cc=clameter@sgi.com \
    --cc=daniel.blueman@quadrics.com \
    --cc=hugh@veritas.com \
    --cc=izike@qumranet.com \
    --cc=kvm-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=steiner@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).