From: Christoph Lameter <clameter@sgi.com>
To: Christian Bell <christian.bell@qlogic.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <andrea@qumranet.com>,
	a.p.zijlstra@chello.nl, izike@qumranet.com,
	Roland Dreier <rdreier@cisco.com>,
	steiner@sgi.com, linux-kernel@vger.kernel.org, avi@qumranet.com,
	linux-mm@kvack.org, daniel.blueman@quadrics.com,
	Robin Holt <holt@sgi.com>,
	general@lists.openfabrics.org,
	Andrew Morton <akpm@linux-foundation.org>,
	kvm-devel@lists.sourceforge.net
Subject: Re: [ofa-general] Re: Demand paging for memory regions
Date: Wed, 13 Feb 2008 11:00:05 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0802131052360.18472@schroedinger.engr.sgi.com>
In-Reply-To: <20080213040905.GQ29340@mv.qlogic.com>

On Tue, 12 Feb 2008, Christian Bell wrote:

> You're arguing that a HW page table is not needed by describing a use
> case that is essentially what all RDMA solutions already do above the
> wire protocols (all solutions except Quadrics, of course).

The HW page table is not essential to the notification scheme. That RDMA
uses the page table for linearization is a separate issue. A chip could,
for example, just have a TLB cache and look up the entries using the OS
page table.

> > Let's say you have two systems A and B. Each has its memory region MemA 
> > and MemB. Each side also has page tables for this region PtA and PtB.
> > If either side then accesses the page again then the reverse process 
> > happens. If B accesses the page then it will first of all incur a page 
> > fault because the entry in PtB is missing. The fault will then cause a 
> > message to be sent to A to establish the page again. A will create an 
> > entry in PtA and will then confirm to B that the page was established. At 
> > that point RDMA operations can occur again.
> 
> The notifier-reclaim cycle you describe is akin to the out-of-band
> pin-unpin control messages used by existing communication libraries.
> Also, I think what you are proposing can have problems at scale -- A
> must keep track of all of the (potentially many systems) of memA and
> cooperatively get an agreement from all these systems before reclaiming
> the page.

Right. We (SGI) have done something like this for a long time with XPmem 
and it scales ok.

> When messages are sufficiently large, the control messaging necessary
> to setup/teardown the regions is relatively small.  This is not
> always the case however -- in programming models that employ smaller
> messages, the one-sided nature of RDMA is the most attractive part of
> it.  

The messaging would only be needed if a process comes under memory 
pressure. As long as there is enough memory nothing like this will occur.
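
To make the message flow concrete, here is a minimal user-space sketch in C
of the A/B scheme quoted above. It is a toy model, not kernel or driver code:
struct node, owner_reclaim(), remote_access() and the message counter are all
hypothetical names, and the teardown order (B drops its entry before A does)
is an assumption inferred from "the reverse process".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4                /* illustrative size of the shared region */

/* One side of the connection: a toy page table (PtA or PtB). */
struct node {
	bool present[NPAGES];   /* is the page established on this side? */
	unsigned long faults;   /* faults taken by this side */
};

/* Control messages exchanged so far (hypothetical bookkeeping). */
static unsigned long messages;

static void region_init(struct node *a, struct node *b)
{
	for (size_t i = 0; i < NPAGES; i++) {
		a->present[i] = true;
		b->present[i] = true;
	}
	a->faults = 0;
	b->faults = 0;
	messages = 0;
}

/* A is under memory pressure and wants the page back. */
static void owner_reclaim(struct node *a, struct node *b, size_t page)
{
	assert(page < NPAGES);
	messages++;                 /* A -> B: drop your entry */
	b->present[page] = false;   /* B removes the entry from PtB */
	messages++;                 /* B -> A: done */
	a->present[page] = false;   /* A removes the entry from PtA */
}

/* B accesses the page. Returns true if the access had to fault. */
static bool remote_access(struct node *a, struct node *b, size_t page)
{
	assert(page < NPAGES);
	if (b->present[page])
		return false;       /* common case: no messaging at all */

	b->faults++;                /* entry in PtB is missing */
	messages++;                 /* B -> A: establish the page again */
	a->present[page] = true;    /* A creates the entry in PtA */
	messages++;                 /* A -> B: page established */
	b->present[page] = true;    /* RDMA can proceed again */
	return true;
}
```

In this model the message counter stays at zero until owner_reclaim() runs,
which is the point being made: the control traffic only shows up once a
page has actually been reclaimed.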

> Nothing any communication/runtime system can't already do today.  The
> point of RDMA demand paging is enabling the possibility of using RDMA
> without the implied synchronization -- the optimistic part.  Using
> the notifiers to duplicate existing memory region handling for RDMA
> hardware that doesn't have HW page tables is possible but undermines
> the more important consumer of your patches in my opinion.

The notifier scheme should integrate into existing memory region 
handling and not cause a duplication. If you already have library layers 
that do this then it should be possible to integrate it.

> One other area that has not been brought up yet (I think) is the
> applicability of notifiers in letting users know when pinned memory
> is reclaimed by the kernel.  This is useful when a lower-level
> library employs lazy deregistration strategies on memory regions that
> are subsequently released to the kernel via the application's use of
> munmap or sbrk.  Ohio Supercomputing Center has work in this area but
> a generalized approach in the kernel would certainly be welcome.

The driver gets the notifications about memory being reclaimed. The driver 
could then notify user code about the release as well.
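
As a rough illustration of that forwarding, the following user-space C sketch
shows a driver-side hook passing a reclaimed range on to a registered user
callback, which then drops a lazily cached registration. Every name here
(toy_driver, toy_driver_invalidate_range, reg_cache) is hypothetical; this is
not the mmu_notifier API from the patch set, only the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user callback: the range [start, end) was released. */
typedef void (*release_cb)(unsigned long start, unsigned long end, void *ctx);

struct toy_driver {
	release_cb cb;          /* user code to inform, may be NULL */
	void *ctx;              /* opaque pointer handed back to cb */
};

static void toy_driver_register(struct toy_driver *d, release_cb cb, void *ctx)
{
	d->cb = cb;
	d->ctx = ctx;
}

/* Stand-in for the driver's notifier hook, called when a range goes away. */
static void toy_driver_invalidate_range(struct toy_driver *d,
					unsigned long start, unsigned long end)
{
	assert(start < end);
	/* A real driver would tear down its own mappings for the range here. */
	if (d->cb)
		d->cb(start, end, d->ctx);  /* then tell user code about it */
}

/* User side: a one-entry lazy-deregistration cache. */
struct reg_cache {
	unsigned long start, end;   /* cached registered range [start, end) */
	int valid;                  /* nonzero while the registration is usable */
};

/* Invalidate the cached registration if the released range overlaps it. */
static void cache_on_release(unsigned long start, unsigned long end, void *ctx)
{
	struct reg_cache *c = ctx;

	if (c->valid && start < c->end && c->start < end)
		c->valid = 0;
}
```

A library would register cache_on_release() once and then treat a cleared
valid flag as "register this region again before the next transfer".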

Pinned memory currently *cannot* be reclaimed by the kernel. The refcount is 
elevated. This means that the VM tries to remove the mappings and then 
sees that it was not able to remove all references. Then it gives up and 
tries again and again and again.... Thus the potential for livelock.
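
The retry behaviour can be modelled with a small user-space C sketch. It is
a deliberately simplified toy, not the kernel's reclaim path: struct toy_page,
try_reclaim() and the retry bound are assumptions made for illustration, and
the optional hook stands in for a driver that drops its reference when
notified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	int mappings;   /* references held through page-table mappings */
	int pins;       /* extra references held by a driver (pinned) */
};

/* Optional hook: lets a driver release its reference before reclaim. */
typedef void (*invalidate_fn)(struct toy_page *p);

/* One reclaim attempt: remove the mappings, then check what is left. */
static bool try_reclaim(struct toy_page *p, invalidate_fn notify)
{
	if (notify)
		notify(p);          /* driver gets a chance to unpin */
	p->mappings = 0;            /* the VM can always remove mappings */
	return p->pins == 0;        /* fails while a pin is still held */
}

/*
 * Retry up to max_tries times. Returns the attempt number that succeeded,
 * or -1 if every attempt failed. The bound exists only so the model
 * terminates; without it a pinned page would be retried indefinitely.
 */
static int reclaim_with_retries(struct toy_page *p, invalidate_fn notify,
				int max_tries)
{
	assert(max_tries > 0);
	for (int i = 1; i <= max_tries; i++) {
		if (try_reclaim(p, notify))
			return i;
	}
	return -1;
}

/* Example driver hook: give up the pin when asked. */
static void driver_drop_pin(struct toy_page *p)
{
	p->pins = 0;
}
```

With no hook, reclaim_with_retries() on a pinned page exhausts its budget;
with driver_drop_pin() as the hook, the first attempt succeeds.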

