From: David Howells <dhowells@redhat.com>
To: ebiederm@xmission.com (Eric W. Biederman)
Cc: Hugh Dickins <hugh@veritas.com>,
	bryan.wu@analog.com, Robin Holt <holt@sgi.com>,
	"Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com>,
	Andrew Morton <akpm@osdl.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Pavel Machek <pavel@ucw.cz>, Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	sugita <yumiko.sugita.yf@hitachi.com>,
	Satoshi OSHIMA <soshima@redhat.com>,
	haoki@redhat.com, Robin Getz <rgetz@blackfin.uclinux.org>
Subject: Re: Move to unshared VMAs in NOMMU mode?
Date: Tue, 20 Mar 2007 19:12:10 +0000
Message-ID: <4269.1174417930@redhat.com>
In-Reply-To: <m1bqinsvy3.fsf@ebiederm.dsl.xmission.com>

Eric W. Biederman <ebiederm@xmission.com> wrote:

> >> For shared mappings you share in some sense the page cache.
> >
> > Currently, no - not unless the driver does something clever as ramfs does.
> > Sharing through the page cache is a nice idea, but it has some limitations,
> > mainly that non-sharing then operates differently.
> 
> I don't quite see what your limitations are

 (1) There's no MMU to provide write protection.  This can be worked around in
     various ways - such as returning ETXTBSY if one attempts to write() to a
     shared mapped file.

 (2) An NFS root changing.  ETXTBSY cannot be applied to the server just
     because a client has a file open.  Admittedly, this is a problem for
     MMU-mode too.

 (3) Keeping track of what memory a VMA is currently pinning.  This isn't
     normally necessary in MMU-mode, but it's very important in NOMMU-mode.

 (4) Mappings must be made on contiguous regions of CPU physical memory space.
     This leads to fun generating contiguous regions.  If someone reads, say,
     the first page of a file, then attempts to map the first meg, say, of the
     file (think ld.so loading a shared library), you need to make the first
     meg completely contiguous.  There can be problems with this:

     (a) Any pages that are already in the page cache (notably the first page)
         may need moving to a region that's big enough to hold the whole
         segment.

     (b) If a smaller mapping is already made, but can't be extended, then you
         can't make the bigger mapping unless you can decide not to share them.

     With ramfs on NOMMU, using truncate() to expand a zero-length file
     attempts to get a contiguous region of the size specified which is then
     broken up and attached to the page cache for that file.  This is what
     makes POSIX and SYSV shared memory work on NOMMU.

 (5) PROT_WRITE/MAP_SHARED mappings are not practical on files that are on
     devices or filesystems that aren't directly mappable (disks and NFS vs
     memory and flash).

 (6) PROT_WRITE/MAP_PRIVATE mappings cannot practically be made to follow
     changes to the backing file.
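
To make the workaround in point (1) concrete, here is a minimal userspace toy
model, not kernel code.  The struct and the toy_*() helpers are hypothetical
names invented for illustration; the only thing taken from the text above is
the idea of failing write() with ETXTBSY while a shared mapping exists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for per-file state; not a real kernel structure. */
struct toy_file {
    unsigned int shared_maps;   /* number of live MAP_SHARED mappings */
    size_t size;                /* current file size in bytes */
};

/* Record that a shared mapping of the file has been set up. */
static void toy_map_shared(struct toy_file *f)
{
    f->shared_maps++;
}

/* Record that a shared mapping of the file has been torn down. */
static void toy_unmap_shared(struct toy_file *f)
{
    assert(f->shared_maps > 0);
    f->shared_maps--;
}

/*
 * Model of write(): with no MMU to write-protect the mapped memory, the
 * assumed policy is simply to refuse the write while the file is mapped
 * shared.  Returns the byte count on success or -ETXTBSY on refusal.
 */
static long toy_write(struct toy_file *f, size_t pos, size_t len)
{
    if (f->shared_maps > 0)
        return -ETXTBSY;
    if (pos + len > f->size)
        f->size = pos + len;
    return (long)len;
}
```

Writes succeed before the mapping is made and again after it is removed; in
between they are refused rather than being allowed to change memory that a
mapping is pointing at.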

Point (4) is the most difficult one, I think.
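
For reference, the userspace side of the ramfs pattern described under (4)
looks roughly like the sketch below.  make_shared_region() is a hypothetical
helper name and the path is whatever the caller picks; on a NOMMU system the
file is assumed to live on ramfs so that the ftruncate() call is what obtains
the contiguous region.  The sketch only shows the call sequence and says
nothing about how the kernel satisfies it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an empty file, expand it to 'size' bytes with ftruncate(), then
 * map the whole thing MAP_SHARED.  On success the mapping is returned and
 * the open descriptor is stored in *fd_out; on failure MAP_FAILED is
 * returned and no descriptor is left open.
 */
static void *make_shared_region(const char *path, size_t size, int *fd_out)
{
    void *p;
    int fd;

    assert(fd_out != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return MAP_FAILED;

    /* Expanding the zero-length file is the step that sizes the region. */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return MAP_FAILED;
    }

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }

    *fd_out = fd;
    return p;
}
```

The ordering matters in this sketch: the file is sized in one go before
anything maps it, so there is never a smaller mapping that would later need
extending as in (4)(b).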

> but it is a fundamental assumption that the sharing happen in the page cache.

A fundamental assumption where?  There's no requirement for an O/S to work by
having a page cache.

> I.e. The pages that are in the page cache are the only ones that can be
> effectively shared.

That's not actually so.  Character devices don't generally exist in the page
cache, for example.  In fact, IIRC, SYSV SHM used to work without touching the
page cache (I may be wrong on that).

Remember also, you're on a NOMMU system: we might have to bend some of the
rules.

> As a corollary if a page is not in the page cache it should not be
> shared.  I guess this limits sharing to just those things in ramfs?

That's nasty, and also unnecessary.  It also prohibits XIP, btw.

> You obviously don't have the hardware to enforce this

The main point is not that you don't have h/w to enforce protection, it's that
you may not have h/w to do virtual address mappings.

> but at least you can define the sharing of anything else as undefined.

Why would I want to do that?

> Now I don't know what it takes from your data structures to achieve
> sharing of page cache pages.  Or what your underlying complications
> are.  Not having the page cache as the fundamental underlying sharing
> pool is strongly non-intuitive from the perspective of the rest
> of linux.

Again, you're on a NOMMU system.  Actually, as it stands, what I have here
seems to work pretty well.  There isn't much that doesn't work.  fork() doesn't
work (it's just too impractical) and it turns out that SYSV SHM only mostly
works (grrr).

> >> My gut feel says just keep a vma per process of the regions the process
> >> has and do the appropriate book keeping and all will be fine.
> >
> > I'm sure it will be, but at the cost of consuming extra memory.  I'm not
> > sure that the amount of extra memory is, however, all that significant.
> > Now that I think about it, I don't imagine that a lot of processes are
> > going to be running at once on a NOMMU system, and so the scope for sharing
> > isn't all that wide.
> 
> Sure.  But this is sharing with the kernel as well.

What do you mean by that?

> And you always have at least one kernel and one application.

Actually, that's not so.  I seem to recall some router thing where userspace
used to just exit completely, leaving the kernel to run alone.

> So if they can share the file buffers that is a win.

Yes, it's a win, but there are also problems with doing so, most particularly
contiguity.

> And what mmap of any file backed pages is expected to provide.

I think it's reasonable to say that a read-only MAP_PRIVATE mapping need not
follow the backing file in NOMMU mode.  Note that it's close to impossible to
make a writable MAP_PRIVATE mapping follow the backing file, even though
that's the expected behaviour for any part of it that hasn't been written to.

Out of interest, do you know of any application that relies on the behaviour in
which unwritten private mappings follow the backing file?
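
For anyone wanting to see what a given system does, a small probe along these
lines should show it.  This is only a sketch: the function name and the
scratch file path are made up, and POSIX leaves it unspecified whether a
change made to the file after mmap() is visible through a MAP_PRIVATE mapping,
so either answer is legitimate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one byte of a scratch file PROT_READ/MAP_PRIVATE, change that byte
 * through the file descriptor, and report what the mapping shows.
 * Returns 1 if the mapping followed the file, 0 if it kept the old
 * contents, -1 on error.  The scratch file is removed before returning.
 */
static int private_mapping_follows_file(const char *path)
{
    int result = -1;
    char *p;
    int fd;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, "A", 1) != 1)
        goto out;

    p = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    /* Touch the mapping first so it definitely holds the old contents. */
    assert(p[0] == 'A');

    /* Change the backing file without ever writing through the mapping. */
    if (pwrite(fd, "B", 1, 0) == 1)
        result = (p[0] == 'B');

    munmap(p, 1);
out:
    close(fd);
    unlink(path);
    return result;
}
```

An application that relies on the following behaviour would need this to
return 1; one that copes with a private copy being taken at mmap() time
works with either result.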

> Scratch the fork part.  You still aren't calling the open/close
> methods when they would normally be called, and that is where your
> problem lies.

As previously stated, yes, I do realise that.

I think the behaviour I have for sharing R/O private mappings is good enough,
especially as consolidating bits of the pagecache will be a pain, and mostly
unnecessary.

This is NOMMU mode.  Some things are going to have to be different, but almost
all of the MMU functionality is available, just as long as you don't look too
closely.

David
