LKML Archive on lore.kernel.org
From: David Howells <dhowells@redhat.com>
To: ebiederm@xmission.com (Eric W. Biederman)
Cc: Hugh Dickins <hugh@veritas.com>, bryan.wu@analog.com,
	Robin Holt <holt@sgi.com>,
	"Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com>,
	Andrew Morton <akpm@osdl.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Pavel Machek <pavel@ucw.cz>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	sugita <yumiko.sugita.yf@hitachi.com>,
	Satoshi OSHIMA <soshima@redhat.com>, haoki@redhat.com,
	Robin Getz <rgetz@blackfin.uclinux.org>
Subject: Re: Move to unshared VMAs in NOMMU mode?
Date: Tue, 20 Mar 2007 19:12:10 +0000
Message-ID: <4269.1174417930@redhat.com>
In-Reply-To: <m1bqinsvy3.fsf@ebiederm.dsl.xmission.com>

Eric W. Biederman <ebiederm@xmission.com> wrote:

> >> For shared mappings you share in some sense the page cache.
> >
> > Currently, no - not unless the driver does something clever as ramfs does.
> > Sharing through the page cache is a nice idea, but it has some limitations,
> > mainly that non-sharing then operates differently.
>
> I don't quite what your limitations are

 (1) There's no MMU to provide write protection.  This can be worked around
     in various ways - such as returning ETXTBSY if one attempts to write()
     to a shared mapped file.

 (2) An NFS root changing.  ETXTBSY cannot be applied to the server just
     because a client has a file open.  Admittedly, this is a problem for
     MMU-mode too.

 (3) Keeping track of what memory a VMA is currently pinning.  This isn't
     normally necessary in MMU-mode, but it's very important in NOMMU-mode.

 (4) Mappings must be made on contiguous regions of CPU physical memory
     space.  This leads to fun generating contiguous regions.  If someone
     reads, say, the first page of a file, then attempts to map the first
     meg, say, of the file (think ld.so loading a shared library), you need
     to make the first meg completely contiguous.
     There can be problems with this:

     (a) Any pages that are already in the page cache (notably the first
         page) may need moving to a region that's big enough to hold the
         whole segment.

     (b) If a smaller mapping is already made, but can't be extended, then
         you can't make the bigger mapping unless you can decide not to
         share them.

     With ramfs on NOMMU, using truncate() to expand a zero length file
     attempts to get a contiguous region of the size specified which is then
     broken up and attached to the page cache for that file.  This is what
     makes POSIX and SYSV shared memory work on NOMMU.

 (5) PROT_WRITE/MAP_SHARED mappings are not practical on files that are on
     devices or filesystems that aren't directly mappable (disks and NFS vs
     memory and flash).

 (6) PROT_WRITE/MAP_PRIVATE mappings cannot practically be made to follow
     changes to the backing file.

Point (4) is the most difficult one, I think.

> but it is a fundamental assumption that the sharing happen in the page cache.

A fundamental assumption where?  There's no requirement for an O/S to work by
having a page cache.

> I.e. The pages that are in the page cache are the only ones that can be
> effectively shared.

That's not actually so.  Character devices don't generally exist in the page
cache, for example.  In fact, IIRC, SYSV SHM used to work without touching the
page cache (I may be wrong on that).

Remember also, you're on a NOMMU system: we might have to bend some of the
rules.

> As a corollary if a page is not in the page cache it should not be
> shared.

I guess this limits sharing to just those things in ramfs?  That's nasty, and
also unnecessary.  It also prohibits XIP, btw.

> You obviously don't have the hardware to enforce this

The main point is not that you don't have h/w to enforce protection, it's that
you may not have h/w to do virtual address mappings.

> but at least you can define the sharing of anything else as undefined.

Why would I want to do that?
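For illustration only, the truncate-then-map sequence described under point (4) can be sketched from userspace roughly as follows.  This is a minimal sketch, not code from the thread: the helper name map_shared_region() and any path passed to it are hypothetical, and on an MMU kernel it merely demonstrates the call sequence rather than the NOMMU contiguity behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) the file at 'path', make
 * sure it is 'len' bytes long, and map it MAP_SHARED.  Returns the mapping,
 * or NULL on failure. */
static void *map_shared_region(const char *path, size_t len)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    /* Expanding the file from zero length is the step at which, according
     * to the message, NOMMU ramfs tries to obtain one contiguous region. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}
```

Two callers that map the same path this way should see each other's stores, which is the sharing that POSIX and SYSV shared memory depend on.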
> Now I don't know what it takes from your data structures to achieve
> sharing of page cache pages.  Or what your underlying complications
> are.  Not having the page cache as the fundamental underlying sharing
> pool is strongly non-intuitive from the perspective of the rest
> of linux.

Again, you're on a NOMMU system.

Actually, as it stands, what I have here seems to work pretty well.  There
isn't much that doesn't work.  fork() doesn't work (it's just too impractical)
and it turns out that SYSV SHM only mostly works (grrr).

> >> My gut feel says just keep a vma per process of the regions the process
> >> has and do the appropriate book keeping and all will be fine.
> >
> > I'm sure it will be, but at the cost of consuming extra memory.  I'm not
> > sure that the amount of extra memory is, however, all that significant.
> > Now that I think about it, I don't imagine that a lot of processes are
> > going to be running at once on a NOMMU system, and so the scope for sharing
> > isn't all that wide.
>
> Sure.  But this is sharing with the kernel as well.

What do you mean by that?

> And you always have at least one kernel and one application.

Actually, that's not so.  I seem to recall some router thing where userspace
used to just exit completely, leaving the kernel to run alone.

> So if they can share the file buffers that is a win.

Yes, it's a win, but there are also problems with doing so, most particularly
contiguity.

> And what mmap of any file backed pages is expected to provide.

I think it's reasonable to say that a read-only MAP_PRIVATE mapping need not
follow the backing file in NOMMU mode.  Note that it's close to impossible to
do this for a writable MAP_PRIVATE mapping, even though that's the expected
behaviour if it hasn't been written through.

Out of interest, do you know of any application that relies on the behaviour
in which unwritten private mappings follow the backing file?

> Scratch the fork part.
> You still aren't calling the open/close
> methods when they would normally be called, and that is where your
> problem lies.

As previously stated, yes, I do realise that.

I think the behaviour I have for sharing R/O private mappings is good enough,
especially as consolidating bits of the pagecache will be a pain, and mostly
unnecessary.

This is NOMMU mode.  Some things are going to have to be different, but almost
all of the MMU functionality is available, just as long as you don't look too
closely.

David
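For illustration only, the question of whether an unwritten private mapping follows the backing file can be probed from userspace with something like the following.  This is a minimal sketch, not code from the thread: the function name private_mapping_follows_file() and the scratch path passed to it are hypothetical, and the answer may differ between kernels, filesystems and MMU/NOMMU configurations, so no particular result is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if a read-only MAP_PRIVATE mapping that has
 * never been written through observes a later write to the file, 0 if it
 * keeps the old contents, and -1 on error.  'path' is a scratch file that
 * is created, truncated and removed again. */
static int private_mapping_follows_file(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int result = -1;
    char *map = MAP_FAILED;
    if (write(fd, "A", 1) == 1)
        map = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, fd, 0);

    if (map != MAP_FAILED) {
        volatile const char *p = map;
        char before = p[0];             /* touch the page first: 'A' */

        /* Change the backing file, then look at the mapping again. */
        if (pwrite(fd, "B", 1, 0) == 1)
            result = (before == 'A' && p[0] == 'B');
        munmap(map, 1);
    }

    close(fd);
    unlink(path);
    return result;
}
```

A program that only works when this returns 1 would be an instance of the dependency asked about in the message.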