From: David Howells <dhowells@redhat.com>
To: Hugh Dickins <hugh@veritas.com>, bryan.wu@analog.com
Cc: Robin Holt <holt@sgi.com>,
    "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com>,
    Andrew Morton <akpm@osdl.org>,
    kernel list <linux-kernel@vger.kernel.org>,
    Pavel Machek <pavel@ucw.cz>,
    Alan Cox <alan@lxorguk.ukuu.org.uk>,
    Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
    sugita <yumiko.sugita.yf@hitachi.com>,
    Satoshi OSHIMA <soshima@redhat.com>,
    haoki@redhat.com,
    Robin Getz <rgetz@blackfin.uclinux.org>
Subject: Move to unshared VMAs in NOMMU mode?
Date: Fri, 09 Mar 2007 14:12:02 +0000
Message-ID: <12852.1173449522@redhat.com>
In-Reply-To: <3378.1173204813@redhat.com>

I've been considering how to deal with the SYSV SHM problem, and I think we
may have to move to unshared VMAs in NOMMU mode to deal with this.
Currently, each mm_struct has in its arch-specific context a list of VMLs.
Take the FRV context for example:

        [include/asm-frv/mmu.h]
        typedef struct {
        #ifdef CONFIG_MMU
        ...
        #else
                struct vm_list_struct   *vmlist;
                unsigned long           end_brk;
        #endif
        ...
        } mm_context_t;

Each VML struct contains a pointer to a systemwide VMA and the next VML in
the list:

        struct vm_list_struct {
                struct vm_list_struct   *next;
                struct vm_area_struct   *vma;
        };

The VMAs themselves are kept in an rb-tree in mm/nommu.c:

        /* list of shareable VMAs */
        struct rb_root nommu_vma_tree = RB_ROOT;

which can then be displayed through /proc/maps.

There are some restrictions of this system, mainly due to the NOMMU
constraints:

 (*) mmap() may not be used to overlay one mapping upon another.

 (*) mmap() may not be used with MAP_FIXED.

 (*) mmap()'s of the same part of the same file will result in multiple
     mappings returning the same base address, assuming the maps are
     shareable.  If they aren't shareable, they'll be at different base
     addresses.
 (*) For normal shareable file mappings, two mappings will only be shared
     if they precisely match offset, size and protection, otherwise a new
     mapping will be created (this is because VMAs will be shared).
     Splitting VMAs would reduce this restriction, though subsequent
     mappings would have to be bounded by the first mapping, but wouldn't
     have to be the same size.

 (*) munmap() may only unmap a precise match amongst the mappings made; it
     may not be used to cut down or punch a hole in an existing mapping.

The VMAs for private file mappings, private blockdev mappings and anonymous
mappings, be they shared[*] or unshared, hold a pointer to the kmalloc()'d
region of memory in which the mapping contents reside.  This region is
discarded when the VMA is deleted.  When a region can be shared the VMA is
also shared, and so no reference counting need take place on the mapping
contents as that is implied by the VMA.

 [*] MAP_PRIVATE+!PROT_WRITE+!PT_PTRACED regions may be shared

Note that for mappable chardevs with special BDI capability flags, extra
VMAs may be allocated because (a) they may need to overlap non-exactly, and
(b) the chardev itself pins the backing storage, if the backing storage is
potentially transient.

If VMAs are not shared for shared memory regions then some other means of
retaining the actual allocated memory region must be found.  The obvious
way to do this is to have the VMA point to a shared, refcounted record that
keeps track of the region:

        struct vm_region {
                /* the first parameters define the region as for the VMA */
                pgprot_t        vm_page_prot;
                unsigned long   vm_start;
                unsigned long   vm_end;
                unsigned long   vm_pgoff;
                struct file     *vm_file;

                atomic_t        vm_usage;       /* region usage count */
                struct rb_node  vm_rb;          /* region tree */
        };

The VMA itself would then have to be modified to include a pointer to this,
but wouldn't then need its own refcount.
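The refcounting this implies might look roughly like the following userspace
sketch.  It is an illustration under simplifying assumptions, not kernel
code: the vm_region here keeps only some of the fields proposed above,
vm_usage is a plain int rather than an atomic_t, malloc() stands in for
kmalloc(), and the mem field, struct vma, region_alloc(), region_get(),
region_put(), vma_attach() and vma_detach() are all hypothetical names
invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the proposed record (not the kernel struct). */
struct vm_region {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
	int vm_usage;	/* region usage count; plain int, not atomic_t */
	void *mem;	/* hypothetical: the backing allocation */
};

/* Hypothetical unshared, per-mm VMA that only points at the region. */
struct vma {
	struct vm_region *vm_region;
};

/* Allocate a region of len bytes; the caller holds the first reference. */
static struct vm_region *region_alloc(unsigned long len, unsigned long pgoff)
{
	struct vm_region *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->mem = malloc(len);
	if (!r->mem) {
		free(r);
		return NULL;
	}
	/* With no MMU, the mapping address is simply where the memory lives. */
	r->vm_start = (unsigned long)r->mem;
	r->vm_end = r->vm_start + len;
	r->vm_pgoff = pgoff;
	r->vm_usage = 1;
	return r;
}

static struct vm_region *region_get(struct vm_region *r)
{
	r->vm_usage++;
	return r;
}

/* Drop one reference; returns 1 if this freed the region, else 0. */
static int region_put(struct vm_region *r)
{
	assert(r->vm_usage > 0);
	if (--r->vm_usage > 0)
		return 0;
	free(r->mem);
	free(r);
	return 1;
}

/* Each attach creates its own VMA and pins the shared region. */
static struct vma *vma_attach(struct vm_region *r)
{
	struct vma *v = malloc(sizeof(*v));

	if (v)
		v->vm_region = region_get(r);
	return v;
}

/* Returns 1 if detaching this VMA released the region's memory. */
static int vma_detach(struct vma *v)
{
	int freed = region_put(v->vm_region);

	free(v);
	return freed;
}
```

Two attaches to the same region yield two distinct VMAs that share one
backing allocation, and the memory only goes away when the last reference
is dropped, which is the property the shared VMA provides implicitly today.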
VMAs would belong, once again, to the mm_struct, the VML struct would
vanish, and the VML list rooted in mm_context_t would vanish.

For R/O shareable file mappings, it might be possible to actually use the
target file's pagecache for the mapping.  I do something of that sort for
shared-writable mappings on ramfs files (to support POSIX SHM and SYSV
SHM).

The downside of allocating all these extra VMAs is that, of course, it
takes up more memory, though that may not be too bad, especially if it's at
the gain of additional consistency with the MM code.

However, consistency isn't for the most part a real issue.  As I see it,
drivers and filesystems should not concern themselves with anything other
than the VMA they're given, and so it doesn't matter if these are shared or
not.

That brings us on to the problem with SYSV SHM which keeps an attachment
count that the VMA mmap(), open() and close() ops manipulate.  This means
that the nattch count comes out wrong on NOMMU systems.  Note that on MMU
systems, doing a munmap() in the middle of an attached region will *also*
break the nattch count, though this is self-correcting.

Another way of dealing with the nattch count on NOMMU systems is to do it
through the VML list, but that then needs more special casing in the SHM
driver and perhaps others.

Thoughts?

David