From: David Howells <>
To: Hugh Dickins <>,
Cc: Robin Holt <>,
	"Kawai, Hidehiro" <>,
	Andrew Morton <>,
	kernel list <>,
	Pavel Machek <>, Alan Cox <>,
	Masami Hiramatsu <>,
	sugita <>,
	Satoshi OSHIMA <>, Robin Getz <>
Subject: Move to unshared VMAs in NOMMU mode?
Date: Fri, 09 Mar 2007 14:12:02 +0000
Message-ID: <>
In-Reply-To: <>

I've been considering how to deal with the SYSV SHM problem, and I think we
may have to move to unshared VMAs in NOMMU mode to deal with this.  Currently,
each mm_struct has a list of VMLs in its arch-specific context.  Take the FRV
context for example:

	typedef struct {
	#ifdef CONFIG_MMU
		...
	#else
		struct vm_list_struct	*vmlist;
		unsigned long		end_brk;
	#endif
	} mm_context_t;

Each VML struct contains a pointer to a systemwide VMA and the next VML in
the list:

	struct vm_list_struct {
		struct vm_list_struct	*next;
		struct vm_area_struct	*vma;
	};

The VMAs themselves are kept in an rb-tree in mm/nommu.c:

	/* list of shareable VMAs */
	struct rb_root nommu_vma_tree = RB_ROOT;

which can then be displayed through /proc/maps.
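
To illustrate the arrangement, here is a minimal userspace sketch of how a
per-process VML list resolves an address to a systemwide VMA.  The structs are
cut-down stand-ins carrying only the fields needed here, and vml_find_vma() is
a hypothetical helper name, not a function from mm/nommu.c:

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures; only the fields used here. */
struct vm_area_struct {
	unsigned long	vm_start;
	unsigned long	vm_end;		/* first byte after the mapping */
};

struct vm_list_struct {
	struct vm_list_struct	*next;
	struct vm_area_struct	*vma;
};

/*
 * Hypothetical helper: walk one process's VML list and return the VMA that
 * covers addr, or NULL if this process has no mapping there.
 */
static struct vm_area_struct *vml_find_vma(struct vm_list_struct *vmlist,
					   unsigned long addr)
{
	struct vm_list_struct *vml;

	for (vml = vmlist; vml; vml = vml->next) {
		assert(vml->vma != NULL);	/* every VML must point at a VMA */
		if (addr >= vml->vma->vm_start && addr < vml->vma->vm_end)
			return vml->vma;
	}
	return NULL;
}
```

Two processes mapping the same shareable region would each hold their own VML
entry, but both entries would point at the one VMA.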

There are some restrictions on this system, mainly due to the NOMMU constraints:

 (*) mmap() may not be used to overlay one mapping upon another.

 (*) mmap() may not be used with MAP_FIXED.

 (*) mmap()'s of the same part of the same file will result in multiple
     mappings returning the same base address, assuming the maps are shareable.
     If they aren't shareable, they'll be at different base addresses.

 (*) for normal shareable file mappings, two mappings will only be shared if
     they precisely match offset, size and protection, otherwise a new mapping
     will be created (this is because VMAs will be shared).  Splitting VMAs
     would reduce this restriction, though subsequent mappings would have
     to be bounded by the first mapping, but wouldn't have to be the same size.

 (*) munmap() may only unmap a precise match amongst the mappings made; it may
     not be used to cut down or punch a hole in an existing mapping.
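
The exact-match rules in the list above can be sketched as two predicates.
This is a userspace illustration only; struct mapping, can_share() and
can_munmap() are hypothetical names made up for the example, and the real
checks in the kernel look at more than this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of one existing mapping. */
struct mapping {
	const void	*file;		/* stand-in for struct file * */
	unsigned long	pgoff;		/* offset into the file */
	unsigned long	start;		/* base address handed back by mmap() */
	unsigned long	len;		/* size of the mapping */
	unsigned long	prot;		/* protection bits */
};

/*
 * A new shareable file mapping may reuse an existing one only if the file,
 * offset, size and protection all match precisely.
 */
static bool can_share(const struct mapping *m, const void *file,
		      unsigned long pgoff, unsigned long len,
		      unsigned long prot)
{
	assert(m != NULL);
	return m->file == file && m->pgoff == pgoff &&
	       m->len == len && m->prot == prot;
}

/*
 * munmap() may only remove a whole mapping: the address and length must
 * match exactly, so trimming or punching a hole is refused.
 */
static bool can_munmap(const struct mapping *m, unsigned long addr,
		       unsigned long len)
{
	assert(m != NULL);
	return m->start == addr && m->len == len;
}
```

A request for half of an existing mapping fails both tests, which is the
restriction that splitting VMAs would relax.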

The VMAs for private file mappings, private blockdev mappings and anonymous
mappings, be they shared[*] or unshared, hold a pointer to the kmalloc()'d
region of memory in which the mapping contents reside.  This region is
discarded when the VMA is deleted.  When a region can be shared the VMA is also
shared, and so no reference counting need take place on the mapping contents as
that is implied by the VMA.

[*] MAP_PRIVATE+!PROT_WRITE+!PT_PTRACED regions may be shared

Note that for mappable chardevs with special BDI capability flags, extra VMAs
may be allocated because (a) they may need to overlap non-exactly, and (b) the
chardev itself pins the backing storage, if the backing storage is potentially

If VMAs are not shared for shared memory regions then some other means of
retaining the actual allocated memory region must be found.  The obvious way to
do this is to have the VMA point to a shared, refcounted record that keeps
track of the region:

	struct vm_region {
		/* the first parameters define the region as for the VMA */
		pgprot_t	vm_page_prot;
		unsigned long	vm_start;
		unsigned long	vm_end;
		unsigned long	vm_pgoff;
		struct file	*vm_file;

		atomic_t	vm_usage;	/* region usage count */
		struct rb_node	vm_rb;		/* region tree */
	};

The VMA itself would then have to be modified to include a pointer to this, but
wouldn't then need its own refcount.  VMAs would belong, once again, to the
mm_struct, the VML struct would vanish, and the VML list rooted in mm_context_t
would vanish.
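
As a rough userspace sketch of that idea, each VMA would take a reference on
the region when it is set up and drop it when it goes away, with the last
drop releasing the memory.  Everything here is an assumption made for
illustration: struct vma, the backing field, and the region_alloc(),
vma_attach() and vma_detach() helpers are hypothetical, malloc() stands in
for kmalloc(), and C11 atomic_int stands in for atomic_t:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Cut-down stand-in for the proposed region record. */
struct vm_region {
	unsigned long	vm_start;
	unsigned long	vm_end;
	atomic_int	vm_usage;	/* region usage count */
	void		*backing;	/* the allocated mapping contents */
};

/* Cut-down stand-in for a per-mm VMA that points at a shared region. */
struct vma {
	struct vm_region *vm_region;
};

/* Allocate a region and its backing memory; usage starts at zero. */
static struct vm_region *region_alloc(unsigned long start, unsigned long len)
{
	struct vm_region *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->backing = malloc(len);
	if (!r->backing) {
		free(r);
		return NULL;
	}
	r->vm_start = start;
	r->vm_end = start + len;
	atomic_init(&r->vm_usage, 0);
	return r;
}

/* Point a VMA at the region and take a reference on it. */
static void vma_attach(struct vma *vma, struct vm_region *r)
{
	atomic_fetch_add(&r->vm_usage, 1);
	vma->vm_region = r;
}

/* Drop the VMA's reference; returns 1 if this freed the region. */
static int vma_detach(struct vma *vma)
{
	struct vm_region *r = vma->vm_region;

	assert(r != NULL);
	vma->vm_region = NULL;
	if (atomic_fetch_sub(&r->vm_usage, 1) == 1) {
		free(r->backing);
		free(r);
		return 1;
	}
	return 0;
}
```

The point of the split is that the refcount lives with the memory rather than
with the VMA, so each mm can own its VMAs outright.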

For R/O shareable file mappings, it might be possible to actually use the
target file's pagecache for the mapping.  I do something of that sort for
shared-writable mappings on ramfs files (to support POSIX SHM and SYSV SHM).

The downside of allocating all these extra VMAs is that, of course, it takes up
more memory, though that may not be too bad, especially if it's at the gain of
additional consistency with the MM code.

However, consistency isn't for the most part a real issue.  As I see it,
drivers and filesystems should not concern themselves with anything other than
the VMA they're given, and so it doesn't matter if these are shared or not.

That brings us on to the problem with SYSV SHM, which keeps an attachment count
that the VMA mmap(), open() and release() ops manipulate.  This means that the
nattch count comes out wrong on NOMMU systems.  Note that on MMU systems, doing
a munmap() in the middle of an attached region will *also* break the nattch
count, though this is self-correcting.
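
A toy model of the miscount, under the assumption that it arises because the
VMA ops run once per VMA rather than once per attachment when the VMA is
shared.  All the names below (struct shm_segment, struct vma, the attach and
detach helpers) are hypothetical and bear no relation to the real ipc/shm.c
code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy SHM segment carrying just the attachment count. */
struct shm_segment {
	int nattch;
};

/* Toy VMA; users counts the attachments sharing this one VMA. */
struct vma {
	struct shm_segment	*seg;
	int			users;
};

/* The ops that adjust nattch; called only when a VMA is set up or torn down. */
static void vma_open(struct vma *v)  { v->seg->nattch++; }
static void vma_close(struct vma *v) { v->seg->nattch--; }

/* Shared VMA: only the first attachment creates the VMA and runs open(). */
static void attach_shared(struct vma *v, struct shm_segment *seg)
{
	if (v->users++ == 0) {
		v->seg = seg;
		vma_open(v);
	}
}

/* Shared VMA: only the last detachment runs close(). */
static void detach_shared(struct vma *v)
{
	assert(v->users > 0);
	if (--v->users == 0)
		vma_close(v);
}

/* Unshared VMAs: every attachment gets its own VMA, so open() always runs. */
static void attach_unshared(struct vma *v, struct shm_segment *seg)
{
	v->seg = seg;
	v->users = 1;
	vma_open(v);
}

/* Unshared VMAs: every detachment tears down its own VMA. */
static void detach_unshared(struct vma *v)
{
	assert(v->users == 1);
	v->users = 0;
	vma_close(v);
}
```

With two attachments through attach_shared() on one VMA the segment reports
nattch as 1; with two VMAs set up through attach_unshared() it reports 2.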

Another way of dealing with the nattch count on NOMMU systems is to do it
through the VML list, but that then needs more special casing in the SHM driver
and perhaps others.
