LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Hans Rosenfeld" <hans.rosenfeld@amd.com>
To: "Dave Hansen" <haveblue@us.ibm.com>
Cc: "Matt Mackall" <mpm@selenic.com>,
	linux-kernel@vger.kernel.org, "Adam Litke" <aglitke@gmail.com>,
	"nacc" <nacc@linux.vnet.ibm.com>,
	"Jon Tollefson" <kniht@linux.vnet.ibm.com>
Subject: Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size
Date: Mon, 25 Feb 2008 13:09:51 +0100	[thread overview]
Message-ID: <20080225120951.GA5963@escobedo.amd.com> (raw)
In-Reply-To: <1203791461.11305.135.camel@nimitz.home.sr71.net>

On Sat, Feb 23, 2008 at 10:31:01AM -0800, Dave Hansen wrote:
> > > - 4 bits for the page size, with 0 meaning native page size (4k on x86,
> > >   8k on alpha, ...) and values 1-15 being specific to the architecture
> > >   (I used 1 for 2M, 2 for 4M and 3 for 1G for x86)
> 
> "Native page size" probably a bad idea.  ppc64 can use 64k or 4k for its
> "native" page size and has 16MB large pages (as well as some others).
> To make it even more confusing, you can have a 64k kernel page size with
> 4k mmu mappings!
> 
> That said, this is a decent idea as long as we know that nobody will
> ever have more than 16 page sizes.  

Then a better way to encode the page size would be returning the page
shift. This would need 6 bits instead of 4, but it would probably be
enough for any 64 bit architecture.


> > This is ok-ish, but I can't say I like it much. Especially the page size
> > field.
> > 
> > But I don't really have many ideas here. Perhaps having a bit saying
> > "this entry is really a continuation of the previous one". Then any page
> > size can be trivially represented. This might also make the code on both
> > sides simpler?

I don't like the idea of parsing thousands of entries just to find out
that I'm using a huge page. It would be much better to just get the page
size one way or the other in the first entry one reads.


> > > -static int add_to_pagemap(unsigned long addr, u64 pfn,
> > > +struct ppte {
> > > +	uint64_t paddr:58;
> > > +	uint64_t psize:4;
> > > +	uint64_t swap:1;
> > > +	uint64_t present:1;
> > > +};
> 
> It'd be nice to keep the current convention, which is to stay away from
> bitfields.

I like them, they make the code much more readable.


> > > +#ifdef CONFIG_X86
> > > +	if (pmd_huge(*pmd)) {
> > > +		struct ppte ppte = { 
> > > +			.paddr = pmd_pfn(*pmd) << PAGE_SHIFT,
> > > +			.psize = (HPAGE_SHIFT == 22 ?
> > > +				  PM_PSIZE_4M : PM_PSIZE_2M),
> > > +			.swap  = 0,
> > > +			.present = 1,
> > > +		};
> > > +
> > > +		for(; addr != end; addr += PAGE_SIZE) {
> > > +			err = add_to_pagemap(addr, ppte, pm);
> > > +			if (err)
> > > +				return err;
> > > +		}
> > > +	} else
> > > +#endif
> 
> It's great to make this support huge pages, but things like this
> probably need their own function.  Putting an #ifdef in the middle of
> here makes it a lot harder to read.  Just think of when powerpc, ia64
> and x86_64 get their grubby mitts in here. ;)

AFAIK the way huge pages are used on x86 differs much from other
architectures. While on x86 the address translation stops at some upper
table for a huge page, other architectures encode the page size in the
pte (at least the ones I looked at did). So pmd_huge() (and soon
pud_huge()) are very x86-specific and return just 0 on other archs, and
this code would be optimized away for them. All that would be necessary
for other archs is to eventually get the page size from the pte and put
it in the psize field.

The #ifdef could go away if pmd_pfn() was defined as 0 for !x86, it
wouldn't make sense to use it anyway.



-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown



  reply	other threads:[~2008-02-25 12:13 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-20 13:57 Hans Rosenfeld
2008-02-23  2:18 ` Matt Mackall
2008-02-23 18:31   ` Dave Hansen
2008-02-25 12:09     ` Hans Rosenfeld [this message]
2008-02-25 18:39       ` Dave Hansen
2008-02-26 20:25         ` Hans Rosenfeld
2008-02-27 17:44           ` Dave Hansen
2008-02-28 12:00             ` Hans Rosenfeld
2008-02-29  0:49               ` Dave Hansen
2008-02-29  1:15                 ` Matt Mackall
2008-02-29 22:30                   ` Dave Hansen
2008-02-29 22:46                     ` Matt Mackall
2008-03-05 16:00                       ` Hans Rosenfeld
2008-03-05 16:32                         ` Dave Hansen
2008-03-05 17:22                           ` Hans Rosenfeld
2008-02-23  8:06 ` Andrew Morton
2008-02-23 15:25   ` Matt Mackall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080225120951.GA5963@escobedo.amd.com \
    --to=hans.rosenfeld@amd.com \
    --cc=aglitke@gmail.com \
    --cc=haveblue@us.ibm.com \
    --cc=kniht@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpm@selenic.com \
    --cc=nacc@linux.vnet.ibm.com \
    --subject='Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).