LKML Archive on
help / color / mirror / Atom feed
From: William Lee Irwin III <>
To: Nick Piggin <>
Cc: Adam Litke <>,
	Andrew Morton <>,
	Arjan van de Ven <>,
	Christoph Hellwig <>,
	Ken Chen <>,,,
	Linus Torvalds <>
Subject: Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
Date: Wed, 21 Mar 2007 03:46:49 -0700	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

William Lee Irwin III wrote:
>> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
>> in a few weeks I can start up on the first two of the bunch.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Care to give us a hint? :)

The first is something DISM-like. I've not made up my mind on the
second, but the shopping catalogue of feature requests I've done
nothing about for some time that want this is long.

William Lee Irwin III wrote:
>> Not much of a VM translation; it's just a lookup through the
>> software mocked-up structures on everything save i386, x86_64, and
>> some m68k where they're the same thing only with hardware walkers
>> (ISTR ia64's being firmware a la Alpha despite the "HPW" name,
>> though I could be wrong)

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Well the vma+pagetables *are* our VM translation data structure. It is
> a good data structure. The Gelato/UNSW guys experimenting with changing
> this have basically said they haven't yet got anything that beats it.
> I would be opposed to anything that bypasses that unless a) it is not
> applicable to the VM as a whole, and b) it is really worth it
> (hugepages was a reasonable exception).

Maybe anticipating the conventional Linux approach to this wasn't as
difficult as I supposed. ;)

William Lee Irwin III wrote:
>> reliant on them. The drivers/etc. could just as easily use helper
>> functions to carry out the lookup, thereby accomplishing the
>> unification. There's nothing particularly fundamental about a pte
>> lookup.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Yeah you could, but it looks back to front to me.
> The VM tells the filesystem that the machine took a fault at virtual
> address X, then the filesystem asks the VM what pgoff that is, then
> tells the VM to install the corresponding page to vaddr X.
> With my ->fault, the VM asks the filesystem to give the page that
> corresponds to vaddr X, then installs it into that vaddr.

I'm aware of what is now done and the minor modification accomplished
by your ->fault(). Maybe I've even written something like this before
that I never posted. It's obvious what I'm on about and that my
thoughts here are too divergent to fly. Others should chime in with
more Linux-native ideas about what's to be done here.

William Lee Irwin III wrote:
>> Normal arches that do software TLB refill could just as easily
>> consult the radix trees dangled off struct address_space or any old
>> data structure floating around the kernel with enough information to
>> translate user virtual addresses to the physical addresses they need to
>> fill the TLB with, and there are other kernels that literally do things
>> like that.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Sure it *could* be done, but it may not be very nice, given Linux's
> design. And you definitely need _something_ other than just the
> pagecache radix-tree, because the VM needs to know who maps the page.
> So if, for your backing store, you use a small hash table and evict old
> entries like powerpc, you'll constantly be faulting in and out pages
> from the VM's high level view of the address space. That isn't a really
> cheap operation. It takes at least:
[long list of locking operations snipped]
> Compared to our current page table walk which is just a single locked
> op + barrier for the spinlock + radix tree walk.
> If you had a very large hash table (ia64 long mode, maybe?), then you
> may have slightly fewer high level faults, but range based operations
> are going to take a whole lot of cache misses, aren't they? Especially
> for small processes.
> Not that I wouldn't be happy to be proven wrong, but I don't think it
> should be something that sneaks in under these pagetable operations.
> IMO.

I'll presume that was not for my benefit; if so, it was superfluous.

The example I gave was to show how far things could diverge from Linux'
conventions. Every single locking operation cited for Linux didn't
apply to the kernel I was thinking of due to its lockless pagecache
analogue, its lack of a direct equivalent of struct page, and its use
of different lifetime-bounding protocols from reference counting.
Things like page replacement didn't rely on things that would disturb
all that. It all worked out quite well for that kernel. So not only
can it be done other ways, but those ways are indeed efficient.

It should be clear from the above that retrofitting Linux to do similar
is effectively impossible. (Well, if you think you can pull off removing
struct page in favor of no direct equivalent and bounding the lifetimes
of page-sized chunks of memory by shooting down all references using
knowledge of who could possibly be hanging onto them in Linux, feel
free to attempt such a retrofit, and I'll send you a case of Scotch
whisky if you can get it to boot and run a major database benchmark
without crashing regardless of whether it's merged.)

In any event, let's not talk too much at cross-purposes. I'm deferring
to others on this, as I said. Someone else will doubtless come at this
from a direction that gibes better with Linux-native conventions.

-- wli

  parent reply	other threads:[~2007-03-21 10:53 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-19 20:05 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Adam Litke
2007-03-19 20:05 ` [PATCH 1/7] Introduce the pagetable_operations and associated helper macros Adam Litke
2007-03-20 23:24   ` Dave Hansen
2007-03-21 14:50     ` Adam Litke
2007-03-21 15:05       ` Arjan van de Ven
2007-03-21  4:18   ` Nick Piggin
2007-03-21  4:52     ` William Lee Irwin III
2007-03-21  5:07       ` Nick Piggin
2007-03-21  5:41         ` William Lee Irwin III
2007-03-21  6:51           ` Nick Piggin
2007-03-21  7:36             ` Nick Piggin
2007-03-21 10:46             ` William Lee Irwin III [this message]
2007-03-21 15:17     ` Adam Litke
2007-03-21 16:00       ` Christoph Hellwig
2007-03-21 23:03         ` Nick Piggin
2007-03-21 23:02       ` Nick Piggin
2007-03-21 23:32         ` William Lee Irwin III
2007-03-19 20:05 ` [PATCH 2/7] copy_vma for hugetlbfs Adam Litke
2007-03-19 20:05 ` [PATCH 3/7] pin_pages for hugetlb Adam Litke
2007-03-19 20:05 ` [PATCH 4/7] unmap_page_range " Adam Litke
2007-03-20 23:27   ` Dave Hansen
2007-03-19 20:05 ` [PATCH 5/7] change_protection " Adam Litke
2007-03-19 20:06 ` [PATCH 6/7] free_pgtable_range " Adam Litke
2007-03-19 20:06 ` [PATCH 7/7] hugetlbfs fault handler Adam Litke
2007-03-20 23:50 ` [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Dave Hansen
2007-03-21  1:17 ` William Lee Irwin III
2007-03-21 15:55 ` Hugh Dickins
2007-03-21 16:01   ` Christoph Hellwig
2007-03-21 19:43 ` pagetable_ops: Hugetlb character device example Adam Litke
2007-03-21 19:51   ` Valdis.Kletnieks
2007-03-21 20:26     ` Adam Litke
2007-03-21 22:26     ` William Lee Irwin III
2007-03-21 22:53       ` Matt Mackall
2007-03-21 23:35         ` William Lee Irwin III
2007-03-22  0:31           ` Matt Mackall
2007-03-22 10:38   ` Christoph Hellwig
2007-03-22 15:42     ` Mel Gorman
2007-03-22 18:15       ` Christoph Hellwig
2007-03-23 14:57         ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2007-02-19 18:31 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API Adam Litke
2007-02-19 18:31 ` [PATCH 1/7] Introduce the pagetable_operations and associated helper macros Adam Litke
2007-02-19 18:41   ` Arjan van de Ven
2007-02-19 19:31     ` Adam Litke
2007-02-19 19:48   ` William Lee Irwin III
2007-02-19 22:29   ` Christoph Hellwig
2007-02-20 15:50     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \
    --subject='Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).