LKML Archive on lore.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: Adam Litke <agl@us.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>,
Christoph Hellwig <hch@infradead.org>,
Ken Chen <kenchen@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
Date: Wed, 21 Mar 2007 17:51:23 +1100
Message-ID: <4600D5EB.90507@yahoo.com.au>
In-Reply-To: <20070321054102.GF2986@holomorphy.com>
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>
>>>ISTR potential ppc64 users coming out of the woodwork for something I
>>>didn't recognize the name of, but I may be confusing that with your
>>>patch. I can implement additional users (and useful ones at that)
>>>needing this in particular if desired.
>
>
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
>
>>Yes I would be interested in seeing useful additional users of this
>>that cannot use our regular virtual memory, before making it a general
>>thing.
>>I just don't want to see proliferation of these things, if possible.
>
>
> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
> in a few weeks I can start up on the first two of the bunch.
Care to give us a hint? :)
> William Lee Irwin III wrote:
>
>>>Two fault handling methods callbacks raise an eyebrow over here at least.
>>>I was vaguely hoping for unification of the fault handling callbacks.
>
>
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
>
>>I don't know if it would be so clean to do that as they are at different
>>levels.
>>Adam's fault is before the VM translation (and bypasses it), and mine is
>>after.
>
>
> Not much of a VM translation; it's just a lookup through the software
> mocked-up structures on everything save i386, x86_64, and some m68k where
> they're the same thing only with hardware walkers (ISTR ia64's being
> firmware a la Alpha despite the "HPW" name, though I could be wrong)
Well the vma+pagetables *are* our VM translation data structure. It is
a good data structure. The Gelato/UNSW guys experimenting with changing
this have basically said they haven't yet got anything that beats it.
I would be opposed to anything that bypasses that unless a) it is not
applicable to the VM as a whole, and b) it is really worth it
(hugepages was a reasonable exception).
> reliant on them. The drivers/etc. could just as easily use helper
> functions to carry out the lookup, thereby accomplishing the
> unification. There's nothing particularly fundamental about a pte
> lookup.
Yeah you could, but it looks back to front to me.
The VM tells the filesystem that the machine took a fault at virtual
address X, then the filesystem asks the VM what pgoff that is, then
tells the VM to install the corresponding page to vaddr X.
With my ->fault, the VM asks the filesystem to give the page that
corresponds to vaddr X, then installs it into that vaddr.
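The difference in call direction can be sketched with a toy model. This is only an illustration under stated assumptions: every `toy_*` name and type below is hypothetical and none of it is kernel API. Style A hands the raw faulting address to the filesystem, which calls back into VM helpers; style B has the VM do the translation and the install, asking the filesystem only for the page. Passing a page offset to the handler in style B is an assumption of this sketch.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_VMA_PAGES  4

/* Hypothetical toy types; none of these are the kernel's own structures. */
struct toy_page { int id; };

struct toy_vma {
    unsigned long vm_start;                    /* first mapped virtual address */
    unsigned long vm_pgoff;                    /* file offset of vm_start, in pages */
    struct toy_page *file_pages;               /* stands in for the file's pagecache */
    size_t file_npages;
    struct toy_page *installed[TOY_VMA_PAGES]; /* stands in for the page table */
};

static void toy_vma_init(struct toy_vma *vma, unsigned long start,
                         unsigned long pgoff, struct toy_page *pages,
                         size_t npages)
{
    size_t i;

    vma->vm_start = start;
    vma->vm_pgoff = pgoff;
    vma->file_pages = pages;
    vma->file_npages = npages;
    for (i = 0; i < TOY_VMA_PAGES; i++)
        vma->installed[i] = NULL;
}

/* VM helper: translate a faulting address into a file page offset. */
static unsigned long toy_vaddr_to_pgoff(const struct toy_vma *vma,
                                        unsigned long vaddr)
{
    return ((vaddr - vma->vm_start) >> TOY_PAGE_SHIFT) + vma->vm_pgoff;
}

/* VM helper: install a page in the slot that covers vaddr. */
static int toy_install(struct toy_vma *vma, unsigned long vaddr,
                       struct toy_page *page)
{
    unsigned long idx = (vaddr - vma->vm_start) >> TOY_PAGE_SHIFT;

    if (vaddr < vma->vm_start || idx >= TOY_VMA_PAGES)
        return -1;
    vma->installed[idx] = page;
    return 0;
}

/* Style A: the filesystem gets the raw address, asks the VM for the
 * offset, then tells the VM to install the page. */
static int toy_fs_fault_by_vaddr(struct toy_vma *vma, unsigned long vaddr)
{
    unsigned long pgoff = toy_vaddr_to_pgoff(vma, vaddr);

    if (pgoff >= vma->file_npages)
        return -1;
    return toy_install(vma, vaddr, &vma->file_pages[pgoff]);
}

static int toy_vm_fault_a(struct toy_vma *vma, unsigned long vaddr)
{
    return toy_fs_fault_by_vaddr(vma, vaddr);
}

/* Style B: the filesystem only hands back the page for an offset. */
static struct toy_page *toy_fs_get_page(struct toy_vma *vma,
                                        unsigned long pgoff)
{
    if (pgoff >= vma->file_npages)
        return NULL;
    return &vma->file_pages[pgoff];
}

/* The VM translates, asks for the page, and installs it itself. */
static int toy_vm_fault_b(struct toy_vma *vma, unsigned long vaddr)
{
    struct toy_page *page =
        toy_fs_get_page(vma, toy_vaddr_to_pgoff(vma, vaddr));

    if (!page)
        return -1;
    return toy_install(vma, vaddr, page);
}
```

Both paths install the same page for the same address; what differs is who owns the translation and the install step, which is the "back to front" point made above.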
> Normal arches that do software TLB refill could just as easily
> consult the radix trees dangled off struct address_space or any old
> data structure floating around the kernel with enough information to
> translate user virtual addresses to the physical addresses they need to
> fill the TLB with, and there are other kernels that literally do things
> like that.
Sure it *could* be done, but it may not be very nice, given Linux's
design. And you definitely need _something_ other than just the
pagecache radix-tree, because the VM needs to know who maps the page.
So if, for your backing store, you use a small hash table and evict old
entries like powerpc, you'll constantly be faulting pages in and out of
the VM's high-level view of the address space. That isn't a particularly
cheap operation. It takes at least:
    read_lock_irq(mapping->tree_lock);
    radix_tree_lookup()
    read_unlock_irq(mapping->tree_lock);
    lock_page()
    atomic_add(page->_count)
    atomic_add(page->_mapcount)
    unlock_page()
    atomic_add_negative(page->_mapcount)
    atomic_dec_and_test(page->_count)
Compare that to our current page table walk, which is just a single locked
op + barrier for the spinlock + radix tree walk.
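The eviction concern can be made concrete with a toy simulation. It is a sketch under loud assumptions: a direct-mapped translation cache with `TOY_SLOTS` entries stands in for "a small hash table", the access pattern is a simple repeated sweep over the working set, and every refill stands in for one pass through the expensive sequence listed above. All names are hypothetical and no real architecture's hash table is modelled.

```c
#include <stddef.h>

#define TOY_SLOTS 4
#define TOY_EMPTY ((unsigned long)-1)

/* Hypothetical direct-mapped translation cache: one tag per slot. */
struct toy_tcache {
    unsigned long tag[TOY_SLOTS];
    unsigned long refills;   /* misses that needed a high-level refault */
};

static void toy_tcache_init(struct toy_tcache *tc)
{
    size_t i;

    for (i = 0; i < TOY_SLOTS; i++)
        tc->tag[i] = TOY_EMPTY;
    tc->refills = 0;
}

/* Touch one virtual page number; on a miss, evict whatever shares the slot. */
static void toy_touch(struct toy_tcache *tc, unsigned long vpn)
{
    unsigned long slot = vpn % TOY_SLOTS;

    if (tc->tag[slot] != vpn) {
        tc->tag[slot] = vpn;
        tc->refills++;
    }
}

/* Sweep pages 0..working_set-1 in order, 'passes' times; return refills. */
static unsigned long toy_count_refills(unsigned long working_set,
                                       unsigned long passes)
{
    struct toy_tcache tc;
    unsigned long p, vpn;

    toy_tcache_init(&tc);
    for (p = 0; p < passes; p++)
        for (vpn = 0; vpn < working_set; vpn++)
            toy_touch(&tc, vpn);
    return tc.refills;
}
```

With a working set that fits (4 pages, 10 passes) this model refills 4 times, once per page. With a working set of 8 pages over the same 4 slots, every one of the 80 accesses refills, because each page evicts the one it shares a slot with before that page is touched again.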
If you had a very large hash table (ia64 long mode, maybe?), then you
may have slightly fewer high level faults, but range based operations
are going to take a whole lot of cache misses, aren't they? Especially
for small processes.
Not that I wouldn't be happy to be proven wrong, but I don't think it
should be something that sneaks in under these pagetable operations.
IMO.
--
SUSE Labs, Novell Inc.