From: Adam Litke <agl@us.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>,
William Lee Irwin III <wli@holomorphy.com>,
Christoph Hellwig <hch@infradead.org>,
Ken Chen <kenchen@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: pagetable_ops: Hugetlb character device example
Date: Wed, 21 Mar 2007 14:43:48 -0500
Message-ID: <1174506228.21684.41.camel@localhost.localdomain>
In-Reply-To: <20070319200502.17168.17175.stgit@localhost.localdomain>

The main reason I am advocating a set of pagetable_operations is to
enable the development of a new hugetlb interface. During the hugetlb
BoF at OLS last year, we talked about a character device that would
behave like /dev/zero. Many people said they just wanted to create
MAP_PRIVATE hugetlb mappings without all the fuss of the hugetlbfs
filesystem. /dev/zero is a familiar interface for getting anonymous
memory, so bringing that model to huge pages would make programming for
anonymous huge pages easier.
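
To make the /dev/zero analogy concrete, here is a rough userspace sketch of
how such a device might be used. It is not part of the patch. The device
path passed in by the caller (for example /dev/page-huge, based on the
"page-huge" name in the patch below) and the 2 MB huge page size are
assumptions; the helper names are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Assumption: 2 MB huge pages. A real program should read the
 * Hugepagesize field from /proc/meminfo instead of hard-coding it.
 */
#define ASSUMED_HPAGE_SIZE (2UL * 1024 * 1024)

/* Round len up to a whole number of huge pages (hpage: power of two). */
static size_t round_up_hpage(size_t len, size_t hpage)
{
	return (len + hpage - 1) & ~(hpage - 1);
}

/*
 * Map at least len bytes of private huge page memory from the
 * (hypothetical) device node at path, the same way one would map
 * /dev/zero. Returns MAP_FAILED on any error, like mmap() itself.
 */
static void *map_private_huge(const char *path, size_t len)
{
	void *addr;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return MAP_FAILED;

	addr = mmap(NULL, round_up_hpage(len, ASSUMED_HPAGE_SIZE),
		    PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping remains valid after close() */
	return addr;
}
```

A caller would pass the device path and a length, compare the result
against MAP_FAILED, and later munmap() the rounded-up length.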

The pagetable_operations API opens up possibilities to do some
additional (and completely sane) things. For example, I have a patch
that alters the character device code below to make use of a hugetlb
ZERO_PAGE. This eliminates almost all of the up-front fault time, since
pages are COW'ed only when they are first written to. We can no longer
do things like this with hugetlbfs because it has a set of complex
semantics that must be preserved.
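
The copy-on-first-write idea can be illustrated with a small userspace toy
model. This is not the ZERO_PAGE patch mentioned above and it is not kernel
code; the slot size, the slot count, and all names are made up for the
illustration. Every untouched slot reads from one shared page of zeroes, and
a private copy is allocated only when a slot is first written.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLOT_SIZE 4096	/* stand-in for HPAGE_SIZE (assumption) */
#define NSLOTS 4	/* number of "huge pages" in the toy mapping */

/* One shared, read-only page of zeroes backs every untouched slot. */
static const unsigned char zero_page[SLOT_SIZE];

struct toy_mapping {
	const unsigned char *ro[NSLOTS];	/* what reads see */
	unsigned char *priv[NSLOTS];		/* NULL until first write */
};

/* Setting up the mapping allocates and clears nothing. */
static void toy_init(struct toy_mapping *m)
{
	for (int i = 0; i < NSLOTS; i++) {
		m->ro[i] = zero_page;
		m->priv[i] = NULL;
	}
}

/* Reads never allocate: untouched slots come from the zero page. */
static unsigned char toy_read(const struct toy_mapping *m, int slot,
			      size_t off)
{
	return m->ro[slot][off];
}

/*
 * The first write to a slot breaks the sharing by allocating a zeroed
 * private copy. Returns 0 on success, -1 if the allocation fails.
 */
static int toy_write(struct toy_mapping *m, int slot, size_t off,
		     unsigned char val)
{
	if (!m->priv[slot]) {
		m->priv[slot] = calloc(1, SLOT_SIZE);
		if (!m->priv[slot])
			return -1;
		m->ro[slot] = m->priv[slot];
	}
	m->priv[slot][off] = val;
	return 0;
}
```

In this model the cost of allocating and zeroing is paid per slot at first
write rather than for every slot at first touch.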

The following patch is an example of what a simple pagetable_operations
consumer could look like. It depends on some other cleanups I am
working on (removal of is_file_hugepages(), ...hugetlbfs/inode.c vs.
mm/hugetlb.c separation, etc.), so it is unlikely to apply to any trees
you may have. I do think it makes a useful illustration of the
legitimate things that can be done with a pagetable_operations interface.
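
For readers skimming the patch, the argument checks that page_mmap()
performs can be restated as a standalone predicate. This is only a
userspace restatement for illustration: the function name is hypothetical
and the 2 MB huge page size is an assumption (the kernel code uses the
architecture's HPAGE_SIZE and HPAGE_MASK).

```c
#include <stdbool.h>

/* Assumption: 2 MB huge pages; the kernel uses HPAGE_SIZE/HPAGE_MASK. */
#define SKETCH_HPAGE_SIZE (2UL * 1024 * 1024)
#define SKETCH_HPAGE_MASK (~(SKETCH_HPAGE_SIZE - 1))

/*
 * Mirror of the checks in page_mmap(): returns true when a mapping of
 * [start, end) would be accepted. Callers must pass start <= end, as is
 * always the case for a vma.
 */
static bool huge_range_ok(unsigned long start, unsigned long end,
			  unsigned long pgoff, bool shared)
{
	if (shared)				/* MAP_PRIVATE only */
		return false;
	if (pgoff)				/* no file offset allowed */
		return false;
	if (start & ~SKETCH_HPAGE_MASK)		/* start must be aligned */
		return false;
	if (end & ~SKETCH_HPAGE_MASK)		/* end must be aligned */
		return false;
	if (end - start < SKETCH_HPAGE_SIZE)	/* at least one huge page */
		return false;
	return true;
}
```

For example, a private mapping that starts at 0 and ends at one huge page
passes, while the same range requested as shared, or with a nonzero offset,
is rejected.
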
commit be72df1c616fb662693a8d4410ce3058f20c71f3
Author: Adam Litke <agl@us.ibm.com>
Date: Tue Feb 13 14:18:21 2007 -0800
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index fc11063..c5e755b 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_IPMI_HANDLER) += ipmi/
obj-$(CONFIG_HANGCHECK_TIMER) += hangcheck-timer.o
obj-$(CONFIG_TCG_TPM) += tpm/
+obj-$(CONFIG_HUGETLB_PAGE) += page.o
# Files generated that shall be removed upon make clean
clean-files := consolemap_deftbl.c defkeymap.c
diff --git a/drivers/char/page.c b/drivers/char/page.c
new file mode 100644
index 0000000..e903028
--- /dev/null
+++ b/drivers/char/page.c
@@ -0,0 +1,133 @@
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/init.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include <linux/hugetlb.h>
+
+static const struct {
+	unsigned int minor;
+	char *name;
+	umode_t mode;
+} devlist[] = {
+	{1, "page-huge", S_IRUGO | S_IWUGO},
+};
+
+static struct page *page_nopage(struct vm_area_struct *vma,
+				unsigned long address, int *unused)
+{
+	BUG();	/* not expected: faults go through pagetable_ops->fault */
+	return NULL;
+}
+
+static struct vm_operations_struct page_vm_ops = {
+	.nopage = page_nopage,
+};
+
+static int page_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long address, int write_access)
+{
+	pte_t *ptep;
+	pte_t entry, new_entry;
+	int ret = VM_FAULT_MINOR;	/* also covers the no-COW path below */
+	static DEFINE_MUTEX(hugetlb_instantiation_mutex);
+
+	ptep = huge_pte_alloc(mm, address);
+	if (!ptep)
+		return VM_FAULT_OOM;
+
+	mutex_lock(&hugetlb_instantiation_mutex);
+	entry = *ptep;
+	if (pte_none(entry)) {
+		struct page *page;
+
+		page = alloc_huge_page(vma, address);
+		if (!page) {
+			mutex_unlock(&hugetlb_instantiation_mutex);
+			return VM_FAULT_OOM;
+		}
+		clear_huge_page(page, address);
+
+		spin_lock(&mm->page_table_lock);
+		if (!pte_none(*ptep))
+			goto out;
+		add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
+		new_entry = make_huge_pte(vma, page, 0);
+		set_huge_pte_at(mm, address, ptep, new_entry);
+		goto out;
+	}
+
+	spin_lock(&mm->page_table_lock);
+	/* Check for a racing update before calling hugetlb_cow */
+	if (likely(pte_same(entry, *ptep)))
+		if (write_access && !pte_write(entry))
+			ret = hugetlb_cow(mm, vma, address, ptep, entry);
+
+out:
+	spin_unlock(&mm->page_table_lock);
+	mutex_unlock(&hugetlb_instantiation_mutex);
+	return ret;
+}
+
+static struct pagetable_operations_struct page_pagetable_ops = {
+	.copy_vma = copy_hugetlb_page_range,
+	.pin_pages = follow_hugetlb_page,
+	.unmap_page_range = unmap_hugepage_range,
+	.change_protection = hugetlb_change_protection,
+	.free_pgtable_range = hugetlb_free_pgd_range,
+	.fault = page_fault,
+};
+
+static int page_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHARED)
+		return -EINVAL;
+
+	if (vma->vm_pgoff)
+		return -EINVAL;
+
+	if (vma->vm_start & ~HPAGE_MASK)
+		return -EINVAL;
+
+	if (vma->vm_end & ~HPAGE_MASK)
+		return -EINVAL;
+
+	if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
+		return -EINVAL;
+
+	vma->vm_flags |= (VM_HUGETLB | VM_RESERVED);
+	vma->vm_ops = &page_vm_ops;
+	vma->pagetable_ops = &page_pagetable_ops;
+
+	return 0;
+}
+
+const struct file_operations page_file_operations = {
+	.mmap = page_mmap,
+	.get_unmapped_area = hugetlb_get_unmapped_area,
+	.prepare_unmapped_area = prepare_hugepage_range,
+};
+
+static struct class *page_class;
+
+static int __init chr_dev_init(void)
+{
+	int major, i;
+
+	major = register_chrdev(0, "page", &page_file_operations);
+	if (major < 0) {
+		printk(KERN_ERR "page: failed to register char device\n");
+		return major;
+	}
+	printk(KERN_INFO "Initializing page devices (%i:0)\n", major);
+
+	page_class = class_create(THIS_MODULE, "page");
+	for (i = 0; i < ARRAY_SIZE(devlist); i++)
+		class_device_create(page_class, NULL,
+					MKDEV(major, devlist[i].minor),
+					NULL, devlist[i].name);
+
+	return 0;
+}
+fs_initcall(chr_dev_init);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4fc0bca..edd4944 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -590,6 +590,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
BUG_ON(!has_pt_op(vma, fault));
+ BUG_ON(!has_pt_op(vma,fault));
spin_lock(&mm->page_table_lock);
while (vaddr < vma->vm_end && remainder) {
pte_t *pte;
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center