LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Adam Litke <agl@us.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>,
	William Lee Irwin III <wli@holomorphy.com>,
	Christoph Hellwig <hch@infradead.org>,
	Ken Chen <kenchen@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: pagetable_ops: Hugetlb character device example
Date: Wed, 21 Mar 2007 14:43:48 -0500	[thread overview]
Message-ID: <1174506228.21684.41.camel@localhost.localdomain> (raw)
In-Reply-To: <20070319200502.17168.17175.stgit@localhost.localdomain>

The main reason I am advocating a set of pagetable_operations is to
enable the development of a new hugetlb interface.  During the hugetlb
BOFS at OLS last year, we talked about a character device that would
behave like /dev/zero.  Many of the people were talking about how they
just wanted to create MAP_PRIVATE hugetlb mappings without all the fuss
about the hugetlbfs filesystem.  /dev/zero is a familiar interface for
getting anonymous memory so bringing that model to huge pages would make
programming for anonymous huge pages easier.

The pagetable_operations API opens up possibilities to do some
additional (and completely sane) things.  For example, I have a patch
that alters the character device code below to make use of a hugetlb
ZERO_PAGE.  This eliminates almost all the up-front fault time, allowing
pages to be COW'ed only when first written to.  We cannot do things like
this with hugetlbfs anymore because we have a set of complex semantics
to preserve.

The following patch is an example of what a simple pagetable_operations
consumer could look like.  It does depend on some other cleanups I am
working on (removal of is_file_hugepages(), ...hugetlbfs/inode.c vs.
mm/hugetlb.c separation, etc).  So it is unlikely to apply to any trees
you may have.  I do think it makes a useful illustration of what
legitimate things can be done with a pagetable_operations interface.

commit be72df1c616fb662693a8d4410ce3058f20c71f3
Author: Adam Litke <agl@us.ibm.com>
Date:   Tue Feb 13 14:18:21 2007 -0800

diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index fc11063..c5e755b 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_IPMI_HANDLER)	+= ipmi/
 
 obj-$(CONFIG_HANGCHECK_TIMER)	+= hangcheck-timer.o
 obj-$(CONFIG_TCG_TPM)		+= tpm/
+obj-$(CONFIG_HUGETLB_PAGE)	+= page.o
 
 # Files generated that shall be removed upon make clean
 clean-files := consolemap_deftbl.c defkeymap.c
diff --git a/drivers/char/page.c b/drivers/char/page.c
new file mode 100644
index 0000000..e903028
--- /dev/null
+++ b/drivers/char/page.c
@@ -0,0 +1,133 @@
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/init.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include <linux/hugetlb.h>
+
+static const struct {
+	unsigned int    minor;
+	char            *name;
+	umode_t         mode;
+} devlist[] = {
+	{1, "page-huge", S_IRUGO | S_IWUGO},
+};
+
+static struct page *page_nopage(struct vm_area_struct *vma,
+			unsigned long address, int *unused)
+{
+	BUG();
+	return NULL;
+}
+
+static struct vm_operations_struct page_vm_ops = {
+	.nopage	= page_nopage,
+};
+
+static int page_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long address, int write_access)
+{
+	pte_t *ptep;
+	pte_t entry, new_entry;
+	int ret;
+	static DEFINE_MUTEX(hugetlb_instantiation_mutex);
+
+	ptep = huge_pte_alloc(mm, address);
+	if (!ptep)
+		return VM_FAULT_OOM;
+
+	mutex_lock(&hugetlb_instantiation_mutex);
+	entry = *ptep;
+	if (pte_none(entry)) {
+		struct page *page;
+
+		page = alloc_huge_page(vma, address);
+		if (!page)
+			return VM_FAULT_OOM;
+		clear_huge_page(page, address);
+
+		ret = VM_FAULT_MINOR;
+		spin_lock(&mm->page_table_lock);
+		if (!pte_none(*ptep))
+			goto out;
+		add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
+		new_entry = make_huge_pte(vma, page, 0);
+		set_huge_pte_at(mm, address, ptep, new_entry);
+		goto out;
+	}
+
+	spin_lock(&mm->page_table_lock);
+	/* Check for a racing update before calling hugetlb_cow */
+	if (likely(pte_same(entry, *ptep)))
+		if (write_access && !pte_write(entry))
+			ret = hugetlb_cow(mm, vma, address, ptep, entry);
+
+out:
+	spin_unlock(&mm->page_table_lock);
+	mutex_unlock(&hugetlb_instantiation_mutex);
+	return ret;
+}
+
+
+static struct pagetable_operations_struct page_pagetable_ops = {
+	.copy_vma		= copy_hugetlb_page_range,
+	.pin_pages		= follow_hugetlb_page,
+	.unmap_page_range	= unmap_hugepage_range,
+	.change_protection	= hugetlb_change_protection,
+	.free_pgtable_range	= hugetlb_free_pgd_range,
+	.fault			= page_fault,
+};
+
+static int page_mmap(struct file * file, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHARED)
+		return -EINVAL;
+
+	if (vma->vm_pgoff)
+		return -EINVAL;
+
+	if (vma->vm_start & ~HPAGE_MASK)
+		return -EINVAL;
+
+	if (vma->vm_end & ~HPAGE_MASK)
+		return -EINVAL;
+
+	if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
+		return -EINVAL;
+
+	vma->vm_flags |= (VM_HUGETLB | VM_RESERVED);
+	vma->vm_ops = &page_vm_ops;
+	vma->pagetable_ops = &page_pagetable_ops;
+
+	return 0;
+}
+
+const struct file_operations page_file_operations = {
+	.mmap			= page_mmap,
+	.get_unmapped_area	= hugetlb_get_unmapped_area,
+	.prepare_unmapped_area	= prepare_hugepage_range,
+};
+
+static struct class *page_class;
+
+static int __init chr_dev_init(void)
+{
+	int major, i;
+
+	printk("Initializing page devices...");
+	major = register_chrdev(0, "page", &page_file_operations);
+	if (major <= 0)
+		printk("failed\n");
+	else
+		printk("(%i:0)\n", major);
+
+	page_class = class_create(THIS_MODULE, "page");
+	for (i = 0; i < ARRAY_SIZE(devlist); i++)
+		class_device_create(page_class, NULL,
+			MKDEV(major, devlist[i].minor),
+			NULL, devlist[i].name);
+
+	return 0;
+}
+fs_initcall(chr_dev_init);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4fc0bca..edd4944 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -590,6 +590,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	BUG_ON(!has_pt_op(vma, fault));
 
+	BUG_ON(!has_pt_op(vma,fault));
 	spin_lock(&mm->page_table_lock);
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


  parent reply	other threads:[~2007-03-21 19:43 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-19 20:05 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Adam Litke
2007-03-19 20:05 ` [PATCH 1/7] Introduce the pagetable_operations and associated helper macros Adam Litke
2007-03-20 23:24   ` Dave Hansen
2007-03-21 14:50     ` Adam Litke
2007-03-21 15:05       ` Arjan van de Ven
2007-03-21  4:18   ` Nick Piggin
2007-03-21  4:52     ` William Lee Irwin III
2007-03-21  5:07       ` Nick Piggin
2007-03-21  5:41         ` William Lee Irwin III
2007-03-21  6:51           ` Nick Piggin
2007-03-21  7:36             ` Nick Piggin
2007-03-21 10:46             ` William Lee Irwin III
2007-03-21 15:17     ` Adam Litke
2007-03-21 16:00       ` Christoph Hellwig
2007-03-21 23:03         ` Nick Piggin
2007-03-21 23:02       ` Nick Piggin
2007-03-21 23:32         ` William Lee Irwin III
2007-03-19 20:05 ` [PATCH 2/7] copy_vma for hugetlbfs Adam Litke
2007-03-19 20:05 ` [PATCH 3/7] pin_pages for hugetlb Adam Litke
2007-03-19 20:05 ` [PATCH 4/7] unmap_page_range " Adam Litke
2007-03-20 23:27   ` Dave Hansen
2007-03-19 20:05 ` [PATCH 5/7] change_protection " Adam Litke
2007-03-19 20:06 ` [PATCH 6/7] free_pgtable_range " Adam Litke
2007-03-19 20:06 ` [PATCH 7/7] hugetlbfs fault handler Adam Litke
2007-03-20 23:50 ` [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Dave Hansen
2007-03-21  1:17 ` William Lee Irwin III
2007-03-21 15:55 ` Hugh Dickins
2007-03-21 16:01   ` Christoph Hellwig
2007-03-21 19:43 ` Adam Litke [this message]
2007-03-21 19:51   ` pagetable_ops: Hugetlb character device example Valdis.Kletnieks
2007-03-21 20:26     ` Adam Litke
2007-03-21 22:26     ` William Lee Irwin III
2007-03-21 22:53       ` Matt Mackall
2007-03-21 23:35         ` William Lee Irwin III
2007-03-22  0:31           ` Matt Mackall
2007-03-22 10:38   ` Christoph Hellwig
2007-03-22 15:42     ` Mel Gorman
2007-03-22 18:15       ` Christoph Hellwig
2007-03-23 14:57         ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1174506228.21684.41.camel@localhost.localdomain \
    --to=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=hch@infradead.org \
    --cc=kenchen@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=wli@holomorphy.com \
    --subject='Re: pagetable_ops: Hugetlb character device example' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).