LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2)
@ 2007-02-08 13:26 Nick Piggin
  2007-02-08 13:27 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Nick Piggin @ 2007-02-08 13:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management, Hugh Dickins, Nick Piggin,
	Benjamin Herrenschmidt, Linux Kernel

Still no independent confirmation as to whether this is a problem or not.
Updated some comments, added diffstats to patches, don't use __SetPageUptodate
as an internal page-flags.h private function.

I would like to eventually get an ack from Hugh regarding the anon memory
and especially swap side of the equation, and a glance from whoever put the
smp_wmb()s into the copy functions (Was it Ben H or Anton maybe?)

Thanks,
Nick

--
SuSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 1/3] mm: fix PageUptodate memorder
  2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
@ 2007-02-08 13:27 ` Nick Piggin
  2007-02-08 13:27 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-02-08 13:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Linux Memory Management, Benjamin Herrenschmidt,
	Nick Piggin, Linux Kernel

After running SetPageUptodate, preceeding stores to the page contents to
actually bring it uptodate may not be ordered with the store to set the page
uptodate.

Therefore, another CPU which checks PageUptodate is true, then reads the
page contents can get stale data.

Fix this by ensuring SetPageUptodate is always called with the page locked
(except in the case of a new page that cannot be visible to other CPUs), and
requiring PageUptodate be checked only when the page is locked.

To facilitate lockless checks, SetPageUptodate contains an smp_wmb to order
preceeding stores before the store to page flags, and a new PageUptodate_NoLock
is introduced, which issues a smp_rmb after the page flags are loaded for the
test.

I'm still not sure that a DMA memory barrier is not required, however I think
the logical place to put such a barrier would be in the IO completion routines,
when they come back to tell us that they have succeeded. (Help? Anyone?)

One thing I like about it is that it unifies the anonymous page handling
with the rest of the page management, by marking anon pages as uptodate
when they _are_ uptodate, rather than when our implementation requires
that they be marked as such. Doing this let me get rid of the smp_wmb's
in the page copying functions, which were specially added for anonymous
pages for a closely related issue, always vaguely troubled me.

Convert core code and some filesystems to use PageUptodate_NoLock, just for
reference (a more complete patch follows).

Signed-off-by: Nick Piggin <npiggin@suse.de>

 fs/ext2/dir.c              |    2 -
 fs/namei.c                 |    2 -
 fs/partitions/check.c      |    2 -
 fs/splice.c                |    4 +--
 include/linux/highmem.h    |    4 ---
 include/linux/page-flags.h |   57 +++++++++++++++++++++++++++++++++++++++++----
 mm/filemap.c               |   28 +++++++++++++++-------
 mm/hugetlb.c               |    2 +
 mm/memory.c                |    9 +++----
 mm/page_io.c               |    2 -
 mm/swap_state.c            |    2 -
 mm/swapfile.c              |    2 -
 12 files changed, 86 insertions(+), 30 deletions(-)

Index: linux-2.6/include/linux/highmem.h
===================================================================
--- linux-2.6.orig/include/linux/highmem.h
+++ linux-2.6/include/linux/highmem.h
@@ -57,8 +57,6 @@ static inline void clear_user_highpage(s
 	void *addr = kmap_atomic(page, KM_USER0);
 	clear_user_page(addr, vaddr, page);
 	kunmap_atomic(addr, KM_USER0);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }
 
 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
@@ -108,8 +106,6 @@ static inline void copy_user_highpage(st
 	copy_user_page(vto, vfrom, vaddr, to);
 	kunmap_atomic(vfrom, KM_USER0);
 	kunmap_atomic(vto, KM_USER1);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }
 
 #endif
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -126,16 +126,63 @@
 #define ClearPageReferenced(page)	clear_bit(PG_referenced, &(page)->flags)
 #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)
 
-#define PageUptodate(page)	test_bit(PG_uptodate, &(page)->flags)
-#ifdef CONFIG_S390
-static inline void SetPageUptodate(struct page *page)
+static inline int PageUptodate(struct page *page)
+{
+	WARN_ON(!PageLocked(page));
+	return test_bit(PG_uptodate, &(page)->flags);
+}
+
+/*
+ * PageUptodate to be used when not holding the page lock.
+ */
+static inline int PageUptodate_NoLock(struct page *page)
 {
+	int ret = test_bit(PG_uptodate, &(page)->flags);
+
+	/*
+	 * Must ensure that the data we read out of the page is loaded
+	 * _after_ we've loaded page->flags to check for PageUptodate.
+	 * See SetPageUptodate() for the other side of the story.
+	 */
+	smp_rmb();
+
+	return ret;
+}
+
+static inline void SetPageUptodate_nowarn(struct page *page)
+{
+#ifdef CONFIG_S390
 	if (!test_and_set_bit(PG_uptodate, &page->flags))
 		page_test_and_clear_dirty(page);
-}
 #else
-#define SetPageUptodate(page)	set_bit(PG_uptodate, &(page)->flags)
+	/*
+	 * Memory barrier must be issued before setting the PG_uptodate bit,
+	 * so all previous writes that served to bring the page uptodate are
+	 * visible before PageUptodate becomes true.
+	 *
+	 * S390 is guaranteed to have a barrier in the test_and_set operation
+	 * (see Documentation/atomic_ops.txt).
+	 *
+	 * This memory barrier should not need to provide ordering against
+	 * DMA writes into the page, because the IO completion should really
+	 * be doing that.
+	 */
+	smp_wmb();
+	set_bit(PG_uptodate, &(page)->flags);
 #endif
+}
+
+static inline void SetPageUptodate(struct page *page)
+{
+	WARN_ON(!PageLocked(page));
+	SetPageUptodate_nowarn(page);
+}
+
+static inline void SetNewPageUptodate(struct page *page)
+{
+	SetPageUptodate_nowarn(page);
+}
+
 #define ClearPageUptodate(page)	clear_bit(PG_uptodate, &(page)->flags)
 
 #define PageDirty(page)		test_bit(PG_dirty, &(page)->flags)
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c
+++ linux-2.6/mm/hugetlb.c
@@ -443,6 +443,7 @@ static int hugetlb_cow(struct mm_struct 
 
 	spin_unlock(&mm->page_table_lock);
 	copy_huge_page(new_page, old_page, address, vma);
+	SetNewPageUptodate(new_page);
 	spin_lock(&mm->page_table_lock);
 
 	ptep = huge_pte_offset(mm, address & HPAGE_MASK);
@@ -506,6 +507,7 @@ retry:
 		} else
 			lock_page(page);
 	}
+	SetNewPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
 	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1463,10 +1463,8 @@ static inline void cow_user_page(struct 
 			memset(kaddr, 0, PAGE_SIZE);
 		kunmap_atomic(kaddr, KM_USER0);
 		flush_dcache_page(dst);
-		return;
-
-	}
-	copy_user_highpage(dst, src, va, vma);
+	} else
+		copy_user_highpage(dst, src, va, vma);
 }
 
 /*
@@ -1579,6 +1577,7 @@ gotten:
 			goto oom;
 		cow_user_page(new_page, old_page, address, vma);
 	}
+	SetNewPageUptodate(new_page);
 
 	/*
 	 * Re-check the pte - we dropped the lock
@@ -2097,6 +2096,7 @@ static int do_anonymous_page(struct mm_s
 		page = alloc_zeroed_user_highpage(vma, address);
 		if (!page)
 			goto oom;
+		SetNewPageUptodate(page);
 
 		entry = mk_pte(page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
@@ -2203,6 +2203,7 @@ retry:
 			copy_user_highpage(page, new_page, address, vma);
 			page_cache_release(new_page);
 			new_page = page;
+			SetNewPageUptodate(new_page);
 			anon = 1;
 
 		} else {
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -932,7 +932,7 @@ find_page:
 			handle_ra_miss(mapping, &ra, index);
 			goto no_cached_page;
 		}
-		if (!PageUptodate(page))
+		if (!PageUptodate_NoLock(page))
 			goto page_not_up_to_date;
 page_ok:
 
@@ -1000,7 +1000,7 @@ readpage:
 			goto readpage_error;
 		}
 
-		if (!PageUptodate(page)) {
+		if (!PageUptodate_NoLock(page)) {
 			lock_page(page);
 			if (!PageUptodate(page)) {
 				if (page->mapping == NULL) {
@@ -1417,11 +1417,16 @@ retry_find:
 	 * Ok, found a page in the page cache, now we need to check
 	 * that it's up-to-date.
 	 */
-	if (!PageUptodate(page))
+	if (!PageUptodate_NoLock(page))
 		goto page_not_uptodate;
 
 success:
 	/*
+	 * Must order memory for the same reason as do_generic_mapping_read
+	 */
+	smp_rmb();
+
+	/*
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
@@ -1484,7 +1489,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1515,7 +1520,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1554,7 +1559,7 @@ retry_find:
 	 * Ok, found a page in the page cache, now we need to check
 	 * that it's up-to-date.
 	 */
-	if (!PageUptodate(page)) {
+	if (!PageUptodate_NoLock(page)) {
 		if (nonblock) {
 			page_cache_release(page);
 			return NULL;
@@ -1564,6 +1569,11 @@ retry_find:
 
 success:
 	/*
+	 * Must order memory for the same reason as do_generic_mapping_read
+	 */
+	smp_rmb();
+
+	/*
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
@@ -1605,7 +1615,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1635,7 +1645,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1806,7 +1816,7 @@ retry:
 	if (IS_ERR(page))
 		goto out;
 	mark_page_accessed(page);
-	if (PageUptodate(page))
+	if (PageUptodate_NoLock(page))
 		goto out;
 
 	lock_page(page);
Index: linux-2.6/mm/page_io.c
===================================================================
--- linux-2.6.orig/mm/page_io.c
+++ linux-2.6/mm/page_io.c
@@ -134,7 +134,7 @@ int swap_readpage(struct file *file, str
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	ClearPageUptodate(page);
+	BUG_ON(PageUptodate(page));
 	bio = get_swap_bio(GFP_KERNEL, page_private(page), page,
 				end_swap_bio_read);
 	if (bio == NULL) {
Index: linux-2.6/mm/swap_state.c
===================================================================
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -149,6 +149,7 @@ int add_to_swap(struct page * page, gfp_
 	int err;
 
 	BUG_ON(!PageLocked(page));
+	BUG_ON(!PageUptodate(page));
 
 	for (;;) {
 		entry = get_swap_page();
@@ -171,7 +172,6 @@ int add_to_swap(struct page * page, gfp_
 
 		switch (err) {
 		case 0:				/* Success */
-			SetPageUptodate(page);
 			SetPageDirty(page);
 			INC_CACHE_INFO(add_total);
 			return 1;
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c
+++ linux-2.6/fs/splice.c
@@ -107,7 +107,7 @@ static int page_cache_pipe_buf_pin(struc
 	struct page *page = buf->page;
 	int err;
 
-	if (!PageUptodate(page)) {
+	if (!PageUptodate_NoLock(page)) {
 		lock_page(page);
 
 		/*
@@ -373,7 +373,7 @@ __generic_file_splice_read(struct file *
 		/*
 		 * If the page isn't uptodate, we may need to start io on it
 		 */
-		if (!PageUptodate(page)) {
+		if (!PageUptodate_NoLock(page)) {
 			/*
 			 * If in nonblock mode then dont block on waiting
 			 * for an in-flight io page
Index: linux-2.6/fs/ext2/dir.c
===================================================================
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -163,7 +163,7 @@ static struct page * ext2_get_page(struc
 	if (!IS_ERR(page)) {
 		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
+		if (!PageUptodate_NoLock(page))
 			goto fail;
 		if (!PageChecked(page))
 			ext2_check_page(page);
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -2641,7 +2641,7 @@ static char *page_getlink(struct dentry 
 	if (IS_ERR(page))
 		goto sync_fail;
 	wait_on_page_locked(page);
-	if (!PageUptodate(page))
+	if (!PageUptodate_NoLock(page))
 		goto async_fail;
 	*ppage = page;
 	return kmap(page);
Index: linux-2.6/fs/partitions/check.c
===================================================================
--- linux-2.6.orig/fs/partitions/check.c
+++ linux-2.6/fs/partitions/check.c
@@ -562,7 +562,7 @@ unsigned char *read_dev_sector(struct bl
 				 NULL);
 	if (!IS_ERR(page)) {
 		wait_on_page_locked(page);
-		if (!PageUptodate(page))
+		if (!PageUptodate_NoLock(page))
 			goto fail;
 		if (PageError(page))
 			goto fail;
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c
+++ linux-2.6/mm/swapfile.c
@@ -1532,7 +1532,7 @@ asmlinkage long sys_swapon(const char __
 		goto bad_swap;
 	}
 	wait_on_page_locked(page);
-	if (!PageUptodate(page))
+	if (!PageUptodate_NoLock(page))
 		goto bad_swap;
 	kmap(page);
 	swap_header = page_address(page);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 2/3] fs: buffer don't PageUptodate without page locked
  2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
  2007-02-08 13:27 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
@ 2007-02-08 13:27 ` Nick Piggin
  2007-02-08 13:27 ` [patch 3/3] mm: make read_cache_page synchronous Nick Piggin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-02-08 13:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Linux Memory Management, Benjamin Herrenschmidt,
	Nick Piggin, Linux Kernel

__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However with the previous patch, this is now a problem: so don't bother
setting the page uptodate in this case (it is weird that the write path
does such a thing anyway). Instead just leave it to the read side to bring
the page uptodate when it notices that all buffers are uptodate.

Signed-off-by: Nick Piggin <npiggin@suse.de>

 fs/buffer.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 3/3] mm: make read_cache_page synchronous
  2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
  2007-02-08 13:27 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
  2007-02-08 13:27 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
@ 2007-02-08 13:27 ` Nick Piggin
  2007-02-08 22:21 ` [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Benjamin Herrenschmidt
  2007-02-09  0:41 ` Hugh Dickins
  4 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-02-08 13:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Linux Memory Management, Benjamin Herrenschmidt,
	Nick Piggin, Linux Kernel

Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate_NoLock calls.

I didn't have a great look down the call chains, but this appears to fixes 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs,
1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd.
All depending on whether the filler is async and/or can return with a !uptodate
page.

Signed-off-by: Nick Piggin <npiggin@suse.de>

 drivers/mtd/devices/block2mtd.c |    3 --
 fs/afs/dir.c                    |    3 --
 fs/afs/mntpt.c                  |   11 +++------
 fs/cramfs/inode.c               |    3 +-
 fs/ecryptfs/mmap.c              |   10 --------
 fs/ext2/dir.c                   |    3 --
 fs/freevxfs/vxfs_subr.c         |    3 --
 fs/minix/dir.c                  |    1 
 fs/namei.c                      |   12 ----------
 fs/nfs/dir.c                    |    5 ----
 fs/nfs/symlink.c                |    6 -----
 fs/ntfs/aops.h                  |    3 --
 fs/ntfs/attrib.c                |   18 +--------------
 fs/ntfs/file.c                  |    3 --
 fs/ntfs/super.c                 |   30 +++----------------------
 fs/ocfs2/symlink.c              |    7 -----
 fs/partitions/check.c           |    3 --
 fs/reiserfs/xattr.c             |    4 ---
 fs/sysv/dir.c                   |   10 --------
 fs/ufs/dir.c                    |    6 -----
 fs/ufs/util.c                   |    6 +----
 include/linux/pagemap.h         |   11 +++++++++
 mm/filemap.c                    |   48 ++++++++++++++++++++++++++++++----------
 mm/swapfile.c                   |    3 --
 24 files changed, 68 insertions(+), 144 deletions(-)

Index: linux-2.6/fs/afs/dir.c
===================================================================
--- linux-2.6.orig/fs/afs/dir.c
+++ linux-2.6/fs/afs/dir.c
@@ -187,10 +187,7 @@ static struct page *afs_dir_get_page(str
 
 	page = read_mapping_page(dir->i_mapping, index, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			afs_dir_check_page(dir, page);
 		if (PageError(page))
Index: linux-2.6/fs/afs/mntpt.c
===================================================================
--- linux-2.6.orig/fs/afs/mntpt.c
+++ linux-2.6/fs/afs/mntpt.c
@@ -77,13 +77,11 @@ int afs_mntpt_check_symlink(struct afs_v
 	}
 
 	ret = -EIO;
-	wait_on_page_locked(page);
-	buf = kmap(page);
-	if (!PageUptodate(page))
-		goto out_free;
 	if (PageError(page))
 		goto out_free;
 
+	buf = kmap(page);
+
 	/* examine the symlink's contents */
 	size = vnode->status.size;
 	_debug("symlink to %*.*s", size, (int) size, buf);
@@ -100,8 +98,8 @@ int afs_mntpt_check_symlink(struct afs_v
 
 	ret = 0;
 
- out_free:
 	kunmap(page);
+ out_free:
 	page_cache_release(page);
  out:
 	_leave(" = %d", ret);
@@ -184,8 +182,7 @@ static struct vfsmount *afs_mntpt_do_aut
 	}
 
 	ret = -EIO;
-	wait_on_page_locked(page);
-	if (!PageUptodate(page) || PageError(page))
+	if (PageError(page))
 		goto error;
 
 	buf = kmap(page);
Index: linux-2.6/fs/cramfs/inode.c
===================================================================
--- linux-2.6.orig/fs/cramfs/inode.c
+++ linux-2.6/fs/cramfs/inode.c
@@ -180,7 +180,8 @@ static void *cramfs_read(struct super_bl
 		struct page *page = NULL;
 
 		if (blocknr + i < devsize) {
-			page = read_mapping_page(mapping, blocknr + i, NULL);
+			page = read_mapping_page_async(mapping, blocknr + i,
+									NULL);
 			/* synchronous error? */
 			if (IS_ERR(page))
 				page = NULL;
Index: linux-2.6/fs/ext2/dir.c
===================================================================
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -161,10 +161,7 @@ static struct page * ext2_get_page(struc
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate_NoLock(page))
-			goto fail;
 		if (!PageChecked(page))
 			ext2_check_page(page);
 		if (PageError(page))
Index: linux-2.6/fs/freevxfs/vxfs_subr.c
===================================================================
--- linux-2.6.orig/fs/freevxfs/vxfs_subr.c
+++ linux-2.6/fs/freevxfs/vxfs_subr.c
@@ -74,10 +74,7 @@ vxfs_get_page(struct address_space *mapp
 	pp = read_mapping_page(mapping, n, NULL);
 
 	if (!IS_ERR(pp)) {
-		wait_on_page_locked(pp);
 		kmap(pp);
-		if (!PageUptodate(pp))
-			goto fail;
 		/** if (!PageChecked(pp)) **/
 			/** vxfs_check_page(pp); **/
 		if (PageError(pp))
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1756,7 +1756,7 @@ int generic_file_readonly_mmap(struct fi
 EXPORT_SYMBOL(generic_file_mmap);
 EXPORT_SYMBOL(generic_file_readonly_mmap);
 
-static inline struct page *__read_cache_page(struct address_space *mapping,
+static struct page *__read_cache_page(struct address_space *mapping,
 				unsigned long index,
 				int (*filler)(void *,struct page*),
 				void *data)
@@ -1793,17 +1793,11 @@ repeat:
 	return page;
 }
 
-/**
- * read_cache_page - read into page cache, fill it if needed
- * @mapping:	the page's address_space
- * @index:	the page index
- * @filler:	function to perform the read
- * @data:	destination for read data
- *
- * Read into the page cache. If a page already exists,
- * and PageUptodate() is not set, try to fill the page.
+/*
+ * Same as read_cache_page, but don't wait for page to become unlocked
+ * after submitting it to the filler.
  */
-struct page *read_cache_page(struct address_space *mapping,
+struct page *read_cache_page_async(struct address_space *mapping,
 				unsigned long index,
 				int (*filler)(void *,struct page*),
 				void *data)
@@ -1815,7 +1809,6 @@ retry:
 	page = __read_cache_page(mapping, index, filler, data);
 	if (IS_ERR(page))
 		goto out;
-	mark_page_accessed(page);
 	if (PageUptodate_NoLock(page))
 		goto out;
 
@@ -1835,6 +1828,37 @@ retry:
 		page = ERR_PTR(err);
 	}
  out:
+	mark_page_accessed(page);
+	return page;
+}
+EXPORT_SYMBOL(read_cache_page_async);
+
+/**
+ * read_cache_page - read into page cache, fill it if needed
+ * @mapping:	the page's address_space
+ * @index:	the page index
+ * @filler:	function to perform the read
+ * @data:	destination for read data
+ *
+ * Read into the page cache. If a page already exists,
+ * and PageUptodate() is not set, try to fill the page.
+ */
+struct page *read_cache_page(struct address_space *mapping,
+				unsigned long index,
+				int (*filler)(void *,struct page*),
+				void *data)
+{
+	struct page *page;
+
+	page = read_cache_page_async(mapping, index, filler, data);
+	if (IS_ERR(page))
+		goto out;
+	wait_on_page_locked(page);
+	if (!PageUptodate_NoLock(page)) {
+		page_cache_release(page);
+		page = ERR_PTR(-EIO);
+	}
+ out:
 	return page;
 }
 EXPORT_SYMBOL(read_cache_page);
Index: linux-2.6/fs/minix/dir.c
===================================================================
--- linux-2.6.orig/fs/minix/dir.c
+++ linux-2.6/fs/minix/dir.c
@@ -62,7 +62,6 @@ static struct page * dir_get_page(struct
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -2639,19 +2639,9 @@ static char *page_getlink(struct dentry 
 	struct address_space *mapping = dentry->d_inode->i_mapping;
 	page = read_mapping_page(mapping, 0, NULL);
 	if (IS_ERR(page))
-		goto sync_fail;
-	wait_on_page_locked(page);
-	if (!PageUptodate_NoLock(page))
-		goto async_fail;
+		return (char *)page;
 	*ppage = page;
 	return kmap(page);
-
-async_fail:
-	page_cache_release(page);
-	return ERR_PTR(-EIO);
-
-sync_fail:
-	return (char*)page;
 }
 
 int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
Index: linux-2.6/fs/ntfs/aops.h
===================================================================
--- linux-2.6.orig/fs/ntfs/aops.h
+++ linux-2.6/fs/ntfs/aops.h
@@ -89,9 +89,8 @@ static inline struct page *ntfs_map_page
 	struct page *page = read_mapping_page(mapping, index, NULL);
 
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (PageUptodate(page) && !PageError(page))
+		if (!PageError(page))
 			return page;
 		ntfs_unmap_page(page);
 		return ERR_PTR(-EIO);
Index: linux-2.6/fs/ntfs/attrib.c
===================================================================
--- linux-2.6.orig/fs/ntfs/attrib.c
+++ linux-2.6/fs/ntfs/attrib.c
@@ -2532,14 +2532,7 @@ int ntfs_attr_set(ntfs_inode *ni, const 
 		page = read_mapping_page(mapping, idx, NULL);
 		if (IS_ERR(page)) {
 			ntfs_error(vol->sb, "Failed to read first partial "
-					"page (sync error, index 0x%lx).", idx);
-			return PTR_ERR(page);
-		}
-		wait_on_page_locked(page);
-		if (unlikely(!PageUptodate(page))) {
-			ntfs_error(vol->sb, "Failed to read first partial page "
-					"(async error, index 0x%lx).", idx);
-			page_cache_release(page);
+					"page (error, index 0x%lx).", idx);
 			return PTR_ERR(page);
 		}
 		/*
@@ -2602,14 +2595,7 @@ int ntfs_attr_set(ntfs_inode *ni, const 
 		page = read_mapping_page(mapping, idx, NULL);
 		if (IS_ERR(page)) {
 			ntfs_error(vol->sb, "Failed to read last partial page "
-					"(sync error, index 0x%lx).", idx);
-			return PTR_ERR(page);
-		}
-		wait_on_page_locked(page);
-		if (unlikely(!PageUptodate(page))) {
-			ntfs_error(vol->sb, "Failed to read last partial page "
-					"(async error, index 0x%lx).", idx);
-			page_cache_release(page);
+					"(error, index 0x%lx).", idx);
 			return PTR_ERR(page);
 		}
 		kaddr = kmap_atomic(page, KM_USER0);
Index: linux-2.6/fs/ntfs/file.c
===================================================================
--- linux-2.6.orig/fs/ntfs/file.c
+++ linux-2.6/fs/ntfs/file.c
@@ -236,8 +236,7 @@ do_non_resident_extend:
 			err = PTR_ERR(page);
 			goto init_err_out;
 		}
-		wait_on_page_locked(page);
-		if (unlikely(!PageUptodate(page) || PageError(page))) {
+		if (unlikely(PageError(page))) {
 			page_cache_release(page);
 			err = -EIO;
 			goto init_err_out;
Index: linux-2.6/fs/ocfs2/symlink.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/symlink.c
+++ linux-2.6/fs/ocfs2/symlink.c
@@ -67,16 +67,9 @@ static char *ocfs2_page_getlink(struct d
 	page = read_mapping_page(mapping, 0, NULL);
 	if (IS_ERR(page))
 		goto sync_fail;
-	wait_on_page_locked(page);
-	if (!PageUptodate(page))
-		goto async_fail;
 	*ppage = page;
 	return kmap(page);
 
-async_fail:
-	page_cache_release(page);
-	return ERR_PTR(-EIO);
-
 sync_fail:
 	return (char*)page;
 }
Index: linux-2.6/fs/partitions/check.c
===================================================================
--- linux-2.6.orig/fs/partitions/check.c
+++ linux-2.6/fs/partitions/check.c
@@ -561,9 +561,6 @@ unsigned char *read_dev_sector(struct bl
 	page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)),
 				 NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
-		if (!PageUptodate_NoLock(page))
-			goto fail;
 		if (PageError(page))
 			goto fail;
 		p->v = page;
Index: linux-2.6/fs/reiserfs/xattr.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/xattr.c
+++ linux-2.6/fs/reiserfs/xattr.c
@@ -454,11 +454,7 @@ static struct page *reiserfs_get_page(st
 	mapping_set_gfp_mask(mapping, GFP_NOFS);
 	page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
-
 		if (PageError(page))
 			goto fail;
 	}
Index: linux-2.6/fs/sysv/dir.c
===================================================================
--- linux-2.6.orig/fs/sysv/dir.c
+++ linux-2.6/fs/sysv/dir.c
@@ -54,17 +54,9 @@ static struct page * dir_get_page(struct
 {
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
-	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
+	if (!IS_ERR(page))
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
-	}
 	return page;
-
-fail:
-	dir_put_page(page);
-	return ERR_PTR(-EIO);
 }
 
 static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir)
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c
+++ linux-2.6/mm/swapfile.c
@@ -1531,9 +1531,6 @@ asmlinkage long sys_swapon(const char __
 		error = PTR_ERR(page);
 		goto bad_swap;
 	}
-	wait_on_page_locked(page);
-	if (!PageUptodate_NoLock(page))
-		goto bad_swap;
 	kmap(page);
 	swap_header = page_address(page);
 
Index: linux-2.6/drivers/mtd/devices/block2mtd.c
===================================================================
--- linux-2.6.orig/drivers/mtd/devices/block2mtd.c
+++ linux-2.6/drivers/mtd/devices/block2mtd.c
@@ -88,9 +88,8 @@ static void cache_readahead(struct addre
 
 static struct page* page_readahead(struct address_space *mapping, int index)
 {
-	filler_t *filler = (filler_t*)mapping->a_ops->readpage;
 	cache_readahead(mapping, index);
-	return read_cache_page(mapping, index, filler, NULL);
+	return read_mapping_page(mapping, index, NULL);
 }
 
 
Index: linux-2.6/fs/ecryptfs/mmap.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/mmap.c
+++ linux-2.6/fs/ecryptfs/mmap.c
@@ -54,14 +54,7 @@ static struct page *ecryptfs_get1page(st
 	dentry = file->f_path.dentry;
 	inode = dentry->d_inode;
 	mapping = inode->i_mapping;
-	page = read_cache_page(mapping, index,
-			       (filler_t *)mapping->a_ops->readpage,
-			       (void *)file);
-	if (IS_ERR(page))
-		goto out;
-	wait_on_page_locked(page);
-out:
-	return page;
+	return read_mapping_page(mapping, index, (void *)file);
 }
 
 static
@@ -233,7 +226,6 @@ int ecryptfs_do_readpage(struct file *fi
 		ecryptfs_printk(KERN_ERR, "Error reading from page cache\n");
 		goto out;
 	}
-	wait_on_page_locked(lower_page);
 	page_data = (char *)kmap(page);
 	if (!page_data) {
 		rc = -ENOMEM;
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c
+++ linux-2.6/fs/nfs/dir.c
@@ -322,8 +322,6 @@ int find_dirent_page(nfs_readdir_descrip
 		status = PTR_ERR(page);
 		goto out;
 	}
-	if (!PageUptodate(page))
-		goto read_error;
 
 	/* NOTE: Someone else may have changed the READDIRPLUS flag */
 	desc->page = page;
@@ -337,9 +335,6 @@ int find_dirent_page(nfs_readdir_descrip
  out:
 	dfprintk(DIRCACHE, "NFS: %s: returns %d\n", __FUNCTION__, status);
 	return status;
- read_error:
-	page_cache_release(page);
-	return -EIO;
 }
 
 /*
Index: linux-2.6/fs/nfs/symlink.c
===================================================================
--- linux-2.6.orig/fs/nfs/symlink.c
+++ linux-2.6/fs/nfs/symlink.c
@@ -61,15 +61,9 @@ static void *nfs_follow_link(struct dent
 		err = page;
 		goto read_failed;
 	}
-	if (!PageUptodate(page)) {
-		err = ERR_PTR(-EIO);
-		goto getlink_read_error;
-	}
 	nd_set_link(nd, kmap(page));
 	return page;
 
-getlink_read_error:
-	page_cache_release(page);
 read_failed:
 	nd_set_link(nd, err);
 	return NULL;
Index: linux-2.6/fs/ntfs/super.c
===================================================================
--- linux-2.6.orig/fs/ntfs/super.c
+++ linux-2.6/fs/ntfs/super.c
@@ -2471,7 +2471,6 @@ static s64 get_nr_free_clusters(ntfs_vol
 	s64 nr_free = vol->nr_clusters;
 	u32 *kaddr;
 	struct address_space *mapping = vol->lcnbmp_ino->i_mapping;
-	filler_t *readpage = (filler_t*)mapping->a_ops->readpage;
 	struct page *page;
 	pgoff_t index, max_index;
 
@@ -2494,24 +2493,14 @@ static s64 get_nr_free_clusters(ntfs_vol
 		 * Read the page from page cache, getting it from backing store
 		 * if necessary, and increment the use count.
 		 */
-		page = read_cache_page(mapping, index, (filler_t*)readpage,
-				NULL);
+		page = read_mapping_page(mapping, index, NULL);
 		/* Ignore pages which errored synchronously. */
 		if (IS_ERR(page)) {
-			ntfs_debug("Sync read_cache_page() error. Skipping "
+			ntfs_debug("read_mapping_page() error. Skipping "
 					"page (index 0x%lx).", index);
 			nr_free -= PAGE_CACHE_SIZE * 8;
 			continue;
 		}
-		wait_on_page_locked(page);
-		/* Ignore pages which errored asynchronously. */
-		if (!PageUptodate(page)) {
-			ntfs_debug("Async read_cache_page() error. Skipping "
-					"page (index 0x%lx).", index);
-			page_cache_release(page);
-			nr_free -= PAGE_CACHE_SIZE * 8;
-			continue;
-		}
 		kaddr = (u32*)kmap_atomic(page, KM_USER0);
 		/*
 		 * For each 4 bytes, subtract the number of set bits. If this
@@ -2562,7 +2551,6 @@ static unsigned long __get_nr_free_mft_r
 {
 	u32 *kaddr;
 	struct address_space *mapping = vol->mftbmp_ino->i_mapping;
-	filler_t *readpage = (filler_t*)mapping->a_ops->readpage;
 	struct page *page;
 	pgoff_t index;
 
@@ -2576,21 +2564,11 @@ static unsigned long __get_nr_free_mft_r
 		 * Read the page from page cache, getting it from backing store
 		 * if necessary, and increment the use count.
 		 */
-		page = read_cache_page(mapping, index, (filler_t*)readpage,
-				NULL);
+		page = read_mapping_page(mapping, index, NULL);
 		/* Ignore pages which errored synchronously. */
 		if (IS_ERR(page)) {
-			ntfs_debug("Sync read_cache_page() error. Skipping "
-					"page (index 0x%lx).", index);
-			nr_free -= PAGE_CACHE_SIZE * 8;
-			continue;
-		}
-		wait_on_page_locked(page);
-		/* Ignore pages which errored asynchronously. */
-		if (!PageUptodate(page)) {
-			ntfs_debug("Async read_cache_page() error. Skipping "
+			ntfs_debug("read_mapping_page() error. Skipping "
 					"page (index 0x%lx).", index);
-			page_cache_release(page);
 			nr_free -= PAGE_CACHE_SIZE * 8;
 			continue;
 		}
Index: linux-2.6/fs/ufs/dir.c
===================================================================
--- linux-2.6.orig/fs/ufs/dir.c
+++ linux-2.6/fs/ufs/dir.c
@@ -180,13 +180,9 @@ fail:
 static struct page *ufs_get_page(struct inode *dir, unsigned long n)
 {
 	struct address_space *mapping = dir->i_mapping;
-	struct page *page = read_cache_page(mapping, n,
-				(filler_t*)mapping->a_ops->readpage, NULL);
+	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			ufs_check_page(page);
 		if (PageError(page))
Index: linux-2.6/fs/ufs/util.c
===================================================================
--- linux-2.6.orig/fs/ufs/util.c
+++ linux-2.6/fs/ufs/util.c
@@ -251,13 +251,11 @@ struct page *ufs_get_locked_page(struct 
 
 	page = find_lock_page(mapping, index);
 	if (!page) {
-		page = read_cache_page(mapping, index,
-				       (filler_t*)mapping->a_ops->readpage,
-				       NULL);
+		page = read_mapping_page(mapping, index, NULL);
 
 		if (IS_ERR(page)) {
 			printk(KERN_ERR "ufs_change_blocknr: "
-			       "read_cache_page error: ino %lu, index: %lu\n",
+			       "read_mapping_page error: ino %lu, index: %lu\n",
 			       mapping->host->i_ino, index);
 			goto out;
 		}
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -97,12 +97,23 @@ static inline struct page *grab_cache_pa
 
 extern struct page * grab_cache_page_nowait(struct address_space *mapping,
 				unsigned long index);
+extern struct page * read_cache_page_async(struct address_space *mapping,
+				unsigned long index, filler_t *filler,
+				void *data);
 extern struct page * read_cache_page(struct address_space *mapping,
 				unsigned long index, filler_t *filler,
 				void *data);
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
+static inline struct page *read_mapping_page_async(
+						struct address_space *mapping,
+					     unsigned long index, void *data)
+{
+	filler_t *filler = (filler_t *)mapping->a_ops->readpage;
+	return read_cache_page_async(mapping, index, filler, data);
+}
+
 static inline struct page *read_mapping_page(struct address_space *mapping,
 					     unsigned long index, void *data)
 {

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2)
  2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
                   ` (2 preceding siblings ...)
  2007-02-08 13:27 ` [patch 3/3] mm: make read_cache_page synchronous Nick Piggin
@ 2007-02-08 22:21 ` Benjamin Herrenschmidt
  2007-02-09  0:41 ` Hugh Dickins
  4 siblings, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-08 22:21 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Linux Memory Management, Hugh Dickins, Linux Kernel

On Thu, 2007-02-08 at 14:26 +0100, Nick Piggin wrote:
> Still no independent confirmation as to whether this is a problem or not.
> Updated some comments, added diffstats to patches, don't use __SetPageUptodate
> as an internal page-flags.h private function.
> 
> I would like to eventually get an ack from Hugh regarding the anon memory
> and especially swap side of the equation, and a glance from whoever put the
> smp_wmb()s into the copy functions (Was it Ben H or Anton maybe?)

I don't remember adding that one ... Anton ?

Ben.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2)
  2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
                   ` (3 preceding siblings ...)
  2007-02-08 22:21 ` [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Benjamin Herrenschmidt
@ 2007-02-09  0:41 ` Hugh Dickins
  2007-02-09  1:31   ` Nick Piggin
  2007-02-09  1:41   ` Benjamin Herrenschmidt
  4 siblings, 2 replies; 9+ messages in thread
From: Hugh Dickins @ 2007-02-09  0:41 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Linux Memory Management, Benjamin Herrenschmidt,
	Linux Kernel

On Thu, 8 Feb 2007, Nick Piggin wrote:
> Still no independent confirmation as to whether this is a problem or not.

I'm trying to convince myself none of your patch is necessary.  Probably
shall fail.  But how come we've survived for years with such an issue?

> Updated some comments, added diffstats to patches, don't use
> __SetPageUptodate as an internal page-flags.h private function.

Depressed by profusion of PageUptodate_UpperAndlowercasevariants.
Those rmbs, you really only want them when it says "yes", don't you?

> 
> I would like to eventually get an ack from Hugh regarding the anon memory
> and especially swap side of the equation,

Plea noted.  I'm pondering.  "Eventually" indeed.  OTOH I expect you're
right to criticize anon/swap PageUptodate being set where it was needed
to get by, rather than where it was natural to do so.

> and a glance from whoever put the
> smp_wmb()s into the copy functions (Was it Ben H or Anton maybe?)

From: Linus Torvalds <torvalds@ppc970.osdl.org>
Date: Thu, 14 Oct 2004 04:00:06 +0000 (-0700)
Subject: Fix threaded user page write memory ordering
X-Git-Tag: v2.6.9-final~3
X-Git-Url: http://127.0.0.1:1234/?p=.git;a=commitdiff_plain;h=538ce05c0ef4055cf29a92a4abcdf139d180a0f9;hp=8c225dbc5a7b13801a8254aae0ccebab8e4bece7

Fix threaded user page write memory ordering

Make sure we order the writes to a newly created page
with the page table update that potentially exposes the
page to another CPU.

This is a no-op on any architecture where getting the
page table spinlock will already do the ordering (notably
x86), but other architectures can care.
---

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 232d8fd..7153aef 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -40,6 +40,8 @@ static inline void clear_user_highpage(s
 	void *addr = kmap_atomic(page, KM_USER0);
 	clear_user_page(addr, vaddr, page);
 	kunmap_atomic(addr, KM_USER0);
+	/* Make sure this page is cleared on other CPU's too before using it */
+	smp_wmb();
 }
 
 static inline void clear_highpage(struct page *page)
@@ -73,6 +75,8 @@ static inline void copy_user_highpage(st
 	copy_user_page(vto, vfrom, vaddr, to);
 	kunmap_atomic(vfrom, KM_USER0);
 	kunmap_atomic(vto, KM_USER1);
+	/* Make sure this page is cleared on other CPU's too before using it */
+	smp_wmb();
 }
 
 static inline void copy_highpage(struct page *to, struct page *from)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2)
  2007-02-09  0:41 ` Hugh Dickins
@ 2007-02-09  1:31   ` Nick Piggin
  2007-02-09  1:44     ` Benjamin Herrenschmidt
  2007-02-09  1:41   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2007-02-09  1:31 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Linux Memory Management, Benjamin Herrenschmidt,
	Linux Kernel

On Fri, Feb 09, 2007 at 12:41:51AM +0000, Hugh Dickins wrote:
> On Thu, 8 Feb 2007, Nick Piggin wrote:
> > Still no independent confirmation as to whether this is a problem or not.
> 
> I'm trying to convince myself none of your patch is necessary.  Probably
> shall fail.  But how come we've survived for years with such an issue?

Well I'm almost sure that the POWER guys hit this with anonymous pages
being zeroed out then added to process address space -- they would
occasionally see junk (via another thread, I presume).

If that were the case, then I think that a read-vs-read over a hole
(for example) should be buggy in the same way.

> > Updated some comments, added diffstats to patches, don't use
> > __SetPageUptodate as an internal page-flags.h private function.
> 
> Depressed by profusion of PageUptodate_UpperAndlowercasevariants.
> Those rmbs, you really only want them when it says "yes", don't you?

Yeah, any help with naming suggestions would be appreciated. I think
we can get rid of the SetPageUptodate_nowarn variant, by making
SetNewPageUptodate simply do an smp_wmb + __set_bit on all architectures?
This frees us from the extra atomic ops I'd added into the page fault
fastpaths as well.

Yes, we do only need the rmb if it is uptodate, that's a good point.

> > I would like to eventually get an ack from Hugh regarding the anon memory
> > and especially swap side of the equation,
> 
> Plea noted.  I'm pondering.  "Eventually" indeed.  OTOH I expect you're
> right to criticize anon/swap PageUptodate being set where it was needed
> to get by, rather than where it was natural to do so.

Well if I can get you to warm to that aspect of the patch... :) I expect
that using non-atomic bitops might help.

Should I make that change into its own patch?

> > and a glance from whoever put the
> > smp_wmb()s into the copy functions (Was it Ben H or Anton maybe?)
> 
> From: Linus Torvalds <torvalds@ppc970.osdl.org>
> Date: Thu, 14 Oct 2004 04:00:06 +0000 (-0700)
> Subject: Fix threaded user page write memory ordering
> X-Git-Tag: v2.6.9-final~3
> X-Git-Url: http://127.0.0.1:1234/?p=.git;a=commitdiff_plain;h=538ce05c0ef4055cf29a92a4abcdf139d180a0f9;hp=8c225dbc5a7b13801a8254aae0ccebab8e4bece7
> 
> Fix threaded user page write memory ordering

Thanks, I did see that, but I'm sure it must have been prompted by a
discussion or another proposed patch from IBM. Maybe I'm wrong
though.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2)
  2007-02-09  0:41 ` Hugh Dickins
  2007-02-09  1:31   ` Nick Piggin
@ 2007-02-09  1:41   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-09  1:41 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Nick Piggin, Andrew Morton, Linux Memory Management, Linux Kernel


> Fix threaded user page write memory ordering
> 
> Make sure we order the writes to a newly created page
> with the page table update that potentially exposes the
> page to another CPU.
> 
> This is a no-op on any architecture where getting the
> page table spinlock will already do the ordering (notably
> x86), but other architectures can care.


Ah yes, the clear_* ones I knew about as I'm the one who tracked down
that bug on power :-) I though the ones Nick pointed out were different
tho.

Ben.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2)
  2007-02-09  1:31   ` Nick Piggin
@ 2007-02-09  1:44     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-09  1:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hugh Dickins, Andrew Morton, Linux Memory Management, Linux Kernel


> > Fix threaded user page write memory ordering
> 
> Thanks, I did see that, but I'm sure it must have been prompted by a
> discussion or another proposed patch from IBM. Maybe I'm wrong
> though.

Yes, my initial proposal iirc was to smp_wmb() in set_pte() but after a
discussion with Linus, we ended with open coding them.

Ben.



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-02-09  1:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
2007-02-08 13:27 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
2007-02-08 13:27 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
2007-02-08 13:27 ` [patch 3/3] mm: make read_cache_page synchronous Nick Piggin
2007-02-08 22:21 ` [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Benjamin Herrenschmidt
2007-02-09  0:41 ` Hugh Dickins
2007-02-09  1:31   ` Nick Piggin
2007-02-09  1:44     ` Benjamin Herrenschmidt
2007-02-09  1:41   ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).