LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [patch 0/3] 2.6.20 fix for PageUptodate memorder problem
@ 2007-02-06  8:02 Nick Piggin
  2007-02-06  8:02 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Andrew Morton, Linux Memory Management,
	Linux Kernel, Nick Piggin, Linux Filesystems

Still no independent confirmation as to whether this is a problem or not.
I think it is, so I'll propose this patchset to fix it. Patch 1/3 has a
reasonable description of the problem.

Thanks,
Nick

--
SuSE Labs


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 1/3] mm: fix PageUptodate memorder
  2007-02-06  8:02 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem Nick Piggin
@ 2007-02-06  8:02 ` Nick Piggin
  2007-02-06  8:25   ` Andrew Morton
  2007-02-06  8:02 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Andrew Morton, Nick Piggin, Linux Kernel,
	Linux Memory Management, Linux Filesystems

After running SetPageUptodate, preceeding stores to the page contents to
actually bring it uptodate may not be ordered with the store to set the page
uptodate.

Therefore, another CPU which checks PageUptodate is true, then reads the
page contents can get stale data.

Fix this by ensuring SetPageUptodate is always called with the page locked
(except in the case of a new page that cannot be visible to other CPUs), and
requiring PageUptodate be checked only when the page is locked.

To facilitate lockless checks, SetPageUptodate contains an smp_wmb to order
preceeding stores before the store to page flags, and a new PageUptodate_NoLock
is introduced, which issues a smp_rmb after the page flags are loaded for the
test.

I'm still not sure that a DMA memory barrier is not required, however I think
the logical place to put such a barrier would be in the IO completion routines,
when they come back to tell us that they have succeeded. (Help? Anyone?)

One thing I like about it is that it unifies the anonymous page handling
with the rest of the page management, by marking anon pages as uptodate
when they _are_ uptodate, rather than when our implementation requires
that they be marked as such. Doing this let me get rid of the smp_wmb's
in the page copying functions, which were specially added for anonymous
pages for a closely related issue, always vaguely troubled me.

Convert core code and some filesystems to use PageUptodate_NoLock, just for
reference (a more complete patch follows).

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/include/linux/highmem.h
===================================================================
--- linux-2.6.orig/include/linux/highmem.h
+++ linux-2.6/include/linux/highmem.h
@@ -57,8 +57,6 @@ static inline void clear_user_highpage(s
 	void *addr = kmap_atomic(page, KM_USER0);
 	clear_user_page(addr, vaddr, page);
 	kunmap_atomic(addr, KM_USER0);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }
 
 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
@@ -108,8 +106,6 @@ static inline void copy_user_highpage(st
 	copy_user_page(vto, vfrom, vaddr, to);
 	kunmap_atomic(vfrom, KM_USER0);
 	kunmap_atomic(vto, KM_USER1);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }
 
 #endif
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -126,16 +126,62 @@
 #define ClearPageReferenced(page)	clear_bit(PG_referenced, &(page)->flags)
 #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)
 
-#define PageUptodate(page)	test_bit(PG_uptodate, &(page)->flags)
-#ifdef CONFIG_S390
-static inline void SetPageUptodate(struct page *page)
+static inline int PageUptodate(struct page *page)
+{
+	WARN_ON(!PageLocked(page));
+	return test_bit(PG_uptodate, &(page)->flags);
+}
+
+/*
+ * PageUptodate to be used when not holding the page lock.
+ */
+static inline int PageUptodate_NoLock(struct page *page)
 {
+	int ret = test_bit(PG_uptodate, &(page)->flags);
+
+	/*
+	 * Must ensure that the data we read out of the page is loaded
+	 * _after_ we've loaded page->flags to check for PageUptodate.
+	 * See SetPageUptodate() for the other side of the story.
+	 */
+	smp_rmb();
+
+	return ret;
+}
+
+static inline void __SetPageUptodate(struct page *page)
+{
+#ifdef CONFIG_S390
 	if (!test_and_set_bit(PG_uptodate, &page->flags))
 		page_test_and_clear_dirty(page);
-}
 #else
-#define SetPageUptodate(page)	set_bit(PG_uptodate, &(page)->flags)
+	/*
+	 * Memory barrier must be issued before setting the PG_uptodate bit,
+	 * so all previous writes that served to bring the page uptodate are
+	 * visible before PageUptodate becomes true.
+	 *
+	 * S390 is guaranteed to have a barrier in the test_and_set operation
+	 * (see Documentation/atomic_ops.txt).
+	 *
+	 * XXX: does this memory barrier need to be anything special to
+	 * handle things like DMA writes into the page?
+	 */
+	smp_wmb();
+	set_bit(PG_uptodate, &(page)->flags);
 #endif
+}
+
+static inline void SetPageUptodate(struct page *page)
+{
+	WARN_ON(!PageLocked(page));
+	__SetPageUptodate(page);
+}
+
+static inline void SetNewPageUptodate(struct page *page)
+{
+	__SetPageUptodate(page);
+}
+
 #define ClearPageUptodate(page)	clear_bit(PG_uptodate, &(page)->flags)
 
 #define PageDirty(page)		test_bit(PG_dirty, &(page)->flags)
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c
+++ linux-2.6/mm/hugetlb.c
@@ -443,6 +443,7 @@ static int hugetlb_cow(struct mm_struct 
 
 	spin_unlock(&mm->page_table_lock);
 	copy_huge_page(new_page, old_page, address, vma);
+	SetNewPageUptodate(new_page);
 	spin_lock(&mm->page_table_lock);
 
 	ptep = huge_pte_offset(mm, address & HPAGE_MASK);
@@ -506,6 +507,7 @@ retry:
 		} else
 			lock_page(page);
 	}
+	SetNewPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
 	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1463,10 +1463,8 @@ static inline void cow_user_page(struct 
 			memset(kaddr, 0, PAGE_SIZE);
 		kunmap_atomic(kaddr, KM_USER0);
 		flush_dcache_page(dst);
-		return;
-
-	}
-	copy_user_highpage(dst, src, va, vma);
+	} else
+		copy_user_highpage(dst, src, va, vma);
 }
 
 /*
@@ -1579,6 +1577,7 @@ gotten:
 			goto oom;
 		cow_user_page(new_page, old_page, address, vma);
 	}
+	SetNewPageUptodate(new_page);
 
 	/*
 	 * Re-check the pte - we dropped the lock
@@ -2097,6 +2096,7 @@ static int do_anonymous_page(struct mm_s
 		page = alloc_zeroed_user_highpage(vma, address);
 		if (!page)
 			goto oom;
+		SetNewPageUptodate(page);
 
 		entry = mk_pte(page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
@@ -2203,6 +2203,7 @@ retry:
 			copy_user_highpage(page, new_page, address, vma);
 			page_cache_release(new_page);
 			new_page = page;
+			SetNewPageUptodate(new_page);
 			anon = 1;
 
 		} else {
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -932,7 +932,7 @@ find_page:
 			handle_ra_miss(mapping, &ra, index);
 			goto no_cached_page;
 		}
-		if (!PageUptodate(page))
+		if (!PageUptodate_NoLock(page))
 			goto page_not_up_to_date;
 page_ok:
 
@@ -1000,7 +1000,7 @@ readpage:
 			goto readpage_error;
 		}
 
-		if (!PageUptodate(page)) {
+		if (!PageUptodate_NoLock(page)) {
 			lock_page(page);
 			if (!PageUptodate(page)) {
 				if (page->mapping == NULL) {
@@ -1417,11 +1417,16 @@ retry_find:
 	 * Ok, found a page in the page cache, now we need to check
 	 * that it's up-to-date.
 	 */
-	if (!PageUptodate(page))
+	if (!PageUptodate_NoLock(page))
 		goto page_not_uptodate;
 
 success:
 	/*
+	 * Must order memory for the same reason as do_generic_mapping_read
+	 */
+	smp_rmb();
+
+	/*
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
@@ -1484,7 +1489,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1515,7 +1520,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1554,7 +1559,7 @@ retry_find:
 	 * Ok, found a page in the page cache, now we need to check
 	 * that it's up-to-date.
 	 */
-	if (!PageUptodate(page)) {
+	if (!PageUptodate_NoLock(page)) {
 		if (nonblock) {
 			page_cache_release(page);
 			return NULL;
@@ -1564,6 +1569,11 @@ retry_find:
 
 success:
 	/*
+	 * Must order memory for the same reason as do_generic_mapping_read
+	 */
+	smp_rmb();
+
+	/*
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
@@ -1605,7 +1615,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1635,7 +1645,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (PageUptodate(page))
+		if (PageUptodate_NoLock(page))
 			goto success;
 	} else if (error == AOP_TRUNCATED_PAGE) {
 		page_cache_release(page);
@@ -1806,7 +1816,7 @@ retry:
 	if (IS_ERR(page))
 		goto out;
 	mark_page_accessed(page);
-	if (PageUptodate(page))
+	if (PageUptodate_NoLock(page))
 		goto out;
 
 	lock_page(page);
Index: linux-2.6/mm/page_io.c
===================================================================
--- linux-2.6.orig/mm/page_io.c
+++ linux-2.6/mm/page_io.c
@@ -134,7 +134,7 @@ int swap_readpage(struct file *file, str
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	ClearPageUptodate(page);
+	BUG_ON(PageUptodate(page));
 	bio = get_swap_bio(GFP_KERNEL, page_private(page), page,
 				end_swap_bio_read);
 	if (bio == NULL) {
Index: linux-2.6/mm/swap_state.c
===================================================================
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -149,6 +149,7 @@ int add_to_swap(struct page * page, gfp_
 	int err;
 
 	BUG_ON(!PageLocked(page));
+	BUG_ON(!PageUptodate(page));
 
 	for (;;) {
 		entry = get_swap_page();
@@ -171,7 +172,6 @@ int add_to_swap(struct page * page, gfp_
 
 		switch (err) {
 		case 0:				/* Success */
-			SetPageUptodate(page);
 			SetPageDirty(page);
 			INC_CACHE_INFO(add_total);
 			return 1;
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c
+++ linux-2.6/fs/splice.c
@@ -107,7 +107,7 @@ static int page_cache_pipe_buf_pin(struc
 	struct page *page = buf->page;
 	int err;
 
-	if (!PageUptodate(page)) {
+	if (!PageUptodate_NoLock(page)) {
 		lock_page(page);
 
 		/*
@@ -373,7 +373,7 @@ __generic_file_splice_read(struct file *
 		/*
 		 * If the page isn't uptodate, we may need to start io on it
 		 */
-		if (!PageUptodate(page)) {
+		if (!PageUptodate_NoLock(page)) {
 			/*
 			 * If in nonblock mode then dont block on waiting
 			 * for an in-flight io page
Index: linux-2.6/fs/ext2/dir.c
===================================================================
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -163,7 +163,7 @@ static struct page * ext2_get_page(struc
 	if (!IS_ERR(page)) {
 		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
+		if (!PageUptodate_NoLock(page))
 			goto fail;
 		if (!PageChecked(page))
 			ext2_check_page(page);
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -2641,7 +2641,7 @@ static char *page_getlink(struct dentry 
 	if (IS_ERR(page))
 		goto sync_fail;
 	wait_on_page_locked(page);
-	if (!PageUptodate(page))
+	if (!PageUptodate_NoLock(page))
 		goto async_fail;
 	*ppage = page;
 	return kmap(page);
Index: linux-2.6/fs/partitions/check.c
===================================================================
--- linux-2.6.orig/fs/partitions/check.c
+++ linux-2.6/fs/partitions/check.c
@@ -562,7 +562,7 @@ unsigned char *read_dev_sector(struct bl
 				 NULL);
 	if (!IS_ERR(page)) {
 		wait_on_page_locked(page);
-		if (!PageUptodate(page))
+		if (!PageUptodate_NoLock(page))
 			goto fail;
 		if (PageError(page))
 			goto fail;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 2/3] fs: buffer don't PageUptodate without page locked
  2007-02-06  8:02 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem Nick Piggin
  2007-02-06  8:02 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
@ 2007-02-06  8:02 ` Nick Piggin
  2007-02-06  8:21   ` Andrew Morton
  2007-02-06  8:02 ` [patch 3/3] mm: make read_cache_page synchronous Nick Piggin
  2007-02-06 22:58 ` [patch 0/3] 2.6.20 fix for PageUptodate memorder problem David Chinner
  3 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Andrew Morton, Nick Piggin, Linux Kernel,
	Linux Memory Management, Linux Filesystems

__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However with the previous patch, this is now a problem: so don't bother
setting the page uptodate in this case (it is weird that the write path
does such a thing anyway). Instead just leave it to the read side to bring
the page uptodate when it notices that all buffers are uptodate.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc
 	 */
 	BUG_ON(PageWriteback(page));
 	set_page_writeback(page);
+	unlock_page(page);
 
 	do {
 		struct buffer_head *next = bh->b_this_page;
@@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc
 		}
 		bh = next;
 	} while (bh != head);
-	unlock_page(page);
 
 	err = 0;
 done:
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 3/3] mm: make read_cache_page synchronous
  2007-02-06  8:02 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem Nick Piggin
  2007-02-06  8:02 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
  2007-02-06  8:02 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
@ 2007-02-06  8:02 ` Nick Piggin
  2007-02-06  8:28   ` Andrew Morton
  2007-02-06 22:58 ` [patch 0/3] 2.6.20 fix for PageUptodate memorder problem David Chinner
  3 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Andrew Morton, Nick Piggin, Linux Kernel,
	Linux Memory Management, Linux Filesystems

Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate_NoLock calls.

I didn't have a great look down the call chains, but this appears to fixes 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs,
1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd.
All depending on whether the filler is async and/or can return with a !uptodate
page.

Also, a memory leak in sys_swapon().

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/fs/afs/dir.c
===================================================================
--- linux-2.6.orig/fs/afs/dir.c
+++ linux-2.6/fs/afs/dir.c
@@ -187,10 +187,7 @@ static struct page *afs_dir_get_page(str
 
 	page = read_mapping_page(dir->i_mapping, index, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			afs_dir_check_page(dir, page);
 		if (PageError(page))
Index: linux-2.6/fs/afs/mntpt.c
===================================================================
--- linux-2.6.orig/fs/afs/mntpt.c
+++ linux-2.6/fs/afs/mntpt.c
@@ -77,13 +77,11 @@ int afs_mntpt_check_symlink(struct afs_v
 	}
 
 	ret = -EIO;
-	wait_on_page_locked(page);
-	buf = kmap(page);
-	if (!PageUptodate(page))
-		goto out_free;
 	if (PageError(page))
 		goto out_free;
 
+	buf = kmap(page);
+
 	/* examine the symlink's contents */
 	size = vnode->status.size;
 	_debug("symlink to %*.*s", size, (int) size, buf);
@@ -100,8 +98,8 @@ int afs_mntpt_check_symlink(struct afs_v
 
 	ret = 0;
 
- out_free:
 	kunmap(page);
+ out_free:
 	page_cache_release(page);
  out:
 	_leave(" = %d", ret);
@@ -184,8 +182,7 @@ static struct vfsmount *afs_mntpt_do_aut
 	}
 
 	ret = -EIO;
-	wait_on_page_locked(page);
-	if (!PageUptodate(page) || PageError(page))
+	if (PageError(page))
 		goto error;
 
 	buf = kmap(page);
Index: linux-2.6/fs/cramfs/inode.c
===================================================================
--- linux-2.6.orig/fs/cramfs/inode.c
+++ linux-2.6/fs/cramfs/inode.c
@@ -180,7 +180,8 @@ static void *cramfs_read(struct super_bl
 		struct page *page = NULL;
 
 		if (blocknr + i < devsize) {
-			page = read_mapping_page(mapping, blocknr + i, NULL);
+			page = read_mapping_page_async(mapping, blocknr + i,
+									NULL);
 			/* synchronous error? */
 			if (IS_ERR(page))
 				page = NULL;
Index: linux-2.6/fs/ext2/dir.c
===================================================================
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -161,10 +161,7 @@ static struct page * ext2_get_page(struc
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate_NoLock(page))
-			goto fail;
 		if (!PageChecked(page))
 			ext2_check_page(page);
 		if (PageError(page))
Index: linux-2.6/fs/freevxfs/vxfs_subr.c
===================================================================
--- linux-2.6.orig/fs/freevxfs/vxfs_subr.c
+++ linux-2.6/fs/freevxfs/vxfs_subr.c
@@ -74,10 +74,7 @@ vxfs_get_page(struct address_space *mapp
 	pp = read_mapping_page(mapping, n, NULL);
 
 	if (!IS_ERR(pp)) {
-		wait_on_page_locked(pp);
 		kmap(pp);
-		if (!PageUptodate(pp))
-			goto fail;
 		/** if (!PageChecked(pp)) **/
 			/** vxfs_check_page(pp); **/
 		if (PageError(pp))
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1756,7 +1756,7 @@ int generic_file_readonly_mmap(struct fi
 EXPORT_SYMBOL(generic_file_mmap);
 EXPORT_SYMBOL(generic_file_readonly_mmap);
 
-static inline struct page *__read_cache_page(struct address_space *mapping,
+static struct page *__read_cache_page(struct address_space *mapping,
 				unsigned long index,
 				int (*filler)(void *,struct page*),
 				void *data)
@@ -1793,17 +1793,11 @@ repeat:
 	return page;
 }
 
-/**
- * read_cache_page - read into page cache, fill it if needed
- * @mapping:	the page's address_space
- * @index:	the page index
- * @filler:	function to perform the read
- * @data:	destination for read data
- *
- * Read into the page cache. If a page already exists,
- * and PageUptodate() is not set, try to fill the page.
+/*
+ * Same as read_cache_page, but don't wait for page to become unlocked
+ * after submitting it to the filler.
  */
-struct page *read_cache_page(struct address_space *mapping,
+struct page *read_cache_page_async(struct address_space *mapping,
 				unsigned long index,
 				int (*filler)(void *,struct page*),
 				void *data)
@@ -1815,7 +1809,6 @@ retry:
 	page = __read_cache_page(mapping, index, filler, data);
 	if (IS_ERR(page))
 		goto out;
-	mark_page_accessed(page);
 	if (PageUptodate_NoLock(page))
 		goto out;
 
@@ -1835,6 +1828,37 @@ retry:
 		page = ERR_PTR(err);
 	}
  out:
+	mark_page_accessed(page);
+	return page;
+}
+EXPORT_SYMBOL(read_cache_page_async);
+
+/**
+ * read_cache_page - read into page cache, fill it if needed
+ * @mapping:	the page's address_space
+ * @index:	the page index
+ * @filler:	function to perform the read
+ * @data:	destination for read data
+ *
+ * Read into the page cache. If a page already exists,
+ * and PageUptodate() is not set, try to fill the page.
+ */
+struct page *read_cache_page(struct address_space *mapping,
+				unsigned long index,
+				int (*filler)(void *,struct page*),
+				void *data)
+{
+	struct page *page;
+
+	page = read_cache_page_async(mapping, index, filler, data);
+	if (IS_ERR(page))
+		goto out;
+	wait_on_page_locked(page);
+	if (!PageUptodate_NoLock(page)) {
+		page_cache_release(page);
+		page = ERR_PTR(-EIO);
+	}
+ out:
 	return page;
 }
 EXPORT_SYMBOL(read_cache_page);
Index: linux-2.6/fs/minix/dir.c
===================================================================
--- linux-2.6.orig/fs/minix/dir.c
+++ linux-2.6/fs/minix/dir.c
@@ -62,7 +62,6 @@ static struct page * dir_get_page(struct
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -2639,19 +2639,9 @@ static char *page_getlink(struct dentry 
 	struct address_space *mapping = dentry->d_inode->i_mapping;
 	page = read_mapping_page(mapping, 0, NULL);
 	if (IS_ERR(page))
-		goto sync_fail;
-	wait_on_page_locked(page);
-	if (!PageUptodate_NoLock(page))
-		goto async_fail;
+		return (char *)page;
 	*ppage = page;
 	return kmap(page);
-
-async_fail:
-	page_cache_release(page);
-	return ERR_PTR(-EIO);
-
-sync_fail:
-	return (char*)page;
 }
 
 int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
Index: linux-2.6/fs/ntfs/aops.h
===================================================================
--- linux-2.6.orig/fs/ntfs/aops.h
+++ linux-2.6/fs/ntfs/aops.h
@@ -89,9 +89,8 @@ static inline struct page *ntfs_map_page
 	struct page *page = read_mapping_page(mapping, index, NULL);
 
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (PageUptodate(page) && !PageError(page))
+		if (!PageError(page))
 			return page;
 		ntfs_unmap_page(page);
 		return ERR_PTR(-EIO);
Index: linux-2.6/fs/ntfs/attrib.c
===================================================================
--- linux-2.6.orig/fs/ntfs/attrib.c
+++ linux-2.6/fs/ntfs/attrib.c
@@ -2532,14 +2532,7 @@ int ntfs_attr_set(ntfs_inode *ni, const 
 		page = read_mapping_page(mapping, idx, NULL);
 		if (IS_ERR(page)) {
 			ntfs_error(vol->sb, "Failed to read first partial "
-					"page (sync error, index 0x%lx).", idx);
-			return PTR_ERR(page);
-		}
-		wait_on_page_locked(page);
-		if (unlikely(!PageUptodate(page))) {
-			ntfs_error(vol->sb, "Failed to read first partial page "
-					"(async error, index 0x%lx).", idx);
-			page_cache_release(page);
+					"page (error, index 0x%lx).", idx);
 			return PTR_ERR(page);
 		}
 		/*
@@ -2602,14 +2595,7 @@ int ntfs_attr_set(ntfs_inode *ni, const 
 		page = read_mapping_page(mapping, idx, NULL);
 		if (IS_ERR(page)) {
 			ntfs_error(vol->sb, "Failed to read last partial page "
-					"(sync error, index 0x%lx).", idx);
-			return PTR_ERR(page);
-		}
-		wait_on_page_locked(page);
-		if (unlikely(!PageUptodate(page))) {
-			ntfs_error(vol->sb, "Failed to read last partial page "
-					"(async error, index 0x%lx).", idx);
-			page_cache_release(page);
+					"(error, index 0x%lx).", idx);
 			return PTR_ERR(page);
 		}
 		kaddr = kmap_atomic(page, KM_USER0);
Index: linux-2.6/fs/ntfs/file.c
===================================================================
--- linux-2.6.orig/fs/ntfs/file.c
+++ linux-2.6/fs/ntfs/file.c
@@ -236,8 +236,7 @@ do_non_resident_extend:
 			err = PTR_ERR(page);
 			goto init_err_out;
 		}
-		wait_on_page_locked(page);
-		if (unlikely(!PageUptodate(page) || PageError(page))) {
+		if (unlikely(PageError(page))) {
 			page_cache_release(page);
 			err = -EIO;
 			goto init_err_out;
Index: linux-2.6/fs/ocfs2/symlink.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/symlink.c
+++ linux-2.6/fs/ocfs2/symlink.c
@@ -67,16 +67,9 @@ static char *ocfs2_page_getlink(struct d
 	page = read_mapping_page(mapping, 0, NULL);
 	if (IS_ERR(page))
 		goto sync_fail;
-	wait_on_page_locked(page);
-	if (!PageUptodate(page))
-		goto async_fail;
 	*ppage = page;
 	return kmap(page);
 
-async_fail:
-	page_cache_release(page);
-	return ERR_PTR(-EIO);
-
 sync_fail:
 	return (char*)page;
 }
Index: linux-2.6/fs/partitions/check.c
===================================================================
--- linux-2.6.orig/fs/partitions/check.c
+++ linux-2.6/fs/partitions/check.c
@@ -561,9 +561,6 @@ unsigned char *read_dev_sector(struct bl
 	page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)),
 				 NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
-		if (!PageUptodate_NoLock(page))
-			goto fail;
 		if (PageError(page))
 			goto fail;
 		p->v = page;
Index: linux-2.6/fs/reiserfs/xattr.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/xattr.c
+++ linux-2.6/fs/reiserfs/xattr.c
@@ -454,11 +454,7 @@ static struct page *reiserfs_get_page(st
 	mapping_set_gfp_mask(mapping, GFP_NOFS);
 	page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
-
 		if (PageError(page))
 			goto fail;
 	}
Index: linux-2.6/fs/sysv/dir.c
===================================================================
--- linux-2.6.orig/fs/sysv/dir.c
+++ linux-2.6/fs/sysv/dir.c
@@ -54,17 +54,9 @@ static struct page * dir_get_page(struct
 {
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
-	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
+	if (!IS_ERR(page))
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
-	}
 	return page;
-
-fail:
-	dir_put_page(page);
-	return ERR_PTR(-EIO);
 }
 
 static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir)
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c
+++ linux-2.6/mm/swapfile.c
@@ -1531,9 +1531,6 @@ asmlinkage long sys_swapon(const char __
 		error = PTR_ERR(page);
 		goto bad_swap;
 	}
-	wait_on_page_locked(page);
-	if (!PageUptodate_NoLock(page))
-		goto bad_swap;
 	kmap(page);
 	swap_header = page_address(page);
 
Index: linux-2.6/drivers/mtd/devices/block2mtd.c
===================================================================
--- linux-2.6.orig/drivers/mtd/devices/block2mtd.c
+++ linux-2.6/drivers/mtd/devices/block2mtd.c
@@ -88,9 +88,8 @@ static void cache_readahead(struct addre
 
 static struct page* page_readahead(struct address_space *mapping, int index)
 {
-	filler_t *filler = (filler_t*)mapping->a_ops->readpage;
 	cache_readahead(mapping, index);
-	return read_cache_page(mapping, index, filler, NULL);
+	return read_mapping_page(mapping, index, NULL);
 }
 
 
Index: linux-2.6/fs/ecryptfs/mmap.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/mmap.c
+++ linux-2.6/fs/ecryptfs/mmap.c
@@ -54,14 +54,7 @@ static struct page *ecryptfs_get1page(st
 	dentry = file->f_path.dentry;
 	inode = dentry->d_inode;
 	mapping = inode->i_mapping;
-	page = read_cache_page(mapping, index,
-			       (filler_t *)mapping->a_ops->readpage,
-			       (void *)file);
-	if (IS_ERR(page))
-		goto out;
-	wait_on_page_locked(page);
-out:
-	return page;
+	return read_mapping_page(mapping, index, (void *)file);
 }
 
 static
@@ -233,7 +226,6 @@ int ecryptfs_do_readpage(struct file *fi
 		ecryptfs_printk(KERN_ERR, "Error reading from page cache\n");
 		goto out;
 	}
-	wait_on_page_locked(lower_page);
 	page_data = (char *)kmap(page);
 	if (!page_data) {
 		rc = -ENOMEM;
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c
+++ linux-2.6/fs/nfs/dir.c
@@ -322,8 +322,6 @@ int find_dirent_page(nfs_readdir_descrip
 		status = PTR_ERR(page);
 		goto out;
 	}
-	if (!PageUptodate(page))
-		goto read_error;
 
 	/* NOTE: Someone else may have changed the READDIRPLUS flag */
 	desc->page = page;
@@ -337,9 +335,6 @@ int find_dirent_page(nfs_readdir_descrip
  out:
 	dfprintk(DIRCACHE, "NFS: %s: returns %d\n", __FUNCTION__, status);
 	return status;
- read_error:
-	page_cache_release(page);
-	return -EIO;
 }
 
 /*
Index: linux-2.6/fs/nfs/symlink.c
===================================================================
--- linux-2.6.orig/fs/nfs/symlink.c
+++ linux-2.6/fs/nfs/symlink.c
@@ -61,15 +61,9 @@ static void *nfs_follow_link(struct dent
 		err = page;
 		goto read_failed;
 	}
-	if (!PageUptodate(page)) {
-		err = ERR_PTR(-EIO);
-		goto getlink_read_error;
-	}
 	nd_set_link(nd, kmap(page));
 	return page;
 
-getlink_read_error:
-	page_cache_release(page);
 read_failed:
 	nd_set_link(nd, err);
 	return NULL;
Index: linux-2.6/fs/ntfs/super.c
===================================================================
--- linux-2.6.orig/fs/ntfs/super.c
+++ linux-2.6/fs/ntfs/super.c
@@ -2471,7 +2471,6 @@ static s64 get_nr_free_clusters(ntfs_vol
 	s64 nr_free = vol->nr_clusters;
 	u32 *kaddr;
 	struct address_space *mapping = vol->lcnbmp_ino->i_mapping;
-	filler_t *readpage = (filler_t*)mapping->a_ops->readpage;
 	struct page *page;
 	pgoff_t index, max_index;
 
@@ -2494,24 +2493,14 @@ static s64 get_nr_free_clusters(ntfs_vol
 		 * Read the page from page cache, getting it from backing store
 		 * if necessary, and increment the use count.
 		 */
-		page = read_cache_page(mapping, index, (filler_t*)readpage,
-				NULL);
+		page = read_mapping_page(mapping, index, NULL);
 		/* Ignore pages which errored synchronously. */
 		if (IS_ERR(page)) {
-			ntfs_debug("Sync read_cache_page() error. Skipping "
+			ntfs_debug("read_mapping_page() error. Skipping "
 					"page (index 0x%lx).", index);
 			nr_free -= PAGE_CACHE_SIZE * 8;
 			continue;
 		}
-		wait_on_page_locked(page);
-		/* Ignore pages which errored asynchronously. */
-		if (!PageUptodate(page)) {
-			ntfs_debug("Async read_cache_page() error. Skipping "
-					"page (index 0x%lx).", index);
-			page_cache_release(page);
-			nr_free -= PAGE_CACHE_SIZE * 8;
-			continue;
-		}
 		kaddr = (u32*)kmap_atomic(page, KM_USER0);
 		/*
 		 * For each 4 bytes, subtract the number of set bits. If this
@@ -2562,7 +2551,6 @@ static unsigned long __get_nr_free_mft_r
 {
 	u32 *kaddr;
 	struct address_space *mapping = vol->mftbmp_ino->i_mapping;
-	filler_t *readpage = (filler_t*)mapping->a_ops->readpage;
 	struct page *page;
 	pgoff_t index;
 
@@ -2576,21 +2564,11 @@ static unsigned long __get_nr_free_mft_r
 		 * Read the page from page cache, getting it from backing store
 		 * if necessary, and increment the use count.
 		 */
-		page = read_cache_page(mapping, index, (filler_t*)readpage,
-				NULL);
+		page = read_mapping_page(mapping, index, NULL);
 		/* Ignore pages which errored synchronously. */
 		if (IS_ERR(page)) {
-			ntfs_debug("Sync read_cache_page() error. Skipping "
-					"page (index 0x%lx).", index);
-			nr_free -= PAGE_CACHE_SIZE * 8;
-			continue;
-		}
-		wait_on_page_locked(page);
-		/* Ignore pages which errored asynchronously. */
-		if (!PageUptodate(page)) {
-			ntfs_debug("Async read_cache_page() error. Skipping "
+			ntfs_debug("read_mapping_page() error. Skipping "
 					"page (index 0x%lx).", index);
-			page_cache_release(page);
 			nr_free -= PAGE_CACHE_SIZE * 8;
 			continue;
 		}
Index: linux-2.6/fs/ufs/dir.c
===================================================================
--- linux-2.6.orig/fs/ufs/dir.c
+++ linux-2.6/fs/ufs/dir.c
@@ -180,13 +180,9 @@ fail:
 static struct page *ufs_get_page(struct inode *dir, unsigned long n)
 {
 	struct address_space *mapping = dir->i_mapping;
-	struct page *page = read_cache_page(mapping, n,
-				(filler_t*)mapping->a_ops->readpage, NULL);
+	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			ufs_check_page(page);
 		if (PageError(page))
Index: linux-2.6/fs/ufs/util.c
===================================================================
--- linux-2.6.orig/fs/ufs/util.c
+++ linux-2.6/fs/ufs/util.c
@@ -251,13 +251,11 @@ struct page *ufs_get_locked_page(struct 
 
 	page = find_lock_page(mapping, index);
 	if (!page) {
-		page = read_cache_page(mapping, index,
-				       (filler_t*)mapping->a_ops->readpage,
-				       NULL);
+		page = read_mapping_page(mapping, index, NULL);
 
 		if (IS_ERR(page)) {
 			printk(KERN_ERR "ufs_change_blocknr: "
-			       "read_cache_page error: ino %lu, index: %lu\n",
+			       "read_mapping_page error: ino %lu, index: %lu\n",
 			       mapping->host->i_ino, index);
 			goto out;
 		}
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -97,12 +97,23 @@ static inline struct page *grab_cache_pa
 
 extern struct page * grab_cache_page_nowait(struct address_space *mapping,
 				unsigned long index);
+extern struct page * read_cache_page_async(struct address_space *mapping,
+				unsigned long index, filler_t *filler,
+				void *data);
 extern struct page * read_cache_page(struct address_space *mapping,
 				unsigned long index, filler_t *filler,
 				void *data);
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
+static inline struct page *read_mapping_page_async(
+						struct address_space *mapping,
+					     unsigned long index, void *data)
+{
+	filler_t *filler = (filler_t *)mapping->a_ops->readpage;
+	return read_cache_page_async(mapping, index, filler, data);
+}
+
 static inline struct page *read_mapping_page(struct address_space *mapping,
 					     unsigned long index, void *data)
 {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 2/3] fs: buffer don't PageUptodate without page locked
  2007-02-06  8:02 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
@ 2007-02-06  8:21   ` Andrew Morton
  2007-02-06  8:31     ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2007-02-06  8:21 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, Hugh Dickins, Linux Kernel,
	Linux Memory Management, Linux Filesystems

On Tue,  6 Feb 2007 09:02:23 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:

> __block_write_full_page is calling SetPageUptodate without the page locked.
> This is unusual, but not incorrect, as PG_writeback is still set.
> 
> However with the previous patch, this is now a problem: so don't bother
> setting the page uptodate in this case (it is weird that the write path
> does such a thing anyway). Instead just leave it to the read side to bring
> the page uptodate when it notices that all buffers are uptodate.
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> 
> Index: linux-2.6/fs/buffer.c
> ===================================================================
> --- linux-2.6.orig/fs/buffer.c
> +++ linux-2.6/fs/buffer.c
> @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc
>  	 */
>  	BUG_ON(PageWriteback(page));
>  	set_page_writeback(page);
> +	unlock_page(page);
>  
>  	do {
>  		struct buffer_head *next = bh->b_this_page;
> @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc
>  		}
>  		bh = next;
>  	} while (bh != head);
> -	unlock_page(page);
>  
>  	err = 0;
>  done:

Why this change?  Without looking at it too hard, it seems that if
submit_bh() completes synchronously, this thread can end up playing with
the buffers on a non-locked, non-PageWriteback page.  Someone else could
whip the buffers away and oops?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 1/3] mm: fix PageUptodate memorder
  2007-02-06  8:02 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
@ 2007-02-06  8:25   ` Andrew Morton
  2007-02-06  8:51     ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2007-02-06  8:25 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, Hugh Dickins, Linux Kernel,
	Linux Memory Management, Linux Filesystems

On Tue,  6 Feb 2007 09:02:11 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:

> +static inline void __SetPageUptodate(struct page *page)
> +{
> +#ifdef CONFIG_S390
>  	if (!test_and_set_bit(PG_uptodate, &page->flags))
>  		page_test_and_clear_dirty(page);
> -}
>  #else
> -#define SetPageUptodate(page)	set_bit(PG_uptodate, &(page)->flags)
> +	/*
> +	 * Memory barrier must be issued before setting the PG_uptodate bit,
> +	 * so all previous writes that served to bring the page uptodate are
> +	 * visible before PageUptodate becomes true.
> +	 *
> +	 * S390 is guaranteed to have a barrier in the test_and_set operation
> +	 * (see Documentation/atomic_ops.txt).
> +	 *
> +	 * XXX: does this memory barrier need to be anything special to
> +	 * handle things like DMA writes into the page?
> +	 */
> +	smp_wmb();
> +	set_bit(PG_uptodate, &(page)->flags);
>  #endif
> +}
> +
> +static inline void SetPageUptodate(struct page *page)
> +{
> +	WARN_ON(!PageLocked(page));
> +	__SetPageUptodate(page);
> +}
> +
> +static inline void SetNewPageUptodate(struct page *page)
> +{
> +	__SetPageUptodate(page);
> +}

I was panicing for a minute when I saw that __SetPageUptodate() in there.

Conventionally the __SetPageFoo namespace is for nonatomic updates to
page->flags.  Can we call this something different?


What a fugly patchset :(

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 3/3] mm: make read_cache_page synchronous
  2007-02-06  8:02 ` [patch 3/3] mm: make read_cache_page synchronous Nick Piggin
@ 2007-02-06  8:28   ` Andrew Morton
  2007-02-06  8:56     ` Nick Piggin
  2007-02-06  8:58     ` Nick Piggin
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2007-02-06  8:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, Hugh Dickins, Linux Kernel,
	Linux Memory Management, Linux Filesystems

On Tue,  6 Feb 2007 09:02:33 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:

> Ensure pages are uptodate after returning from read_cache_page, which allows
> us to cut out most of the filesystem-internal PageUptodate_NoLock calls.

Normally it's good to rename functions when we change their behaviour, but
I guess any missed (or out-of-tree) filesystems will just end up doing a
pointless wait_on_page_locked() and will continue to work OK, yes?

> I didn't have a great look down the call chains, but this appears to fixes 7
> possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs,
> 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd.
> All depending on whether the filler is async and/or can return with a !uptodate
> page.
> 
> Also, a memory leak in sys_swapon().

Separate patch?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 2/3] fs: buffer don't PageUptodate without page locked
  2007-02-06  8:21   ` Andrew Morton
@ 2007-02-06  8:31     ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Hugh Dickins, Linux Kernel,
	Linux Memory Management, Linux Filesystems

On Tue, Feb 06, 2007 at 12:21:40AM -0800, Andrew Morton wrote:
> On Tue,  6 Feb 2007 09:02:23 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:
> 
> > __block_write_full_page is calling SetPageUptodate without the page locked.
> > This is unusual, but not incorrect, as PG_writeback is still set.
> > 
> > However with the previous patch, this is now a problem: so don't bother
> > setting the page uptodate in this case (it is weird that the write path
> > does such a thing anyway). Instead just leave it to the read side to bring
> > the page uptodate when it notices that all buffers are uptodate.
> > 
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > 
> > Index: linux-2.6/fs/buffer.c
> > ===================================================================
> > --- linux-2.6.orig/fs/buffer.c
> > +++ linux-2.6/fs/buffer.c
> > @@ -1679,6 +1679,7 @@ static int __block_write_full_page(struc
> >  	 */
> >  	BUG_ON(PageWriteback(page));
> >  	set_page_writeback(page);
> > +	unlock_page(page);
> >  
> >  	do {
> >  		struct buffer_head *next = bh->b_this_page;
> > @@ -1688,7 +1689,6 @@ static int __block_write_full_page(struc
> >  		}
> >  		bh = next;
> >  	} while (bh != head);
> > -	unlock_page(page);
> >  
> >  	err = 0;
> >  done:
> 
> Why this change?  Without looking at it too hard, it seems that if
> submit_bh() completes synchronously, this thread can end up playing with
> the buffers on a non-locked, non-PageWriteback page.  Someone else could
> whip the buffers away and oops?

Hmm, it definitely shouldn't be there, it leaked in from another patch
to bring partiy with the error handling...

Here is an updated patch.

--

__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However with the previous patch, this is now a problem: so don't bother
setting the page uptodate in this case (it is weird that the write path
does such a thing anyway). Instead just leave it to the read side to bring
the page uptodate when it notices that all buffers are uptodate.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 1/3] mm: fix PageUptodate memorder
  2007-02-06  8:25   ` Andrew Morton
@ 2007-02-06  8:51     ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Linus Torvalds, Hugh Dickins, Linux Kernel,
	Linux Memory Management, Linux Filesystems

Andrew Morton wrote:
> On Tue,  6 Feb 2007 09:02:11 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:
> 
> 
>>+static inline void __SetPageUptodate(struct page *page)
>>+{
>>+#ifdef CONFIG_S390
>> 	if (!test_and_set_bit(PG_uptodate, &page->flags))
>> 		page_test_and_clear_dirty(page);
>>-}
>> #else
>>-#define SetPageUptodate(page)	set_bit(PG_uptodate, &(page)->flags)
>>+	/*
>>+	 * Memory barrier must be issued before setting the PG_uptodate bit,
>>+	 * so all previous writes that served to bring the page uptodate are
>>+	 * visible before PageUptodate becomes true.
>>+	 *
>>+	 * S390 is guaranteed to have a barrier in the test_and_set operation
>>+	 * (see Documentation/atomic_ops.txt).
>>+	 *
>>+	 * XXX: does this memory barrier need to be anything special to
>>+	 * handle things like DMA writes into the page?
>>+	 */
>>+	smp_wmb();
>>+	set_bit(PG_uptodate, &(page)->flags);
>> #endif
>>+}
>>+
>>+static inline void SetPageUptodate(struct page *page)
>>+{
>>+	WARN_ON(!PageLocked(page));
>>+	__SetPageUptodate(page);
>>+}
>>+
>>+static inline void SetNewPageUptodate(struct page *page)
>>+{
>>+	__SetPageUptodate(page);
>>+}
> 
> 
> I was panicing for a minute when I saw that __SetPageUptodate() in there.
> 
> Conventionally the __SetPageFoo namespace is for nonatomic updates to
> page->flags.  Can we call this something different?

Duh, of course, sorry.

> What a fugly patchset :(

Fugly problem. One could fix it by always locking the page, but I was
worried about Hugh flaming me if I tried that ;)

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 3/3] mm: make read_cache_page synchronous
  2007-02-06  8:28   ` Andrew Morton
@ 2007-02-06  8:56     ` Nick Piggin
  2007-02-06  8:58     ` Nick Piggin
  1 sibling, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Linus Torvalds, Hugh Dickins, Linux Kernel,
	Linux Memory Management, Linux Filesystems

Andrew Morton wrote:
> On Tue,  6 Feb 2007 09:02:33 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:
> 
> 
>>Ensure pages are uptodate after returning from read_cache_page, which allows
>>us to cut out most of the filesystem-internal PageUptodate_NoLock calls.
> 
> 
> Normally it's good to rename functions when we change their behaviour, but
> I guess any missed (or out-of-tree) filesystems will just end up doing a
> pointless wait_on_page_locked() and will continue to work OK, yes?

Yeah.

> 
> 
>>I didn't have a great look down the call chains, but this appears to fixes 7
>>possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs,
>>1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd.
>>All depending on whether the filler is async and/or can return with a !uptodate
>>page.
>>
>>Also, a memory leak in sys_swapon().
> 
> 
> Separate patch?

Well its fixed by virtue of read_cache_page now correctly dropping the page
refcount if it finds the page !uptodate, rather than any special logic I
added.

I can do another patch though. No problem, I'll be resending the series after
this round of feedback.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 3/3] mm: make read_cache_page synchronous
  2007-02-06  8:28   ` Andrew Morton
  2007-02-06  8:56     ` Nick Piggin
@ 2007-02-06  8:58     ` Nick Piggin
  1 sibling, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-06  8:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Hugh Dickins, Linux Kernel,
	Linux Memory Management, Linux Filesystems

On Tue, Feb 06, 2007 at 12:28:39AM -0800, Andrew Morton wrote:
> > 
> > Also, a memory leak in sys_swapon().
> 
> Separate patch?

Gack, I'm an idiot, there is no memory leak :P

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 0/3] 2.6.20 fix for PageUptodate memorder problem
  2007-02-06  8:02 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem Nick Piggin
                   ` (2 preceding siblings ...)
  2007-02-06  8:02 ` [patch 3/3] mm: make read_cache_page synchronous Nick Piggin
@ 2007-02-06 22:58 ` David Chinner
  2007-02-07  3:13   ` Nick Piggin
  3 siblings, 1 reply; 16+ messages in thread
From: David Chinner @ 2007-02-06 22:58 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, Hugh Dickins, Andrew Morton,
	Linux Memory Management, Linux Kernel, Linux Filesystems

On Tue, Feb 06, 2007 at 09:02:01AM +0100, Nick Piggin wrote:
> Still no independent confirmation as to whether this is a problem or not.
> I think it is, so I'll propose this patchset to fix it. Patch 1/3 has a
> reasonable description of the problem.
> 

Nick, can you include a diffstat at the head of your patches? Makes
it much easier to see what the scope of the changes are when you
are changing code in several filesystems....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch 0/3] 2.6.20 fix for PageUptodate memorder problem
  2007-02-06 22:58 ` [patch 0/3] 2.6.20 fix for PageUptodate memorder problem David Chinner
@ 2007-02-07  3:13   ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-07  3:13 UTC (permalink / raw)
  To: David Chinner
  Cc: Linus Torvalds, Hugh Dickins, Andrew Morton,
	Linux Memory Management, Linux Kernel, Linux Filesystems

On Wed, Feb 07, 2007 at 09:58:57AM +1100, David Chinner wrote:
> On Tue, Feb 06, 2007 at 09:02:01AM +0100, Nick Piggin wrote:
> > Still no independent confirmation as to whether this is a problem or not.
> > I think it is, so I'll propose this patchset to fix it. Patch 1/3 has a
> > reasonable description of the problem.
> > 
> 
> Nick, can you include a diffstat at the head of your patches? Makes
> it much easier to see what the scope of the changes are when you
> are changing code in several filesystems....

Good idea, I'll do that.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 2/3] fs: buffer don't PageUptodate without page locked
  2007-02-15  7:31 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 4) Nick Piggin
@ 2007-02-15  7:31 ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-15  7:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel, Nick Piggin, Linux Memory Management

__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However the next patch will require that SetPageUptodate always be called
with the page locked. Simply don't bother setting the page uptodate in this
case (it is unusual that the write path does such a thing anyway). Instead
just leave it to the read side to bring the page uptodate when it notices
that all buffers are uptodate.

Signed-off-by: Nick Piggin <npiggin@suse.de>

 fs/buffer.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 2/3] fs: buffer don't PageUptodate without page locked
  2007-02-10  2:31 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 3) Nick Piggin
@ 2007-02-10  2:31 ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-10  2:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Nick Piggin, Linux Memory Management,
	Martin Schwidefsky, Linux Kernel

__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However the next patch will require that SetPageUptodate always be called
with the page locked. Simply don't bother setting the page uptodate in this
case (it is unusual that the write path does such a thing anyway). Instead
just leave it to the read side to bring the page uptodate when it notices
that all buffers are uptodate.

Signed-off-by: Nick Piggin <npiggin@suse.de>

 fs/buffer.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 2/3] fs: buffer don't PageUptodate without page locked
  2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
@ 2007-02-08 13:27 ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2007-02-08 13:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Linux Memory Management, Benjamin Herrenschmidt,
	Nick Piggin, Linux Kernel

__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However with the previous patch, this is now a problem: so don't bother
setting the page uptodate in this case (it is weird that the write path
does such a thing anyway). Instead just leave it to the read side to bring
the page uptodate when it notices that all buffers are uptodate.

Signed-off-by: Nick Piggin <npiggin@suse.de>

 fs/buffer.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-02-15  7:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-06  8:02 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem Nick Piggin
2007-02-06  8:02 ` [patch 1/3] mm: fix PageUptodate memorder Nick Piggin
2007-02-06  8:25   ` Andrew Morton
2007-02-06  8:51     ` Nick Piggin
2007-02-06  8:02 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
2007-02-06  8:21   ` Andrew Morton
2007-02-06  8:31     ` Nick Piggin
2007-02-06  8:02 ` [patch 3/3] mm: make read_cache_page synchronous Nick Piggin
2007-02-06  8:28   ` Andrew Morton
2007-02-06  8:56     ` Nick Piggin
2007-02-06  8:58     ` Nick Piggin
2007-02-06 22:58 ` [patch 0/3] 2.6.20 fix for PageUptodate memorder problem David Chinner
2007-02-07  3:13   ` Nick Piggin
2007-02-08 13:26 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 2) Nick Piggin
2007-02-08 13:27 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
2007-02-10  2:31 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 3) Nick Piggin
2007-02-10  2:31 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin
2007-02-15  7:31 [patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 4) Nick Piggin
2007-02-15  7:31 ` [patch 2/3] fs: buffer don't PageUptodate without page locked Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).