From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
	Zwane Mwaikambo <zwane@arm.linux.org.uk>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Dave Jones <davej@redhat.com>,
	Chuck Wolber <chuckw@quantumlinux.com>,
	Chris Wedgwood <reviews@ml.cw.f00f.org>,
	Michael Krufky <mkrufky@linuxtv.org>,
	Chuck Ebbert <cebbert@redhat.com>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Zach Brown <zach.brown@oracle.com>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Leonid Ananiev <leonid.i.ananiev@linux.intel.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>
Subject: [patch 28/31] dio: invalidate clean pages before dio write
Date: Mon, 19 Mar 2007 14:41:01 -0700
Message-ID: <20070319214101.GD9261@kroah.com>
In-Reply-To: <20070319213647.GB9261@kroah.com>

[-- Attachment #1: dio-invalidate-clean-pages-before-dio-write.patch --]
[-- Type: text/plain, Size: 4482 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Zach Brown <zach.brown@oracle.com>

[PATCH] dio: invalidate clean pages before dio write

This patch fixes a user-triggerable oops that was reported by Leonid
Ananiev as archived at http://lkml.org/lkml/2007/2/8/337.

dio writes invalidate clean pages that intersect the written region so that
subsequent buffered reads go to disk to read the new data.  If this
invalidation fails, the interface tries to tell the caller that the cache is
inconsistent by returning -EIO.
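
As an illustration of which pages "intersect the written region", here is a
small user-space sketch of the byte-range to page-index arithmetic.  It is
not kernel code: SKETCH_PAGE_SHIFT (4 KiB pages) and the names page_span and
span_for_write() are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

struct page_span {
	uint64_t first;	/* index of the first page the write touches */
	uint64_t last;	/* index of the last page the write touches (inclusive) */
};

/* Map the byte range [offset, offset + len) onto page cache indices. */
static struct page_span span_for_write(uint64_t offset, uint64_t len)
{
	struct page_span s;

	assert(len > 0);	/* an empty write touches no page */

	s.first = offset >> SKETCH_PAGE_SHIFT;
	/* Subtract one so a write ending on a page boundary stays in its page. */
	s.last = (offset + len - 1) >> SKETCH_PAGE_SHIFT;
	return s;
}
```

A two-byte write at offset 4095 straddles a page boundary and so covers two
pages, while a full-page write at offset 4096 covers exactly one.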

Before this patch we had the problem where this invalidation failure would
clobber -EIOCBQUEUED as it made its way from fs/direct-io.c to fs/aio.c.
Both fs/aio.c and bio completion would then call aio_complete() on the same
iocb and we would reference freed memory, usually oopsing.

This patch addresses this problem by invalidating before the write so that
we can cleanly return -EIO before ->direct_IO() has had a chance to return
-EIOCBQUEUED.

There is a compromise here.  During the dio write we can fault in mmap()ed
pages which intersect the written range with get_user_pages() if the user
provided them for the source buffer.  This is a crazy thing to do, but we
can make it mostly work in most cases by trying the invalidation again.
The compromise is that we won't return an error when this second
invalidation fails on an AIO write that has already returned -EIOCBQUEUED.
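
The ordering described above can be modelled in user space roughly as
follows.  This is only a sketch of the behaviour as worded in this
description, not the mm/filemap.c code: struct sketch_env,
sketch_dio_write() and SKETCH_EIOCBQUEUED are hypothetical names, and the
three "results" are plain inputs standing in for the real calls.

```c
#include <errno.h>

/* Kernel-internal error code; its value is copied here for the sketch. */
#define SKETCH_EIOCBQUEUED 529

struct sketch_env {
	int first_invalidate;	/* result of invalidating before the write */
	long direct_io;		/* stand-in for ->direct_IO(): bytes or -errno */
	int second_invalidate;	/* result of invalidating after the write */
	int io_submitted;	/* set once the write was handed to ->direct_IO() */
};

static long sketch_dio_write(struct sketch_env *env)
{
	long retval;

	env->io_submitted = 0;

	/* Nothing is queued yet, so an error can be returned cleanly. */
	if (env->first_invalidate)
		return env->first_invalidate;

	env->io_submitted = 1;
	retval = env->direct_io;

	/*
	 * Second attempt: report a failure only when it would not overwrite
	 * an error or -SKETCH_EIOCBQUEUED from the write itself.
	 */
	if (env->second_invalidate && retval >= 0)
		retval = env->second_invalidate;

	return retval;
}
```

With first_invalidate = -EIO the write is never submitted and -EIO comes
back.  With direct_io = -SKETCH_EIOCBQUEUED and second_invalidate = -EIO
the caller still sees -SKETCH_EIOCBQUEUED, which is the compromise.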

This was tested by having two processes race performing large O_DIRECT and
buffered ordered writes.  Within minutes ext3 would see a race between
ext3_releasepage() and jbd holding a reference on ordered data buffers and
would cause invalidation to fail, panicking the box.  The test can be found
in the 'aio_dio_bugs' test group in test.kernel.org/autotest.  After this
patch the test passes.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Leonid Ananiev <leonid.i.ananiev@linux.intel.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/filemap.c |   46 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 11 deletions(-)

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2393,7 +2393,8 @@ generic_file_direct_IO(int rw, struct ki
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
 	ssize_t retval;
-	size_t write_len = 0;
+	size_t write_len;
+	pgoff_t end = 0; /* silence gcc */
 
 	/*
 	 * If it's a write, unmap all mmappings of the file up-front.  This
@@ -2402,23 +2403,46 @@ generic_file_direct_IO(int rw, struct ki
 	 */
 	if (rw == WRITE) {
 		write_len = iov_length(iov, nr_segs);
+		end = (offset + write_len - 1) >> PAGE_CACHE_SHIFT;
 	       	if (mapping_mapped(mapping))
 			unmap_mapping_range(mapping, offset, write_len, 0);
 	}
 
 	retval = filemap_write_and_wait(mapping);
-	if (retval == 0) {
-		retval = mapping->a_ops->direct_IO(rw, iocb, iov,
-						offset, nr_segs);
-		if (rw == WRITE && mapping->nrpages) {
-			pgoff_t end = (offset + write_len - 1)
-						>> PAGE_CACHE_SHIFT;
-			int err = invalidate_inode_pages2_range(mapping,
+	if (retval)
+		goto out;
+
+	/*
+	 * After a write we want buffered reads to be sure to go to disk to get
+	 * the new data.  We invalidate clean cached page from the region we're
+	 * about to write.  We do this *before* the write so that we can return
+	 * -EIO without clobbering -EIOCBQUEUED from ->direct_IO().
+	 */
+	if (rw == WRITE && mapping->nrpages) {
+		retval = invalidate_inode_pages2_range(mapping,
 					offset >> PAGE_CACHE_SHIFT, end);
-			if (err)
-				retval = err;
-		}
+		if (retval)
+			goto out;
+	}
+
+	retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);
+	if (retval)
+		goto out;
+
+	/*
+	 * Finally, try again to invalidate clean pages which might have been
+	 * faulted in by get_user_pages() if the source of the write was an
+	 * mmap()ed region of the file we're writing.  That's a pretty crazy
+	 * thing to do, so we don't support it 100%.  If this invalidation
+	 * fails and we have -EIOCBQUEUED we ignore the failure.
+	 */
+	if (rw == WRITE && mapping->nrpages) {
+		int err = invalidate_inode_pages2_range(mapping,
+					      offset >> PAGE_CACHE_SHIFT, end);
+		if (err && retval >= 0)
+			retval = err;
 	}
+out:
 	return retval;
 }
 
