LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [rfc][patch] mm: half-fix page tail zeroing on write problem
@ 2007-02-02  5:51 Nick Piggin
  2007-02-02  6:29 ` Neil Brown
  0 siblings, 1 reply; 3+ messages in thread
From: Nick Piggin @ 2007-02-02  5:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Memory Management List, Linux Kernel Mailing List

Hi,

For no important reason, I've again looked at those zeroing patches that
Neil did a while back. I've always thought that a simple
`write(fd, NULL, size)` would cause the same sorts of problems.

Turns out it does. If you first write all 1s into a page, then do the
`write(fd, NULL, size)` at the same position, you end up with all 0s in
the page (test-case available on request).  Incredible; surely this
violates the spec?

The buffered-write fixes I've got actually fix this properly, but  they
don't look like getting merged any time soon. We could do this simple
patch which just reduces the chance of corruption from a certainty down
to a small race.

Any thoughts?

-- 
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h	2007-02-02 13:41:21.000000000 +1100
+++ linux-2.6/include/linux/pagemap.h	2007-02-02 13:42:09.000000000 +1100
@@ -198,6 +198,9 @@ static inline int fault_in_pages_writeab
 {
 	int ret;
 
+	if (unlikely(size == 0))
+		return 0;
+
 	/*
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
@@ -217,19 +220,23 @@ static inline int fault_in_pages_writeab
 	return ret;
 }
 
-static inline void fault_in_pages_readable(const char __user *uaddr, int size)
+static inline int fault_in_pages_readable(const char __user *uaddr, int size)
 {
 	volatile char c;
 	int ret;
 
+	if (unlikely(size == 0))
+		return 0;
+
 	ret = __get_user(c, uaddr);
 	if (ret == 0) {
 		const char __user *end = uaddr + size - 1;
 
 		if (((unsigned long)uaddr & PAGE_MASK) !=
 				((unsigned long)end & PAGE_MASK))
-		 	__get_user(c, end);
+		 	ret = __get_user(c, end);
 	}
+	return ret;
 }
 
 #endif /* _LINUX_PAGEMAP_H */
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2007-02-02 13:42:40.000000000 +1100
+++ linux-2.6/mm/filemap.c	2007-02-02 14:00:19.000000000 +1100
@@ -2112,7 +2112,10 @@ generic_file_buffered_write(struct kiocb
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		fault_in_pages_readable(buf, bytes);
+		if (fault_in_pages_readable(buf, bytes)) {
+			status = -EFAULT;
+			break;
+		}
 
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
 		if (!page) {

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [rfc][patch] mm: half-fix page tail zeroing on write problem
  2007-02-02  5:51 [rfc][patch] mm: half-fix page tail zeroing on write problem Nick Piggin
@ 2007-02-02  6:29 ` Neil Brown
  2007-02-02  7:21   ` Nick Piggin
  0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2007-02-02  6:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Linux Memory Management List, Linux Kernel Mailing List

On Friday February 2, npiggin@suse.de wrote:
> Hi,
> 
> For no important reason, I've again looked at those zeroing patches that
> Neil did a while back. I've always thought that a simple
> `write(fd, NULL, size)` would cause the same sorts of problems.

Yeh, but who in their right mind would do that???
Oh, you did :-)

> 
> Turns out it does. If you first write all 1s into a page, then do the
> `write(fd, NULL, size)` at the same position, you end up with all 0s in
> the page (test-case available on request).  Incredible; surely this
> violates the spec?

Does it?
I guess filling with zeros isn't what one would expect, but you could
make a case for it being right.
  write(fd, 0, size)
writes 'size' 0s.  Cool.   Ok, bad-cool.

> 
> The buffered-write fixes I've got actually fix this properly, but  they
> don't look like getting merged any time soon. We could do this simple
> patch which just reduces the chance of corruption from a certainty down
> to a small race.
> 
> Any thoughts?

I cannot see why you make a change to fault_in_pages_writeable.  Is it
just for symmetry?
For the rest, it certainly makes sense to return an early -EFAULT if
you cannot fault in the page.

NeilBrown

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [rfc][patch] mm: half-fix page tail zeroing on write problem
  2007-02-02  6:29 ` Neil Brown
@ 2007-02-02  7:21   ` Nick Piggin
  0 siblings, 0 replies; 3+ messages in thread
From: Nick Piggin @ 2007-02-02  7:21 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Memory Management List, Linux Kernel Mailing List

On Fri, Feb 02, 2007 at 05:29:06PM +1100, Neil Brown wrote:
> On Friday February 2, npiggin@suse.de wrote:
> > Hi,
> > 
> > For no important reason, I've again looked at those zeroing patches that
> > Neil did a while back. I've always thought that a simple
> > `write(fd, NULL, size)` would cause the same sorts of problems.
> 
> Yeh, but who in their right mind would do that???
> Oh, you did :-)

Well that's the test-case. Obviously not many people do it, but that's
all the more reason to be careful about correct behaviour.

> I cannot see why you make a change to fault_in_pages_writeable.  Is it
> just for symmetry?

Yes.

> For the rest, it certainly makes sense to return an early -EFAULT if
> you cannot fault in the page.

I think so.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-02-02  7:22 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-02  5:51 [rfc][patch] mm: half-fix page tail zeroing on write problem Nick Piggin
2007-02-02  6:29 ` Neil Brown
2007-02-02  7:21   ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).