LKML Archive on lore.kernel.org
* [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations
@ 2011-01-28 14:56 Robert Jennings
  2011-01-28 14:57 ` [PATCH 1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages Robert Jennings
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 14:56 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

Based on feedback this is version 2 of my patches to fix zram and the
xvmalloc allocator for 64K page size kernels along with a few small
zram optimizations.

I have dropped the patch to mark the device as non-rotational media as
this was duplicated elsewhere.  I also dropped the patch regarding caching
the indices for page size allocations because gcc is quite smart.

There are two new patches at the end of this patch set.  The first
changes zram to return zero'd pages for reads of pages which have not
been written to; this eliminates passing uninitialized pages back
to user-space.  The second new patch cleans up freelist pointer
management and combines the two delete node functions into one common
function.

The xvmalloc allocator is non-functional when running with a 64K page
size.  The first two patches fix 64K page related issues.
[1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages
[2/7] [v2] zram: Prevent overflow in logical block size

The next 3 patches provide some small optimizations for zram and
xvmalloc.
[3/7] [v2] zram/xvmalloc: free bit block insertion optimization
[4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code
[5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs

These last 2 patches are new, see the description above.
[6/7] zram: Return zero'd pages on new reads
[7/7] zram/xvmalloc: combine duplicate block delete code

Thanks for the helpful reviews.

Regards,
Robert Jennings

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
@ 2011-01-28 14:57 ` Robert Jennings
  2011-01-29  8:47   ` Pekka Enberg
  2011-01-28 14:58 ` [PATCH 2/7] [v2] zram: Prevent overflow in logical block size Robert Jennings
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 14:57 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

xvmalloc will not currently function with 64K pages.  Newly allocated
pages will be inserted at an offset beyond the end of the first-level
index.  This tuning is needed to properly size the allocator for 64K
pages.

The default shift of 3 (an 8-byte FL_DELTA) results in a second-level
list count which cannot be indexed using the 64 bits of the flbitmap in
the xv_pool structure.  The shift must increase to 4, placing 16 bytes
between second-level list entries, so that the number of lists fits the
first-level bitmap.

Here are a few statistics for structure sizes on 32- and 64-bit CPUs
with 4KB and 64KB page sizes.

bits_per_long              32        64        64
page_size               4,096     4,096    65,536
xv_align                    4         8         8
fl_delta                    3         3         4
num_free_lists            508       508     4,094
xv_pool size            4,144b    8,216b   66,040b
per object overhead        32        64        64
zram struct 0.5GB disk    512KB    1024KB      64KB

This patch maintains the current tunings for 4K pages, adds an optimal
sizing for 64K pages and adds a safe tuning for any other page sizes.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
---
 drivers/staging/zram/xvmalloc_int.h |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/zram/xvmalloc_int.h b/drivers/staging/zram/xvmalloc_int.h
index e23ed5c..82a31fb 100644
--- a/drivers/staging/zram/xvmalloc_int.h
+++ b/drivers/staging/zram/xvmalloc_int.h
@@ -19,7 +19,11 @@
 /* User configurable params */
 
 /* Must be power of two */
+#ifdef CONFIG_64BIT
+#define XV_ALIGN_SHIFT 3
+#else
 #define XV_ALIGN_SHIFT	2
+#endif
 #define XV_ALIGN	(1 << XV_ALIGN_SHIFT)
 #define XV_ALIGN_MASK	(XV_ALIGN - 1)
 
@@ -27,8 +31,16 @@
 #define XV_MIN_ALLOC_SIZE	32
 #define XV_MAX_ALLOC_SIZE	(PAGE_SIZE - XV_ALIGN)
 
-/* Free lists are separated by FL_DELTA bytes */
-#define FL_DELTA_SHIFT	3
+/*
+ * Free lists are separated by FL_DELTA bytes.
+ * This value is 3 for 4K pages and 4 for 64K pages; for any
+ * other page size, a conservative (PAGE_SHIFT - 9) is used.
+ */
+#if PAGE_SHIFT == 16
+#define FL_DELTA_SHIFT 4
+#else
+#define FL_DELTA_SHIFT (PAGE_SHIFT - 9)
+#endif
 #define FL_DELTA	(1 << FL_DELTA_SHIFT)
 #define FL_DELTA_MASK	(FL_DELTA - 1)
 #define NUM_FREE_LISTS	((XV_MAX_ALLOC_SIZE - XV_MIN_ALLOC_SIZE) \
-- 
1.7.0.4



* [PATCH 2/7] [v2] zram: Prevent overflow in logical block size
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
  2011-01-28 14:57 ` [PATCH 1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages Robert Jennings
@ 2011-01-28 14:58 ` Robert Jennings
  2011-01-29  8:48   ` Pekka Enberg
  2011-01-28 14:58 ` [PATCH 3/7] [v2] zram/xvmalloc: free bit block insertion optimization Robert Jennings
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 14:58 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

On a 64K page kernel, the value PAGE_SIZE passed to
blk_queue_logical_block_size would overflow the logical block size
argument (resulting in setting it to 0).

This patch sets the logical block size to 4096, using a new
ZRAM_LOGICAL_BLOCK_SIZE constant.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
---
 drivers/staging/zram/zram_drv.c |    3 ++-
 drivers/staging/zram/zram_drv.h |    1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 5415712..d85d034 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -620,7 +620,8 @@ static int create_device(struct zram *zram, int device_id)
 	 * and n*PAGE_SIZED sized I/O requests.
 	 */
 	blk_queue_physical_block_size(zram->disk->queue, PAGE_SIZE);
-	blk_queue_logical_block_size(zram->disk->queue, PAGE_SIZE);
+	blk_queue_logical_block_size(zram->disk->queue,
+					ZRAM_LOGICAL_BLOCK_SIZE);
 	blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
 	blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
 
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index a481551..408b2c0 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -61,6 +61,7 @@ static const unsigned max_zpage_size = PAGE_SIZE / 4 * 3;
 #define SECTOR_SIZE		(1 << SECTOR_SHIFT)
 #define SECTORS_PER_PAGE_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)
 #define SECTORS_PER_PAGE	(1 << SECTORS_PER_PAGE_SHIFT)
+#define ZRAM_LOGICAL_BLOCK_SIZE	4096
 
 /* Flags for zram pages (table[page_no].flags) */
 enum zram_pageflags {
-- 
1.7.0.4



* [PATCH 3/7] [v2] zram/xvmalloc: free bit block insertion optimization
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
  2011-01-28 14:57 ` [PATCH 1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages Robert Jennings
  2011-01-28 14:58 ` [PATCH 2/7] [v2] zram: Prevent overflow in logical block size Robert Jennings
@ 2011-01-28 14:58 ` Robert Jennings
  2011-01-29  8:48   ` Pekka Enberg
  2011-01-28 14:59 ` [PATCH 4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code Robert Jennings
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 14:58 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

This change is in a conditional block which is entered only when there is
an existing data block on the freelist where the insert has taken place.

The new block is pushed onto the freelist stack, and this conditional block
updates links in the prior stack head to point to the new stack head.
After this conditional block the first-/second-level indices are updated
to indicate that there is a free block at this location.

This patch adds an immediate return from the conditional block to avoid
setting bits again to indicate a free block on this freelist. The bits
would already be set because there was an existing free block on this
freelist.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
---
 drivers/staging/zram/xvmalloc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/zram/xvmalloc.c b/drivers/staging/zram/xvmalloc.c
index b644067..149138a 100644
--- a/drivers/staging/zram/xvmalloc.c
+++ b/drivers/staging/zram/xvmalloc.c
@@ -200,6 +200,8 @@ static void insert_block(struct xv_pool *pool, struct page *page, u32 offset,
 		nextblock->link.prev_page = page;
 		nextblock->link.prev_offset = offset;
 		put_ptr_atomic(nextblock, KM_USER1);
+		/* If there was a next page then the free bits are set. */
+		return;
 	}
 
 	__set_bit(slindex % BITS_PER_LONG, &pool->slbitmap[flindex]);
-- 
1.7.0.4



* [PATCH 4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
                   ` (2 preceding siblings ...)
  2011-01-28 14:58 ` [PATCH 3/7] [v2] zram/xvmalloc: free bit block insertion optimization Robert Jennings
@ 2011-01-28 14:59 ` Robert Jennings
  2011-01-29  8:48   ` Pekka Enberg
  2011-01-28 15:00 ` [PATCH 5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs Robert Jennings
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 14:59 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

Add a debug config flag to enable debug printk output and future
debug code.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
---
 drivers/staging/zram/Kconfig    |    8 ++++++++
 drivers/staging/zram/xvmalloc.c |    4 ++++
 drivers/staging/zram/zram_drv.c |    4 ++++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/zram/Kconfig b/drivers/staging/zram/Kconfig
index da079f8..d957747 100644
--- a/drivers/staging/zram/Kconfig
+++ b/drivers/staging/zram/Kconfig
@@ -15,3 +15,11 @@ config ZRAM
 
 	  See zram.txt for more information.
 	  Project home: http://compcache.googlecode.com/
+
+config ZRAM_DEBUG
+	bool "Compressed RAM block device debug support"
+	depends on ZRAM
+	default n
+	help
+	  This option adds additional debugging code to the compressed
+	  RAM block device driver.
diff --git a/drivers/staging/zram/xvmalloc.c b/drivers/staging/zram/xvmalloc.c
index 149138a..0560e9d 100644
--- a/drivers/staging/zram/xvmalloc.c
+++ b/drivers/staging/zram/xvmalloc.c
@@ -10,6 +10,10 @@
  * Released under the terms of GNU General Public License Version 2.0
  */
 
+#ifdef CONFIG_ZRAM_DEBUG
+#define DEBUG
+#endif
+
 #include <linux/bitops.h>
 #include <linux/errno.h>
 #include <linux/highmem.h>
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index d85d034..c67060c 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -15,6 +15,10 @@
 #define KMSG_COMPONENT "zram"
 #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
 
+#ifdef CONFIG_ZRAM_DEBUG
+#define DEBUG
+#endif
+
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/bio.h>
-- 
1.7.0.4



* [PATCH 5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
                   ` (3 preceding siblings ...)
  2011-01-28 14:59 ` [PATCH 4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code Robert Jennings
@ 2011-01-28 15:00 ` Robert Jennings
  2011-01-29  8:49   ` Pekka Enberg
  2011-01-28 15:00 ` [PATCH 6/7] zram: Return zero'd pages on new reads Robert Jennings
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 15:00 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

By swapping the total_pages statistic with the lock we close a
hole in the structure for 64-bit CPUs.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
---
 drivers/staging/zram/xvmalloc_int.h |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/zram/xvmalloc_int.h b/drivers/staging/zram/xvmalloc_int.h
index 82a31fb..b5f1f7f 100644
--- a/drivers/staging/zram/xvmalloc_int.h
+++ b/drivers/staging/zram/xvmalloc_int.h
@@ -87,12 +87,9 @@ struct block_header {
 struct xv_pool {
 	ulong flbitmap;
 	ulong slbitmap[MAX_FLI];
-	spinlock_t lock;
-
+	u64 total_pages;	/* stats */
 	struct freelist_entry freelist[NUM_FREE_LISTS];
-
-	/* stats */
-	u64 total_pages;
+	spinlock_t lock;
 };
 
 #endif
-- 
1.7.0.4



* [PATCH 6/7] zram: Return zero'd pages on new reads
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
                   ` (4 preceding siblings ...)
  2011-01-28 15:00 ` [PATCH 5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs Robert Jennings
@ 2011-01-28 15:00 ` Robert Jennings
  2011-01-29  8:49   ` Pekka Enberg
  2011-01-28 15:01 ` [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code Robert Jennings
  2011-01-29  8:47 ` [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Pekka Enberg
  7 siblings, 1 reply; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 15:00 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

Currently zram will do nothing to the page in the bvec when that page
has not been previously written.  This allows random data to leak to
user space.  That can be seen by doing the following:

 ## Load the module and create a 256MB zram device called /dev/zram0
 # modprobe zram
 # echo $((256*1024*1024)) > /sys/class/block/zram0/disksize

 ## Initialize the device by writing zero to the first block
 # dd if=/dev/zero of=/dev/zram0 bs=512 count=1

 ## Read ~256MB of memory into a file and hope for something interesting
 # dd if=/dev/zram0 of=file

This patch will treat an unwritten page as a zero-filled page.  If a
page is read before a write has occurred the data returned is all 0's.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
---
 drivers/staging/zram/zram_drv.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index c67060c..86066fd 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -238,7 +238,7 @@ static int zram_read(struct zram *zram, struct bio *bio)
 		if (unlikely(!zram->table[index].page)) {
 			pr_debug("Read before write: sector=%lu, size=%u",
 				(ulong)(bio->bi_sector), bio->bi_size);
-			/* Do nothing */
+			handle_zero_page(page);
 			continue;
 		}
 
-- 
1.7.0.4



* [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
                   ` (5 preceding siblings ...)
  2011-01-28 15:00 ` [PATCH 6/7] zram: Return zero'd pages on new reads Robert Jennings
@ 2011-01-28 15:01 ` Robert Jennings
  2011-01-29  8:50   ` Pekka Enberg
  2011-01-31 13:31   ` Nitin Gupta
  2011-01-29  8:47 ` [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Pekka Enberg
  7 siblings, 2 replies; 18+ messages in thread
From: Robert Jennings @ 2011-01-28 15:01 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg Kroah-Hartman, Robert Jennings, Pekka Enberg, devel, linux-kernel

This patch eliminates duplicate code.  The remove_block_head function
is a special case of remove_block which can be contained in remove_block
without confusion.

The portion of code in remove_block_head which was noted as "DEBUG ONLY"
is now mandatory.  Doing this provides consistent management of the doubly
linked list of blocks under a freelist and makes this consolidation
of delete block code safe.  The first and last blocks will have NULL
pointers in their previous and next page pointers respectively.

Additionally, any time a block is removed from a free list the next and
previous pointers will be set to NULL to avoid misuse outside xvmalloc.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
---
 drivers/staging/zram/xvmalloc.c |   73 ++++++++++++++++----------------------
 1 files changed, 31 insertions(+), 42 deletions(-)

diff --git a/drivers/staging/zram/xvmalloc.c b/drivers/staging/zram/xvmalloc.c
index 0560e9d..fc4a3df 100644
--- a/drivers/staging/zram/xvmalloc.c
+++ b/drivers/staging/zram/xvmalloc.c
@@ -213,54 +213,14 @@ static void insert_block(struct xv_pool *pool, struct page *page, u32 offset,
 }
 
 /*
- * Remove block from head of freelist. Index 'slindex' identifies the freelist.
- */
-static void remove_block_head(struct xv_pool *pool,
-			struct block_header *block, u32 slindex)
-{
-	struct block_header *tmpblock;
-	u32 flindex = slindex / BITS_PER_LONG;
-
-	pool->freelist[slindex].page = block->link.next_page;
-	pool->freelist[slindex].offset = block->link.next_offset;
-	block->link.prev_page = NULL;
-	block->link.prev_offset = 0;
-
-	if (!pool->freelist[slindex].page) {
-		__clear_bit(slindex % BITS_PER_LONG, &pool->slbitmap[flindex]);
-		if (!pool->slbitmap[flindex])
-			__clear_bit(flindex, &pool->flbitmap);
-	} else {
-		/*
-		 * DEBUG ONLY: We need not reinitialize freelist head previous
-		 * pointer to 0 - we never depend on its value. But just for
-		 * sanity, lets do it.
-		 */
-		tmpblock = get_ptr_atomic(pool->freelist[slindex].page,
-				pool->freelist[slindex].offset, KM_USER1);
-		tmpblock->link.prev_page = NULL;
-		tmpblock->link.prev_offset = 0;
-		put_ptr_atomic(tmpblock, KM_USER1);
-	}
-}
-
-/*
  * Remove block from freelist. Index 'slindex' identifies the freelist.
  */
 static void remove_block(struct xv_pool *pool, struct page *page, u32 offset,
 			struct block_header *block, u32 slindex)
 {
-	u32 flindex;
+	u32 flindex = slindex / BITS_PER_LONG;
 	struct block_header *tmpblock;
 
-	if (pool->freelist[slindex].page == page
-	   && pool->freelist[slindex].offset == offset) {
-		remove_block_head(pool, block, slindex);
-		return;
-	}
-
-	flindex = slindex / BITS_PER_LONG;
-
 	if (block->link.prev_page) {
 		tmpblock = get_ptr_atomic(block->link.prev_page,
 				block->link.prev_offset, KM_USER1);
@@ -276,6 +236,35 @@ static void remove_block(struct xv_pool *pool, struct page *page, u32 offset,
 		tmpblock->link.prev_offset = block->link.prev_offset;
 		put_ptr_atomic(tmpblock, KM_USER1);
 	}
+
+	/* Is this block at the head of the freelist? */
+	if (pool->freelist[slindex].page == page
+	   && pool->freelist[slindex].offset == offset) {
+
+		pool->freelist[slindex].page = block->link.next_page;
+		pool->freelist[slindex].offset = block->link.next_offset;
+
+		if (pool->freelist[slindex].page) {
+			struct block_header *tmpblock;
+			tmpblock = get_ptr_atomic(pool->freelist[slindex].page,
+					pool->freelist[slindex].offset,
+					KM_USER1);
+			tmpblock->link.prev_page = NULL;
+			tmpblock->link.prev_offset = 0;
+			put_ptr_atomic(tmpblock, KM_USER1);
+		} else {
+			/* This freelist bucket is empty */
+			__clear_bit(slindex % BITS_PER_LONG,
+				    &pool->slbitmap[flindex]);
+			if (!pool->slbitmap[flindex])
+				__clear_bit(flindex, &pool->flbitmap);
+		}
+	}
+
+	block->link.prev_page = NULL;
+	block->link.prev_offset = 0;
+	block->link.next_page = NULL;
+	block->link.next_offset = 0;
 }
 
 /*
@@ -384,7 +373,7 @@ int xv_malloc(struct xv_pool *pool, u32 size, struct page **page,
 
 	block = get_ptr_atomic(*page, *offset, KM_USER0);
 
-	remove_block_head(pool, block, index);
+	remove_block(pool, *page, *offset, block, index);
 
 	/* Split the block if required */
 	tmpoffset = *offset + size + XV_ALIGN;
-- 
1.7.0.4



* Re: [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations
  2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
                   ` (6 preceding siblings ...)
  2011-01-28 15:01 ` [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code Robert Jennings
@ 2011-01-29  8:47 ` Pekka Enberg
  2011-01-29 18:54   ` Robert Jennings
  7 siblings, 1 reply; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:47 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 08:56 -0600, Robert Jennings wrote:
> Based on feedback this is version 2 of my patches to fix zram and the
> xvmalloc allocator for 64K page size kernels along with a few small
> zram optimizations.
> 
> I have dropped the patch to mark the device as non-rotational media as
> this was duplicated elsewhere.  I also dropped the patch regarding caching
> the indices for page size allocations because gcc is quite smart.
> 
> There are two new patches at the end of this patch set.  The first
> changes zram to return zero'd pages for reads of pages which have not
> been written to, this eliminates passing uninitialized pages back
> to user-space.  The second new patch cleans up freelist pointer
> management and combines the two delete node functions into one common
> function.
> 
> The xvmalloc allocator is non-functional when running with a 64K page
> size.  The first two patches fix 64K page related issues.
> [1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages
> [2/7] [v2] zram: Prevent overflow in logical block size
> 
> The next 3 patches provide some small optimizations for zram and
> xvmalloc.
> [3/7] [v2] zram/xvmalloc: free bit block insertion optimization
> [4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code
> [5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs
> 
> These last 2 patches are new, see the description above.
> [6/7] zram: Return zero'd pages on new reads
> [7/7] zram/xvmalloc: combine duplicate block delete code
> 
> Thanks for the helpful reviews.

Hey, you dropped Nitin's and my Reviewed-by tags! Please don't do that
in the future.

			Pekka



* Re: [PATCH 1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages
  2011-01-28 14:57 ` [PATCH 1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages Robert Jennings
@ 2011-01-29  8:47   ` Pekka Enberg
  0 siblings, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:47 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 08:57 -0600, Robert Jennings wrote:
> xvmalloc will not currently function with 64K pages.  Newly allocated
> pages will be inserted at an offset beyond the end of the first-level
> index.  This tuning is needed to properly size the allocator for 64K
> pages.
> 
> The default shift of 3 (an 8-byte FL_DELTA) results in a second-level
> list count which cannot be indexed using the 64 bits of the flbitmap in
> the xv_pool structure.  The shift must increase to 4, placing 16 bytes
> between second-level list entries, so that the number of lists fits the
> first-level bitmap.
> 
> Here are a few statistics for structure sizes on 32- and 64-bit CPUs
> with 4KB and 64KB page sizes.
> 
> bits_per_long              32        64        64
> page_size               4,096     4,096    65,536
> xv_align                    4         8         8
> fl_delta                    3         3         4
> num_free_lists            508       508     4,094
> xv_pool size            4,144b    8,216b   66,040b
> per object overhead        32        64        64
> zram struct 0.5GB disk    512KB    1024KB      64KB
> 
> This patch maintains the current tunings for 4K pages, adds an optimal
> sizing for 64K pages and adds a safe tuning for any other page sizes.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Reviewed-by: Pekka Enberg <penberg@kernel.org>



* Re: [PATCH 2/7] [v2] zram: Prevent overflow in logical block size
  2011-01-28 14:58 ` [PATCH 2/7] [v2] zram: Prevent overflow in logical block size Robert Jennings
@ 2011-01-29  8:48   ` Pekka Enberg
  0 siblings, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:48 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 08:58 -0600, Robert Jennings wrote:
> On a 64K page kernel, the value PAGE_SIZE passed to
> blk_queue_logical_block_size would overflow the logical block size
> argument (resulting in setting it to 0).
> 
> This patch sets the logical block size to 4096, using a new
> ZRAM_LOGICAL_BLOCK_SIZE constant.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Reviewed-by: Pekka Enberg <penberg@kernel.org>



* Re: [PATCH 3/7] [v2] zram/xvmalloc: free bit block insertion optimization
  2011-01-28 14:58 ` [PATCH 3/7] [v2] zram/xvmalloc: free bit block insertion optimization Robert Jennings
@ 2011-01-29  8:48   ` Pekka Enberg
  0 siblings, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:48 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 08:58 -0600, Robert Jennings wrote:
> This change is in a conditional block which is entered only when there is
> an existing data block on the freelist where the insert has taken place.
> 
> The new block is pushed onto the freelist stack and this conditional block
> is updating links in the prior stack head to point to the new stack head.
> After this conditional block the first-/second-level indices are updated
> to indicate that there is a free block at this location.
> 
> This patch adds an immediate return from the conditional block to avoid
> setting bits again to indicate a free block on this freelist. The bits
> would already be set because there was an existing free block on this
> freelist.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Reviewed-by: Pekka Enberg <penberg@kernel.org>



* Re: [PATCH 4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code
  2011-01-28 14:59 ` [PATCH 4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code Robert Jennings
@ 2011-01-29  8:48   ` Pekka Enberg
  0 siblings, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:48 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 08:59 -0600, Robert Jennings wrote:
> Add a debug config flag to enable debug printk output and future
> debug code.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Reviewed-by: Pekka Enberg <penberg@kernel.org>



* Re: [PATCH 5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs
  2011-01-28 15:00 ` [PATCH 5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs Robert Jennings
@ 2011-01-29  8:49   ` Pekka Enberg
  0 siblings, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:49 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 09:00 -0600, Robert Jennings wrote:
> By swapping the total_pages statistic with the lock we close a
> hole in the structure for 64-bit CPUs.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Reviewed-by: Pekka Enberg <penberg@kernel.org>



* Re: [PATCH 6/7] zram: Return zero'd pages on new reads
  2011-01-28 15:00 ` [PATCH 6/7] zram: Return zero'd pages on new reads Robert Jennings
@ 2011-01-29  8:49   ` Pekka Enberg
  0 siblings, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:49 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 09:00 -0600, Robert Jennings wrote:
> Currently zram will do nothing to the page in the bvec when that page
> has not been previously written.  This allows random data to leak to
> user space.  That can be seen by doing the following:
> 
>  ## Load the module and create a 256MB zram device called /dev/zram0
>  # modprobe zram
>  # echo $((256*1024*1024)) > /sys/class/block/zram0/disksize
> 
>  ## Initialize the device by writing zero to the first block
>  # dd if=/dev/zero of=/dev/zram0 bs=512 count=1
> 
>  ## Read ~256MB of memory into a file and hope for something interesting
>  # dd if=/dev/zram0 of=file
> 
> This patch will treat an unwritten page as a zero-filled page.  If a
> page is read before a write has occurred the data returned is all 0's.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Reviewed-by: Pekka Enberg <penberg@kernel.org>



* Re: [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code
  2011-01-28 15:01 ` [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code Robert Jennings
@ 2011-01-29  8:50   ` Pekka Enberg
  2011-01-31 13:31   ` Nitin Gupta
  1 sibling, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2011-01-29  8:50 UTC (permalink / raw)
  To: Robert Jennings; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

On Fri, 2011-01-28 at 09:01 -0600, Robert Jennings wrote:
> This patch eliminates duplicate code.  The remove_block_head function
> is a special case of remove_block which can be contained in remove_block
> without confusion.
> 
> The portion of code in remove_block_head which was noted as "DEBUG ONLY"
> is now mandatory.  Doing this provides consistent management of the doubly
> linked list of blocks under a freelist and makes this consolidation
> of delete block code safe.  The first and last blocks will have NULL
> pointers in their previous and next page pointers respectively.
> 
> Additionally, any time a block is removed from a free list the next and
> previous pointers will be set to NULL to avoid misuse outside xvmalloc.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Looks OK to me but we should really get Nitin's ACK for this.

Reviewed-by: Pekka Enberg <penberg@kernel.org>



* Re: [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations
  2011-01-29  8:47 ` [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Pekka Enberg
@ 2011-01-29 18:54   ` Robert Jennings
  0 siblings, 0 replies; 18+ messages in thread
From: Robert Jennings @ 2011-01-29 18:54 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Nitin Gupta, Greg Kroah-Hartman, devel, linux-kernel

* Pekka Enberg (penberg@cs.helsinki.fi) wrote:
> On Fri, 2011-01-28 at 08:56 -0600, Robert Jennings wrote:
> > Based on feedback this is version 2 of my patches to fix zram and the
> > xvmalloc allocator for 64K page size kernels along with a few small
> > zram optimizations.
> > 
> > I have dropped the patch to mark the device as non-rotational media as
> > this was duplicated elsewhere.  I also dropped the patch regarding caching
> > the indices for page size allocations because gcc is quite smart.
> > 
> > There are two new patches at the end of this patch set.  The first
> > changes zram to return zero'd pages for reads of pages which have not
> > been written to; this eliminates passing uninitialized pages back
> > to user-space.  The second new patch cleans up freelist pointer
> > management and combines the two delete-node functions into one common
> > function.
> > 
> > The xvmalloc allocator is non-functional when running with a 64K page
> > size.  The first two patches fix 64K page related issues.
> > [1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages
> > [2/7] [v2] zram: Prevent overflow in logical block size
> > 
> > The next 3 patches provide some small optimizations for zram and
> > xvmalloc.
> > [3/7] [v2] zram/xvmalloc: free bit block insertion optimization
> > [4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code
> > [5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs
> > 
> > These last 2 patches are new, see the description above.
> > [6/7] zram: Return zero'd pages on new reads
> > [7/7] zram/xvmalloc: combine duplicate block delete code
> > 
> > Thanks for the helpful reviews.
> 
> Hey, you dropped Nitin's and my Reviewed-by tags! Please don't do that
> in the future.
> 
> 			Pekka
> 
Very sorry about that; it was not my intention to drop the Reviewed-by
tags on the unchanged patches.  Thank you for reviewing these patches.

--Robert Jennings


* Re: [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code
  2011-01-28 15:01 ` [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code Robert Jennings
  2011-01-29  8:50   ` Pekka Enberg
@ 2011-01-31 13:31   ` Nitin Gupta
  1 sibling, 0 replies; 18+ messages in thread
From: Nitin Gupta @ 2011-01-31 13:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Pekka Enberg, devel, linux-kernel

On 01/28/2011 10:01 AM, Robert Jennings wrote:
> This patch eliminates duplicate code.  The remove_block_head function
> is a special case of remove_block and can be folded into remove_block
> without confusion.
>
> The portion of code in remove_block_head which was noted as "DEBUG ONLY"
> is now mandatory.  Doing this provides consistent management of the
> doubly linked list of blocks under a freelist and makes this
> consolidation of block-delete code safe.  The first and last blocks
> will have NULL pointers in their previous and next page pointers,
> respectively.
>
> Additionally, any time a block is removed from a free list, its next
> and previous pointers are set to NULL to avoid misuse outside xvmalloc.
>
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

The reason for introducing remove_block_head() as a separate function
was to make malloc slightly faster, but since I lack any profiling data,
I'm not sure whether this change impacts performance.  Ideally, some
data from a malloc-heavy test would have been useful.  Anyway, I think
major allocator changes will happen when we make xvmalloc-allocated
memory reclaimable, so maybe we can defer profiling.

Acked-by: Nitin Gupta <ngupta@vflare.org>

Thanks,
Nitin


end of thread, other threads:[~2011-01-31 14:27 UTC | newest]

Thread overview: 18+ messages
2011-01-28 14:56 [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Robert Jennings
2011-01-28 14:57 ` [PATCH 1/7] [v2] zram/vmalloc: Correct tunings to enable use with 64K pages Robert Jennings
2011-01-29  8:47   ` Pekka Enberg
2011-01-28 14:58 ` [PATCH 2/7] [v2] zram: Prevent overflow in logical block size Robert Jennings
2011-01-29  8:48   ` Pekka Enberg
2011-01-28 14:58 ` [PATCH 3/7] [v2] zram/xvmalloc: free bit block insertion optimization Robert Jennings
2011-01-29  8:48   ` Pekka Enberg
2011-01-28 14:59 ` [PATCH 4/7] [v2] zram/xvmalloc: create CONFIG_ZRAM_DEBUG for debug code Robert Jennings
2011-01-29  8:48   ` Pekka Enberg
2011-01-28 15:00 ` [PATCH 5/7] [v2] zram/xvmalloc: Close 32byte hole on 64bit CPUs Robert Jennings
2011-01-29  8:49   ` Pekka Enberg
2011-01-28 15:00 ` [PATCH 6/7] zram: Return zero'd pages on new reads Robert Jennings
2011-01-29  8:49   ` Pekka Enberg
2011-01-28 15:01 ` [PATCH 7/7] zram/xvmalloc: combine duplicate block delete code Robert Jennings
2011-01-29  8:50   ` Pekka Enberg
2011-01-31 13:31   ` Nitin Gupta
2011-01-29  8:47 ` [PATCH 0/7][v2] zram/xvmalloc: 64K page fixes and optimizations Pekka Enberg
2011-01-29 18:54   ` Robert Jennings
