LKML Archive on lore.kernel.org
* [PATCH v1 00/10] zsmalloc compaction support
@ 2015-01-21  6:14 Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock Minchan Kim
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

Recently, there was an issue with zsmalloc fragmentation: I got a
report from Juneho that fork() failed even though there were plenty
of free pages in the system.
His investigation revealed that zram was one of the culprits behind
the heavy fragmentation, so there was no contiguous 16K page left
for the pgd allocation that fork() needs on ARM.

This patchset implements *basic* zsmalloc compaction support,
and zram utilizes it so that an admin can do

        "echo 1 > /sys/block/zram0/compact"

In my experiment (high-compression-ratio data with heavy swap in/out
on an 8G zram swap device), the data are as follows:

Before =
zram allocated object :      60212066 bytes
zram total used:     140103680 bytes
ratio:         42.98 percent
MemFree:          840192 kB

Compaction

After =
zram allocated object :      60212066 bytes
zram total used:      76185600 bytes
ratio:         79.03 percent
MemFree:          901932 kB

This patchset adds more logic to zsmalloc, but when I tested a
heavy swap-in/out program, the regression was marginal because
most of the overhead comes from compression/decompression and
other MM reclaim work.

Minchan Kim (10):
  zram: avoid calling of zram_meta_free under init_lock
  zsmalloc: decouple handle and object
  zsmalloc: implement reverse mapping
  zsmalloc: factor out obj_[malloc|free]
  zsmalloc: add status bit
  zsmalloc: support compaction
  zsmalloc: adjust ZS_ALMOST_FULL
  zram: support compaction
  zsmalloc: add fullness into stat
  zsmalloc: record handle in page->private for huge object

 Documentation/ABI/testing/sysfs-block-zram |    8 +
 drivers/block/zram/zram_drv.c              |   30 +-
 drivers/block/zram/zram_drv.h              |    1 +
 include/linux/zsmalloc.h                   |    1 +
 mm/zsmalloc.c                              | 1008 +++++++++++++++++++++-------
 5 files changed, 786 insertions(+), 262 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21 14:21   ` Sergey Senozhatsky
  2015-01-21  6:14 ` [PATCH v1 02/10] zsmalloc: decouple handle and object Minchan Kim
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

We don't need to call zram_meta_free() under init_lock.
What we need to do under the lock to prevent the race is to set
zram->meta to NULL (ie, clear init_done). This patch does that and
moves the actual free outside the lock.
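
The ordering the patch establishes can be sketched in a toy userspace model
(all names here are hypothetical stand-ins, not the zram code; only the
detach-under-lock / free-outside-lock ordering mirrors zram_reset_device()):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: detach the meta pointer while the lock would be
 * held, defer the actual free until after the lock is dropped. */
struct meta { int dummy; };
struct dev { struct meta *meta; };

static struct meta *detach_meta(struct dev *d)
{
	/* under init_lock in the real driver */
	struct meta *m = d->meta;
	d->meta = NULL;	/* readers now see the device as uninitialized */
	return m;
}

static int reset_dev(struct dev *d)
{
	struct meta *m = detach_meta(d);
	/* ... init_lock dropped here in the real code ... */
	free(m);	/* safe: nothing can reach it via d->meta anymore */
	return d->meta == NULL;
}
```

The point is that the expensive free no longer needs lock protection once
the pointer is unreachable.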

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9250b3f..7e03d86 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -719,6 +719,8 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	}
 
 	meta = zram->meta;
+	zram->meta = NULL;
+
 	/* Free all pages that are still in this zram device */
 	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
 		unsigned long handle = meta->table[index].handle;
@@ -731,8 +733,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	zcomp_destroy(zram->comp);
 	zram->max_comp_streams = 1;
 
-	zram_meta_free(zram->meta);
-	zram->meta = NULL;
 	/* Reset stats */
 	memset(&zram->stats, 0, sizeof(zram->stats));
 
@@ -741,6 +741,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		set_capacity(zram->disk, 0);
 
 	up_write(&zram->init_lock);
+	zram_meta_free(meta);
 
 	/*
 	 * Revalidate disk out of the init_lock to avoid lockdep splat.
-- 
1.9.3



* [PATCH v1 02/10] zsmalloc: decouple handle and object
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-26  2:53   ` Ganesh Mahendran
  2015-01-21  6:14 ` [PATCH v1 03/10] zsmalloc: implement reverse mapping Minchan Kim
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

Currently, zsmalloc's handle encodes the object's location directly,
which makes it hard to support migration/compaction.

This patch decouples handle and object by adding an indirection
layer: the handle is now allocated dynamically and returned to the
user. The handle is the address of a slab-allocated word, so it is
unique, and that word stores the object's position, so we can
recover the object's position by dereferencing the handle.
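
The indirection can be modeled in a few lines of userspace C (a sketch
only: malloc stands in for the slab cache, and the "location" is just a
number, not an encoded <PFN, obj_idx> pair):

```c
#include <assert.h>
#include <stdlib.h>

/* A handle is a separately allocated word whose value is the object's
 * current location.  Migration only rewrites that word; the handle
 * value handed to the user stays stable. */
static unsigned long alloc_handle(void)
{
	return (unsigned long)malloc(sizeof(unsigned long));
}

static void record_obj(unsigned long handle, unsigned long obj)
{
	*(unsigned long *)handle = obj;	/* handle -> current location */
}

static unsigned long handle_to_obj(unsigned long handle)
{
	return *(unsigned long *)handle;
}
```

This is why the patch renames obj_handle_to_location() to
obj_to_location(): the handle is no longer the encoded location itself.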

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 90 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 68 insertions(+), 22 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0dec1fa..9436ee8 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -110,6 +110,8 @@
 #define ZS_MAX_ZSPAGE_ORDER 2
 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
 
+#define ZS_HANDLE_SIZE (sizeof(unsigned long))
+
 /*
  * Object location (<PFN>, <obj_idx>) is encoded as
  * as single (unsigned long) handle value.
@@ -241,6 +243,7 @@ struct zs_pool {
 	char *name;
 
 	struct size_class **size_class;
+	struct kmem_cache *handle_cachep;
 
 	gfp_t flags;	/* allocation flags used when growing pool */
 	atomic_long_t pages_allocated;
@@ -269,6 +272,34 @@ struct mapping_area {
 	enum zs_mapmode vm_mm; /* mapping mode */
 };
 
+static int create_handle_cache(struct zs_pool *pool)
+{
+	pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
+					0, 0, NULL);
+	return pool->handle_cachep ? 0 : 1;
+}
+
+static void destroy_handle_cache(struct zs_pool *pool)
+{
+	kmem_cache_destroy(pool->handle_cachep);
+}
+
+static unsigned long alloc_handle(struct zs_pool *pool)
+{
+	return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
+		pool->flags & ~__GFP_HIGHMEM);
+}
+
+static void free_handle(struct zs_pool *pool, unsigned long handle)
+{
+	kmem_cache_free(pool->handle_cachep, (void *)handle);
+}
+
+static void record_obj(unsigned long handle, unsigned long obj)
+{
+	*(unsigned long *)handle = obj;
+}
+
 /* zpool driver */
 
 #ifdef CONFIG_ZPOOL
@@ -595,13 +626,18 @@ static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
  * decoded obj_idx back to its original value since it was adjusted in
  * obj_location_to_handle().
  */
-static void obj_handle_to_location(unsigned long handle, struct page **page,
+static void obj_to_location(unsigned long handle, struct page **page,
 				unsigned long *obj_idx)
 {
 	*page = pfn_to_page(handle >> OBJ_INDEX_BITS);
 	*obj_idx = (handle & OBJ_INDEX_MASK) - 1;
 }
 
+static unsigned long handle_to_obj(unsigned long handle)
+{
+	return *(unsigned long *)handle;
+}
+
 static unsigned long obj_idx_to_offset(struct page *page,
 				unsigned long obj_idx, int class_size)
 {
@@ -1153,7 +1189,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm)
 {
 	struct page *page;
-	unsigned long obj_idx, off;
+	unsigned long obj, obj_idx, off;
 
 	unsigned int class_idx;
 	enum fullness_group fg;
@@ -1170,7 +1206,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	 */
 	BUG_ON(in_interrupt());
 
-	obj_handle_to_location(handle, &page, &obj_idx);
+	obj = handle_to_obj(handle);
+	obj_to_location(obj, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
 	class = pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
@@ -1195,7 +1232,7 @@ EXPORT_SYMBOL_GPL(zs_map_object);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 {
 	struct page *page;
-	unsigned long obj_idx, off;
+	unsigned long obj, obj_idx, off;
 
 	unsigned int class_idx;
 	enum fullness_group fg;
@@ -1204,7 +1241,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 
 	BUG_ON(!handle);
 
-	obj_handle_to_location(handle, &page, &obj_idx);
+	obj = handle_to_obj(handle);
+	obj_to_location(obj, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
 	class = pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
@@ -1236,7 +1274,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
  */
 unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 {
-	unsigned long obj;
+	unsigned long handle, obj;
 	struct link_free *link;
 	struct size_class *class;
 	void *vaddr;
@@ -1247,6 +1285,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
 		return 0;
 
+	handle = alloc_handle(pool);
+	if (!handle)
+		return 0;
+
 	class = pool->size_class[get_size_class_index(size)];
 
 	spin_lock(&class->lock);
@@ -1255,8 +1297,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	if (!first_page) {
 		spin_unlock(&class->lock);
 		first_page = alloc_zspage(class, pool->flags);
-		if (unlikely(!first_page))
+		if (unlikely(!first_page)) {
+			free_handle(pool, handle);
 			return 0;
+		}
 
 		set_zspage_mapping(first_page, class->index, ZS_EMPTY);
 		atomic_long_add(class->pages_per_zspage,
@@ -1268,7 +1312,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	}
 
 	obj = (unsigned long)first_page->freelist;
-	obj_handle_to_location(obj, &m_page, &m_objidx);
+	obj_to_location(obj, &m_page, &m_objidx);
 	m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
 
 	vaddr = kmap_atomic(m_page);
@@ -1281,27 +1325,30 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	zs_stat_inc(class, OBJ_USED, 1);
 	/* Now move the zspage to another fullness group, if required */
 	fix_fullness_group(pool, first_page);
+	record_obj(handle, obj);
 	spin_unlock(&class->lock);
 
-	return obj;
+	return handle;
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
 
-void zs_free(struct zs_pool *pool, unsigned long obj)
+void zs_free(struct zs_pool *pool, unsigned long handle)
 {
 	struct link_free *link;
 	struct page *first_page, *f_page;
-	unsigned long f_objidx, f_offset;
+	unsigned long obj, f_objidx, f_offset;
 	void *vaddr;
 
 	int class_idx;
 	struct size_class *class;
 	enum fullness_group fullness;
 
-	if (unlikely(!obj))
+	if (unlikely(!handle))
 		return;
 
-	obj_handle_to_location(obj, &f_page, &f_objidx);
+	obj = handle_to_obj(handle);
+	free_handle(pool, handle);
+	obj_to_location(obj, &f_page, &f_objidx);
 	first_page = get_first_page(f_page);
 
 	get_zspage_mapping(first_page, &class_idx, &fullness);
@@ -1356,18 +1403,16 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags)
 		return NULL;
 
 	pool->name = kstrdup(name, GFP_KERNEL);
-	if (!pool->name) {
-		kfree(pool);
-		return NULL;
-	}
+	if (!pool->name)
+		goto err;
+
+	if (create_handle_cache(pool))
+		goto err;
 
 	pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *),
 			GFP_KERNEL);
-	if (!pool->size_class) {
-		kfree(pool->name);
-		kfree(pool);
-		return NULL;
-	}
+	if (!pool->size_class)
+		goto err;
 
 	/*
 	 * Iterate reversly, because, size of size_class that we want to use
@@ -1450,6 +1495,7 @@ void zs_destroy_pool(struct zs_pool *pool)
 		kfree(class);
 	}
 
+	destroy_handle_cache(pool);
 	kfree(pool->size_class);
 	kfree(pool->name);
 	kfree(pool);
-- 
1.9.3



* [PATCH v1 03/10] zsmalloc: implement reverse mapping
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 02/10] zsmalloc: decouple handle and object Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 04/10] zsmalloc: factor out obj_[malloc|free] Minchan Kim
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

This patch implements reverse mapping, which retrieves the handle
from an object. To keep a handle per object, each chunk is allocated
ZS_HANDLE_SIZE larger than the size the user requested, and the
handle is stored in the extra space.
IOW, *(address mapped by zs_map_object() - ZS_HANDLE_SIZE) == handle.
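
The chunk layout can be illustrated with a toy userspace model (names
and the static buffer are made up for the sketch; only the
header-before-payload layout mirrors the patch):

```c
#include <assert.h>
#include <string.h>

#define ZS_HANDLE_SIZE sizeof(unsigned long)

static char chunk[64];	/* stands in for space inside a zspage */

/* Each chunk is ZS_HANDLE_SIZE larger than requested; the handle sits
 * in the head room and the caller only ever sees the payload. */
static void *map_object(unsigned long handle, const void *data, size_t len)
{
	memcpy(chunk, &handle, ZS_HANDLE_SIZE);	/* header keeps the handle */
	memcpy(chunk + ZS_HANDLE_SIZE, data, len);
	return chunk + ZS_HANDLE_SIZE;		/* payload address */
}

/* The reverse mapping: step back over the header. */
static unsigned long addr_to_handle(void *mapped)
{
	unsigned long handle;
	memcpy(&handle, (char *)mapped - ZS_HANDLE_SIZE, sizeof(handle));
	return handle;
}
```

This is the relation the commit message states: the handle is recoverable
from the mapped address minus ZS_HANDLE_SIZE.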

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9436ee8..2df2f1b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -142,7 +142,8 @@
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
 	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
-#define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
+/* each chunk includes extra space to keep handle */
+#define ZS_MAX_ALLOC_SIZE	(PAGE_SIZE + ZS_HANDLE_SIZE)
 
 /*
  * On systems with 4K page size, this gives 255 size classes! There is a
@@ -235,8 +236,17 @@ struct size_class {
  * This must be power of 2 and less than or equal to ZS_ALIGN
  */
 struct link_free {
-	/* Handle of next free chunk (encodes <PFN, obj_idx>) */
-	void *next;
+	union {
+		/*
+		 * Position of next free chunk (encodes <PFN, obj_idx>)
+		 * It's valid for non-allocated object
+		 */
+		void *next;
+		/*
+		 * Handle of allocated object.
+		 */
+		unsigned long handle;
+	};
 };
 
 struct zs_pool {
@@ -896,12 +906,16 @@ static void __zs_unmap_object(struct mapping_area *area,
 {
 	int sizes[2];
 	void *addr;
-	char *buf = area->vm_buf;
+	char *buf;
 
 	/* no write fastpath */
 	if (area->vm_mm == ZS_MM_RO)
 		goto out;
 
+	buf = area->vm_buf + ZS_HANDLE_SIZE;
+	size -= ZS_HANDLE_SIZE;
+	off += ZS_HANDLE_SIZE;
+
 	sizes[0] = PAGE_SIZE - off;
 	sizes[1] = size - sizes[0];
 
@@ -1196,6 +1210,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	struct size_class *class;
 	struct mapping_area *area;
 	struct page *pages[2];
+	void *ret;
 
 	BUG_ON(!handle);
 
@@ -1217,7 +1232,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	if (off + class->size <= PAGE_SIZE) {
 		/* this object is contained entirely within a page */
 		area->vm_addr = kmap_atomic(page);
-		return area->vm_addr + off;
+		ret = area->vm_addr + off;
+		goto out;
 	}
 
 	/* this object spans two pages */
@@ -1225,7 +1241,9 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	pages[1] = get_next_page(page);
 	BUG_ON(!pages[1]);
 
-	return __zs_map_object(area, pages, off, class->size);
+	ret = __zs_map_object(area, pages, off, class->size);
+out:
+	return ret + ZS_HANDLE_SIZE;
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
@@ -1282,13 +1300,15 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	struct page *first_page, *m_page;
 	unsigned long m_objidx, m_offset;
 
-	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
+	if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE))
 		return 0;
 
 	handle = alloc_handle(pool);
 	if (!handle)
 		return 0;
 
+	/* extra space in chunk to keep the handle */
+	size += ZS_HANDLE_SIZE;
 	class = pool->size_class[get_size_class_index(size)];
 
 	spin_lock(&class->lock);
@@ -1318,7 +1338,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	vaddr = kmap_atomic(m_page);
 	link = (struct link_free *)vaddr + m_offset / sizeof(*link);
 	first_page->freelist = link->next;
-	memset(link, POISON_INUSE, sizeof(*link));
+
+	/* record handle in the header of allocated chunk */
+	link->handle = handle;
 	kunmap_atomic(vaddr);
 
 	first_page->inuse++;
-- 
1.9.3



* [PATCH v1 04/10] zsmalloc: factor out obj_[malloc|free]
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
                   ` (2 preceding siblings ...)
  2015-01-21  6:14 ` [PATCH v1 03/10] zsmalloc: implement reverse mapping Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 05/10] zsmalloc: add status bit Minchan Kim
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

A later patch's migration code needs parts of zs_malloc() and
zs_free(), so this patch factors them out into obj_malloc() and
obj_free().

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 99 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 61 insertions(+), 38 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2df2f1b..e52b1b6 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -525,11 +525,10 @@ static void remove_zspage(struct page *page, struct size_class *class,
  * page from the freelist of the old fullness group to that of the new
  * fullness group.
  */
-static enum fullness_group fix_fullness_group(struct zs_pool *pool,
+static enum fullness_group fix_fullness_group(struct size_class *class,
 						struct page *page)
 {
 	int class_idx;
-	struct size_class *class;
 	enum fullness_group currfg, newfg;
 
 	BUG_ON(!is_first_page(page));
@@ -539,7 +538,6 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
 	if (newfg == currfg)
 		goto out;
 
-	class = pool->size_class[class_idx];
 	remove_zspage(page, class, currfg);
 	insert_zspage(page, class, newfg);
 	set_zspage_mapping(page, class_idx, newfg);
@@ -1281,6 +1279,33 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
+static unsigned long obj_malloc(struct page *first_page,
+		struct size_class *class, unsigned long handle)
+{
+	unsigned long obj;
+	struct link_free *link;
+
+	struct page *m_page;
+	unsigned long m_objidx, m_offset;
+	void *vaddr;
+
+	obj = (unsigned long)first_page->freelist;
+	obj_to_location(obj, &m_page, &m_objidx);
+	m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
+
+	vaddr = kmap_atomic(m_page);
+	link = (struct link_free *)vaddr + m_offset / sizeof(*link);
+	first_page->freelist = link->next;
+	/* record handle in the header of allocated chunk */
+	link->handle = handle;
+	kunmap_atomic(vaddr);
+	first_page->inuse++;
+	zs_stat_inc(class, OBJ_USED, 1);
+
+	return obj;
+}
+
+
 /**
  * zs_malloc - Allocate block of given size from pool.
  * @pool: pool to allocate from
@@ -1293,12 +1318,8 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
 unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 {
 	unsigned long handle, obj;
-	struct link_free *link;
 	struct size_class *class;
-	void *vaddr;
-
-	struct page *first_page, *m_page;
-	unsigned long m_objidx, m_offset;
+	struct page *first_page;
 
 	if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE))
 		return 0;
@@ -1331,22 +1352,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 				class->size, class->pages_per_zspage));
 	}
 
-	obj = (unsigned long)first_page->freelist;
-	obj_to_location(obj, &m_page, &m_objidx);
-	m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
-
-	vaddr = kmap_atomic(m_page);
-	link = (struct link_free *)vaddr + m_offset / sizeof(*link);
-	first_page->freelist = link->next;
-
-	/* record handle in the header of allocated chunk */
-	link->handle = handle;
-	kunmap_atomic(vaddr);
-
-	first_page->inuse++;
-	zs_stat_inc(class, OBJ_USED, 1);
+	obj = obj_malloc(first_page, class, handle);
 	/* Now move the zspage to another fullness group, if required */
-	fix_fullness_group(pool, first_page);
+	fix_fullness_group(class, first_page);
 	record_obj(handle, obj);
 	spin_unlock(&class->lock);
 
@@ -1354,46 +1362,61 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
 
-void zs_free(struct zs_pool *pool, unsigned long handle)
+static void obj_free(struct zs_pool *pool, struct size_class *class,
+			unsigned long obj)
 {
 	struct link_free *link;
 	struct page *first_page, *f_page;
-	unsigned long obj, f_objidx, f_offset;
+	unsigned long f_objidx, f_offset;
 	void *vaddr;
-
 	int class_idx;
-	struct size_class *class;
 	enum fullness_group fullness;
 
-	if (unlikely(!handle))
-		return;
+	BUG_ON(!obj);
 
-	obj = handle_to_obj(handle);
-	free_handle(pool, handle);
 	obj_to_location(obj, &f_page, &f_objidx);
 	first_page = get_first_page(f_page);
 
 	get_zspage_mapping(first_page, &class_idx, &fullness);
-	class = pool->size_class[class_idx];
 	f_offset = obj_idx_to_offset(f_page, f_objidx, class->size);
 
-	spin_lock(&class->lock);
+	vaddr = kmap_atomic(f_page);
 
 	/* Insert this object in containing zspage's freelist */
-	vaddr = kmap_atomic(f_page);
 	link = (struct link_free *)(vaddr + f_offset);
 	link->next = first_page->freelist;
 	kunmap_atomic(vaddr);
 	first_page->freelist = (void *)obj;
-
 	first_page->inuse--;
-	fullness = fix_fullness_group(pool, first_page);
-
 	zs_stat_dec(class, OBJ_USED, 1);
+}
+
+void zs_free(struct zs_pool *pool, unsigned long handle)
+{
+	struct page *first_page, *f_page;
+	unsigned long obj, f_objidx;
+
+	int class_idx;
+	struct size_class *class;
+	enum fullness_group fullness;
+
+	if (unlikely(!handle))
+		return;
+
+	obj = handle_to_obj(handle);
+	free_handle(pool, handle);
+	obj_to_location(obj, &f_page, &f_objidx);
+	first_page = get_first_page(f_page);
+
+	get_zspage_mapping(first_page, &class_idx, &fullness);
+	class = pool->size_class[class_idx];
+
+	spin_lock(&class->lock);
+	obj_free(pool, class, obj);
+	fullness = fix_fullness_group(class, first_page);
 	if (fullness == ZS_EMPTY)
 		zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
 				class->size, class->pages_per_zspage));
-
 	spin_unlock(&class->lock);
 
 	if (fullness == ZS_EMPTY) {
-- 
1.9.3



* [PATCH v1 05/10] zsmalloc: add status bit
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
                   ` (3 preceding siblings ...)
  2015-01-21  6:14 ` [PATCH v1 04/10] zsmalloc: factor out obj_[malloc|free] Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 06/10] zsmalloc: support compaction Minchan Kim
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

For migration, we need to identify which objects in a zspage are
allocated so that we can migrate the allocated (ie, in-use) ones.
We could find out by iterating over the free objects in a zspage,
but that is inefficient. Instead, this patch adds a tag (ie,
OBJ_ALLOCATED_TAG) in the head of each object (ie, the handle) so
we can check efficiently during migration whether an object is
allocated. Mixing the tag into the handle kept in memory is okay
because the handle is allocated by slab, so at least the two low
bits are free to use.

The other status bit this patch adds is HANDLE_PIN_TAG. During
migration, we cannot move an object the user is accessing via
zs_map_object() because of data coherency between the old and new
object. A later patch makes migration check this bit and skip such
objects.
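
Both tags exploit the same trick, which a short userspace sketch can show
(hypothetical helpers, not the kernel functions): because the values are
at least word-aligned, their low bits are always zero and can carry state.

```c
#include <assert.h>

#define OBJ_ALLOCATED_TAG 1UL	/* low bit of the word in the chunk head */
#define HANDLE_PIN_TAG    1UL	/* low bit of the word the handle points at */

static unsigned long mark_allocated(unsigned long obj)
{
	return obj | OBJ_ALLOCATED_TAG;	/* set when storing into the header */
}

static int is_allocated(unsigned long head)
{
	return head & OBJ_ALLOCATED_TAG;
}

static unsigned long pin(unsigned long handle_val)
{
	return handle_val | HANDLE_PIN_TAG;	/* migration skips pinned objects */
}

static int is_pinned(unsigned long handle_val)
{
	return handle_val & HANDLE_PIN_TAG;
}
```

Clearing the tag before decoding (as obj_free() does with
`obj &= ~OBJ_ALLOCATED_TAG`) recovers the original encoded value.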

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 64 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 48 insertions(+), 16 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e52b1b6..99555da 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -135,7 +135,24 @@
 #endif
 #endif
 #define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
-#define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS)
+
+/*
+ * Memory for allocating handle keeps object position by
+ * encoding <page, obj_idx> and the encoded value has a room
+ * in the least bit(ie, look at obj_to_location).
+ * We use the bit to indicate the object is accessed by client
+ * via zs_map_object so the migraion can skip the object.
+ */
+#define HANDLE_PIN_TAG	1
+
+/*
+ * Head in allocated object should have OBJ_ALLOCATED_TAG.
+ * It's okay to keep the handle valid because handle is 4byte-aligned
+ * address so we have room for two bit.
+ */
+#define OBJ_ALLOCATED_TAG 1
+#define OBJ_TAG_BITS 1
+#define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))
@@ -610,35 +627,35 @@ static struct page *get_next_page(struct page *page)
 
 /*
  * Encode <page, obj_idx> as a single handle value.
- * On hardware platforms with physical memory starting at 0x0 the pfn
- * could be 0 so we ensure that the handle will never be 0 by adjusting the
- * encoded obj_idx value before encoding.
+ * We use the least bit of handle for tagging.
  */
-static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
+static void *location_to_obj(struct page *page, unsigned long obj_idx)
 {
-	unsigned long handle;
+	unsigned long obj;
 
 	if (!page) {
 		BUG_ON(obj_idx);
 		return NULL;
 	}
 
-	handle = page_to_pfn(page) << OBJ_INDEX_BITS;
-	handle |= ((obj_idx + 1) & OBJ_INDEX_MASK);
+	obj = page_to_pfn(page) << OBJ_INDEX_BITS;
+	obj |= ((obj_idx) & OBJ_INDEX_MASK);
+	obj <<= OBJ_TAG_BITS;
 
-	return (void *)handle;
+	return (void *)obj;
 }
 
 /*
  * Decode <page, obj_idx> pair from the given object handle. We adjust the
  * decoded obj_idx back to its original value since it was adjusted in
- * obj_location_to_handle().
+ * location_to_obj().
  */
-static void obj_to_location(unsigned long handle, struct page **page,
+static void obj_to_location(unsigned long obj, struct page **page,
 				unsigned long *obj_idx)
 {
-	*page = pfn_to_page(handle >> OBJ_INDEX_BITS);
-	*obj_idx = (handle & OBJ_INDEX_MASK) - 1;
+	obj >>= OBJ_TAG_BITS;
+	*page = pfn_to_page(obj >> OBJ_INDEX_BITS);
+	*obj_idx = (obj & OBJ_INDEX_MASK);
 }
 
 static unsigned long handle_to_obj(unsigned long handle)
@@ -657,6 +674,16 @@ static unsigned long obj_idx_to_offset(struct page *page,
 	return off + obj_idx * class_size;
 }
 
+static void pin_tag(unsigned long handle)
+{
+	record_obj(handle, *(unsigned long *)handle | HANDLE_PIN_TAG);
+}
+
+static void unpin_tag(unsigned long handle)
+{
+	record_obj(handle, *(unsigned long *)handle & ~HANDLE_PIN_TAG);
+}
+
 static void reset_page(struct page *page)
 {
 	clear_bit(PG_private, &page->flags);
@@ -718,7 +745,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 		link = (struct link_free *)vaddr + off / sizeof(*link);
 
 		while ((off += class->size) < PAGE_SIZE) {
-			link->next = obj_location_to_handle(page, i++);
+			link->next = location_to_obj(page, i++);
 			link += class->size / sizeof(*link);
 		}
 
@@ -728,7 +755,7 @@ static void init_zspage(struct page *first_page, struct size_class *class)
 		 * page (if present)
 		 */
 		next_page = get_next_page(page);
-		link->next = obj_location_to_handle(next_page, 0);
+		link->next = location_to_obj(next_page, 0);
 		kunmap_atomic(vaddr);
 		page = next_page;
 		off %= PAGE_SIZE;
@@ -782,7 +809,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 
 	init_zspage(first_page, class);
 
-	first_page->freelist = obj_location_to_handle(first_page, 0);
+	first_page->freelist = location_to_obj(first_page, 0);
 	/* Maximum number of objects we can store in this zspage */
 	first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size;
 
@@ -1219,6 +1246,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	 */
 	BUG_ON(in_interrupt());
 
+	pin_tag(handle);
+
 	obj = handle_to_obj(handle);
 	obj_to_location(obj, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
@@ -1276,6 +1305,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 		__zs_unmap_object(area, pages, off, class->size);
 	}
 	put_cpu_var(zs_map_area);
+	unpin_tag(handle);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
@@ -1289,6 +1319,7 @@ static unsigned long obj_malloc(struct page *first_page,
 	unsigned long m_objidx, m_offset;
 	void *vaddr;
 
+	handle |= OBJ_ALLOCATED_TAG;
 	obj = (unsigned long)first_page->freelist;
 	obj_to_location(obj, &m_page, &m_objidx);
 	m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
@@ -1374,6 +1405,7 @@ static void obj_free(struct zs_pool *pool, struct size_class *class,
 
 	BUG_ON(!obj);
 
+	obj &= ~OBJ_ALLOCATED_TAG;
 	obj_to_location(obj, &f_page, &f_objidx);
 	first_page = get_first_page(f_page);
 
-- 
1.9.3



* [PATCH v1 06/10] zsmalloc: support compaction
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
                   ` (4 preceding siblings ...)
  2015-01-21  6:14 ` [PATCH v1 05/10] zsmalloc: add status bit Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 07/10] zsmalloc: adjust ZS_ALMOST_FULL Minchan Kim
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

This patch provides the core functions for zsmalloc migration.
The migration policy is simple, as follows:

It takes source zspages from ZS_ALMOST_EMPTY and destination zspages
from ZS_ALMOST_FULL and tries to move objects from the source zspage
into the destination zspages. If ZS_ALMOST_FULL lacks destination
pages, it falls back to ZS_ALMOST_EMPTY. Once all objects have moved
out of a source zspage, that zspage can be freed.
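
The destination-selection fallback described above can be sketched as a
toy function (names and the count arrays are hypothetical; the real code
walks per-class fullness lists of struct page):

```c
#include <assert.h>

enum fullness { ZS_ALMOST_FULL, ZS_ALMOST_EMPTY, NR_FULLNESS };

/* nr[] holds the number of zspages in each fullness group.  Prefer an
 * ALMOST_FULL zspage as the migration target; fall back to another
 * ALMOST_EMPTY one (the source itself doesn't count). */
static int pick_dest(const int nr[NR_FULLNESS])
{
	if (nr[ZS_ALMOST_FULL] > 0)
		return ZS_ALMOST_FULL;
	if (nr[ZS_ALMOST_EMPTY] > 1)
		return ZS_ALMOST_EMPTY;
	return -1;	/* nothing to compact into */
}
```

Packing into the fullest pages first is what empties source zspages
quickly enough to be worth freeing.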

Migration frees the source zspage via RCU, since migration can race
with object access through zs_map_object(). That way we can safely
reach the size_class from a handle under rcu_read_[un]lock(), but
the handle's validity must be rechecked afterwards.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/zsmalloc.h |   1 +
 mm/zsmalloc.c            | 324 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 321 insertions(+), 4 deletions(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 3283c6a..1338190 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -47,5 +47,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
 unsigned long zs_get_total_pages(struct zs_pool *pool);
+unsigned long zs_compact(struct zs_pool *pool);
 
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 99555da..99bf5bd 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -663,6 +663,11 @@ static unsigned long handle_to_obj(unsigned long handle)
 	return *(unsigned long *)handle;
 }
 
+unsigned long obj_to_head(void *obj)
+{
+	return *(unsigned long *)obj;
+}
+
 static unsigned long obj_idx_to_offset(struct page *page,
 				unsigned long obj_idx, int class_size)
 {
@@ -1044,6 +1049,13 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
 	return true;
 }
 
+static bool zspage_full(struct page *page)
+{
+	BUG_ON(!is_first_page(page));
+
+	return page->inuse == page->objects;
+}
+
 #ifdef CONFIG_ZSMALLOC_STAT
 
 static inline void zs_stat_inc(struct size_class *class,
@@ -1246,12 +1258,27 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	 */
 	BUG_ON(in_interrupt());
 
-	pin_tag(handle);
-
+retry:
+	/*
+	 * Migrating object will not be destroyed so we can get a first_page
+	 * safely but need to verify handle again.
+	 */
+	rcu_read_lock();
 	obj = handle_to_obj(handle);
 	obj_to_location(obj, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
 	class = pool->size_class[class_idx];
+	spin_lock(&class->lock);
+	if (obj != handle_to_obj(handle)) {
+		/* the object was moved by migration. Then fetch new object */
+		spin_unlock(&class->lock);
+		rcu_read_unlock();
+		goto retry;
+	}
+	rcu_read_unlock();
+	/* From now on, migration cannot move the object */
+	pin_tag(handle);
+	spin_unlock(&class->lock);
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
 	area = &get_cpu_var(zs_map_area);
@@ -1305,7 +1332,9 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 		__zs_unmap_object(area, pages, off, class->size);
 	}
 	put_cpu_var(zs_map_area);
+	spin_lock(&class->lock);
 	unpin_tag(handle);
+	spin_unlock(&class->lock);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
@@ -1434,9 +1463,9 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 
 	if (unlikely(!handle))
 		return;
-
+retry:
+	rcu_read_lock();
 	obj = handle_to_obj(handle);
-	free_handle(pool, handle);
 	obj_to_location(obj, &f_page, &f_objidx);
 	first_page = get_first_page(f_page);
 
@@ -1444,6 +1473,15 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	class = pool->size_class[class_idx];
 
 	spin_lock(&class->lock);
+	/* Retry if migrate moves object */
+	if (obj != handle_to_obj(handle)) {
+		spin_unlock(&class->lock);
+		rcu_read_unlock();
+		goto retry;
+	}
+	rcu_read_unlock();
+
+	free_handle(pool, handle);
 	obj_free(pool, class, obj);
 	fullness = fix_fullness_group(class, first_page);
 	if (fullness == ZS_EMPTY)
@@ -1459,6 +1497,284 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
+static void zs_object_copy(unsigned long src, unsigned long dst,
+				struct size_class *class)
+{
+	struct page *s_page, *d_page;
+	unsigned long s_objidx, d_objidx;
+	unsigned long s_off, d_off;
+	void *s_addr, *d_addr;
+	int s_size, d_size, size;
+	int written = 0;
+
+	s_size = d_size = class->size;
+
+	obj_to_location(src, &s_page, &s_objidx);
+	obj_to_location(dst, &d_page, &d_objidx);
+
+	s_off = obj_idx_to_offset(s_page, s_objidx, class->size);
+	d_off = obj_idx_to_offset(d_page, d_objidx, class->size);
+
+	if (s_off + class->size > PAGE_SIZE)
+		s_size = PAGE_SIZE - s_off;
+
+	if (d_off + class->size > PAGE_SIZE)
+		d_size = PAGE_SIZE - d_off;
+
+	s_addr = kmap_atomic(s_page);
+	d_addr = kmap_atomic(d_page);
+
+	while (1) {
+		size = min(s_size, d_size);
+		memcpy(d_addr + d_off, s_addr + s_off, size);
+		written += size;
+
+		if (written == class->size)
+			break;
+
+		if (s_off + size >= PAGE_SIZE) {
+			kunmap_atomic(d_addr);
+			kunmap_atomic(s_addr);
+			s_page = get_next_page(s_page);
+			BUG_ON(!s_page);
+			s_addr = kmap_atomic(s_page);
+			d_addr = kmap_atomic(d_page);
+			s_size = class->size - written;
+			s_off = 0;
+		} else {
+			s_off += size;
+			s_size -= size;
+		}
+
+		if (d_off + size >= PAGE_SIZE) {
+			kunmap_atomic(d_addr);
+			d_page = get_next_page(d_page);
+			BUG_ON(!d_page);
+			d_addr = kmap_atomic(d_page);
+			d_size = class->size - written;
+			d_off = 0;
+		} else {
+			d_off += size;
+			d_size -= size;
+		}
+	}
+
+	kunmap_atomic(d_addr);
+	kunmap_atomic(s_addr);
+}
+
+/*
+ * Find alloced object in zspage from index object and
+ * return handle.
+ */
+static unsigned long find_alloced_obj(struct page *page, int index,
+					struct size_class *class)
+{
+	unsigned long head;
+	int offset = 0;
+	unsigned long handle = 0;
+	void *addr = kmap_atomic(page);
+
+	if (!is_first_page(page))
+		offset = page->index;
+	offset += class->size * index;
+
+	while (offset < PAGE_SIZE) {
+		head = obj_to_head(addr + offset);
+		if (head & OBJ_ALLOCATED_TAG) {
+			handle = head & ~OBJ_ALLOCATED_TAG;
+			if (!(*(unsigned long *)handle & HANDLE_PIN_TAG))
+				break;
+			handle = 0;
+		}
+
+		offset += class->size;
+		index++;
+	}
+
+	kunmap_atomic(addr);
+	return handle;
+}
+
+struct zs_compact_control {
+	/* from page for migration. It could be subpage, not first page */
+	struct page *s_page;
+	int index; /* start index from @s_page for finding used object */
+	/* to page for migration. It must be first_page */
+	struct page *d_page;
+};
+
+static int migrate_zspage(struct zs_pool *pool, struct zs_compact_control *cc,
+				struct size_class *class)
+{
+	unsigned long used_obj, free_obj;
+	unsigned long handle;
+	struct page *s_page = cc->s_page;
+	struct page *d_page = cc->d_page;
+	unsigned long index = cc->index;
+	int nr_migrated = 0;
+
+	while (1) {
+		handle = find_alloced_obj(s_page, index, class);
+		if (!handle) {
+			s_page = get_next_page(s_page);
+			if (!s_page)
+				break;
+			index = 0;
+			continue;
+		}
+
+		/* stop if there is no more space */
+		if (zspage_full(d_page))
+			break;
+
+		used_obj = handle_to_obj(handle);
+		free_obj = obj_malloc(d_page, class, handle);
+		zs_object_copy(used_obj, free_obj, class);
+		index++;
+		record_obj(handle, free_obj);
+		obj_free(pool, class, used_obj);
+		nr_migrated++;
+	}
+
+	cc->s_page = s_page;
+	cc->index = index;
+
+	return nr_migrated;
+}
+
+static struct page *alloc_target_page(struct size_class *class)
+{
+	int i;
+	struct page *page;
+
+	for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) {
+		page = class->fullness_list[i];
+		if (page) {
+			remove_zspage(page, class, i);
+			break;
+		}
+	}
+
+	return page;
+}
+
+static void rcu_free_zspage(struct rcu_head *h)
+{
+	struct page *first_page;
+
+	first_page = container_of((struct list_head *)h, struct page, lru);
+	free_zspage(first_page);
+}
+
+static void putback_zspage(struct zs_pool *pool, struct size_class *class,
+				struct page *first_page)
+{
+	int class_idx;
+	enum fullness_group fullness;
+
+	BUG_ON(!is_first_page(first_page));
+
+	get_zspage_mapping(first_page, &class_idx, &fullness);
+	insert_zspage(first_page, class, fullness);
+	fullness = fix_fullness_group(class, first_page);
+	if (fullness == ZS_EMPTY) {
+		struct rcu_head *head;
+
+		zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+			class->size, class->pages_per_zspage));
+		atomic_long_sub(class->pages_per_zspage,
+				&pool->pages_allocated);
+		head = (struct rcu_head *)&first_page->lru;
+		call_rcu(head, rcu_free_zspage);
+	}
+}
+
+static struct page *isolate_source_page(struct size_class *class)
+{
+	struct page *page;
+
+	page = class->fullness_list[ZS_ALMOST_EMPTY];
+	if (page)
+		remove_zspage(page, class, ZS_ALMOST_EMPTY);
+
+	return page;
+}
+
+static unsigned long __zs_compact(struct zs_pool *pool,
+				struct size_class *class)
+{
+	unsigned long nr_total_migrated = 0;
+	struct page *src_page;
+	struct page *dst_page = NULL;
+
+	spin_lock(&class->lock);
+	while ((src_page = isolate_source_page(class))) {
+		int nr_to_migrate, nr_migrated;
+		struct zs_compact_control cc;
+
+		BUG_ON(!is_first_page(src_page));
+
+		cc.index = 0;
+		cc.s_page = src_page;
+		nr_to_migrate = src_page->inuse;
+new_target:
+		dst_page = alloc_target_page(class);
+		if (!dst_page)
+			break;
+
+		cc.d_page = dst_page;
+
+		nr_migrated = migrate_zspage(pool, &cc, class);
+		/*
+		 * Allocate new target page if it was failed by
+		 * shortage of free object in the target page
+		 */
+		if (nr_to_migrate > nr_migrated &&
+			zspage_full(dst_page) && cc.s_page != NULL) {
+			putback_zspage(pool, class, cc.d_page);
+			nr_total_migrated += nr_migrated;
+			nr_to_migrate -= nr_migrated;
+			goto new_target;
+		}
+
+		putback_zspage(pool, class, cc.d_page);
+		putback_zspage(pool, class, src_page);
+		spin_unlock(&class->lock);
+		nr_total_migrated += nr_migrated;
+		cond_resched();
+		spin_lock(&class->lock);
+	}
+	if (src_page)
+		putback_zspage(pool, class, src_page);
+
+	spin_unlock(&class->lock);
+
+	return nr_total_migrated;
+}
+
+unsigned long zs_compact(struct zs_pool *pool)
+{
+	int i;
+	unsigned long nr_migrated = 0;
+	struct size_class *class;
+
+	for (i = zs_size_classes - 1; i >= 0; i--) {
+		class = pool->size_class[i];
+		if (!class)
+			continue;
+		if (class->index != i)
+			continue;
+		nr_migrated += __zs_compact(pool, class);
+	}
+
+	if (nr_migrated)
+		synchronize_rcu();
+
+	return nr_migrated;
+}
+EXPORT_SYMBOL_GPL(zs_compact);
+
 /**
  * zs_create_pool - Creates an allocation pool to work from.
  * @flags: allocation flags used to allocate pool metadata
-- 
1.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v1 07/10] zsmalloc: adjust ZS_ALMOST_FULL
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
                   ` (5 preceding siblings ...)
  2015-01-21  6:14 ` [PATCH v1 06/10] zsmalloc: support compaction Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 08/10] zram: support compaction Minchan Kim
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

Currently, zsmalloc regards a zspage as ZS_ALMOST_EMPTY if under
1/4 of its objects are used (ie, fullness_threshold_frac).
This can result in loose packing since zsmalloc migrates only
ZS_ALMOST_EMPTY zspages out.

This patch changes the rule so that a zspage with more than 3/4
of its objects used becomes ZS_ALMOST_FULL, enabling tighter packing.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 99bf5bd..8217e8e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -477,7 +477,7 @@ static enum fullness_group get_fullness_group(struct page *page)
 		fg = ZS_EMPTY;
 	else if (inuse == max_objects)
 		fg = ZS_FULL;
-	else if (inuse <= max_objects / fullness_threshold_frac)
+	else if (inuse <= 3 * max_objects / fullness_threshold_frac)
 		fg = ZS_ALMOST_EMPTY;
 	else
 		fg = ZS_ALMOST_FULL;
-- 
1.9.3



* [PATCH v1 08/10] zram: support compaction
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
                   ` (6 preceding siblings ...)
  2015-01-21  6:14 ` [PATCH v1 07/10] zsmalloc: adjust ZS_ALMOST_FULL Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 09/10] zsmalloc: add fullness into stat Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 10/10] zsmalloc: record handle in page->private for huge object Minchan Kim
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

Now that zsmalloc supports compaction, zram can use it.
As a first step, this patch exports a compact knob via sysfs
so users can trigger compaction via "echo 1 > /sys/block/zram0/compact".

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram |  8 ++++++++
 drivers/block/zram/zram_drv.c              | 25 +++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h              |  1 +
 3 files changed, 34 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index a6148ea..91ad707 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -141,3 +141,11 @@ Description:
 		amount of memory ZRAM can use to store the compressed data.  The
 		limit could be changed in run time and "0" means disable the
 		limit.  No limit is the initial state.  Unit: bytes
+
+What:		/sys/block/zram<id>/compact
+Date:		August 2015
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The compact file is write-only and triggers compaction of
+		the allocator zram uses. The allocator moves some objects
+		so that it can free fragmented space.
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 7e03d86..7ac8dae 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -246,6 +246,27 @@ static ssize_t comp_algorithm_store(struct device *dev,
 	return len;
 }
 
+static ssize_t compact_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	unsigned long nr_migrated;
+	struct zram *zram = dev_to_zram(dev);
+	struct zram_meta *meta;
+
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		up_read(&zram->init_lock);
+		return -EINVAL;
+	}
+
+	meta = zram->meta;
+	nr_migrated = zs_compact(meta->mem_pool);
+	atomic64_add(nr_migrated, &zram->stats.num_migrated);
+	up_read(&zram->init_lock);
+
+	return len;
+}
+
 /* flag operations needs meta->tb_lock */
 static int zram_test_flag(struct zram_meta *meta, u32 index,
 			enum zram_pageflags flag)
@@ -994,6 +1015,7 @@ static const struct block_device_operations zram_devops = {
 	.owner = THIS_MODULE
 };
 
+static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
 static DEVICE_ATTR_WO(reset);
@@ -1012,6 +1034,7 @@ ZRAM_ATTR_RO(invalid_io);
 ZRAM_ATTR_RO(notify_free);
 ZRAM_ATTR_RO(zero_pages);
 ZRAM_ATTR_RO(compr_data_size);
+ZRAM_ATTR_RO(num_migrated);
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
@@ -1021,6 +1044,8 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_num_writes.attr,
 	&dev_attr_failed_reads.attr,
 	&dev_attr_failed_writes.attr,
+	&dev_attr_num_migrated.attr,
+	&dev_attr_compact.attr,
 	&dev_attr_invalid_io.attr,
 	&dev_attr_notify_free.attr,
 	&dev_attr_zero_pages.attr,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b05a816..5e7a565 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -84,6 +84,7 @@ struct zram_stats {
 	atomic64_t compr_data_size;	/* compressed size of pages stored */
 	atomic64_t num_reads;	/* failed + successful */
 	atomic64_t num_writes;	/* --do-- */
+	atomic64_t num_migrated;	/* no. of migrated object */
 	atomic64_t failed_reads;	/* can happen when memory is too low */
 	atomic64_t failed_writes;	/* can happen when memory is too low */
 	atomic64_t invalid_io;	/* non-page-aligned I/O requests */
-- 
1.9.3



* [PATCH v1 09/10] zsmalloc: add fullness into stat
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
                   ` (7 preceding siblings ...)
  2015-01-21  6:14 ` [PATCH v1 08/10] zram: support compaction Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  2015-01-21  6:14 ` [PATCH v1 10/10] zsmalloc: record handle in page->private for huge object Minchan Kim
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

When investigating compaction, the per-class fullness information
and pages_per_zspage are helpful for seeing how well compaction
works on each size class.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 349 +++++++++++++++++++++++++++++++---------------------------
 1 file changed, 184 insertions(+), 165 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8217e8e..48b702e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -192,6 +192,8 @@ enum fullness_group {
 enum zs_stat_type {
 	OBJ_ALLOCATED,
 	OBJ_USED,
+	CLASS_ALMOST_FULL,
+	CLASS_ALMOST_EMPTY,
 	NR_ZS_STAT_TYPE,
 };
 
@@ -404,6 +406,11 @@ static struct zpool_driver zs_zpool_driver = {
 MODULE_ALIAS("zpool-zsmalloc");
 #endif /* CONFIG_ZPOOL */
 
+static unsigned int get_maxobj_per_zspage(int size, int pages_per_zspage)
+{
+	return pages_per_zspage * PAGE_SIZE / size;
+}
+
 /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
 static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
 
@@ -457,6 +464,179 @@ static int get_size_class_index(int size)
 	return idx;
 }
 
+#ifdef CONFIG_ZSMALLOC_STAT
+
+static inline void zs_stat_inc(struct size_class *class,
+				enum zs_stat_type type, unsigned long cnt)
+{
+	class->stats.objs[type] += cnt;
+}
+
+static inline void zs_stat_dec(struct size_class *class,
+				enum zs_stat_type type, unsigned long cnt)
+{
+	class->stats.objs[type] -= cnt;
+}
+
+static inline unsigned long zs_stat_get(struct size_class *class,
+				enum zs_stat_type type)
+{
+	return class->stats.objs[type];
+}
+
+static int __init zs_stat_init(void)
+{
+	if (!debugfs_initialized())
+		return -ENODEV;
+
+	zs_stat_root = debugfs_create_dir("zsmalloc", NULL);
+	if (!zs_stat_root)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __exit zs_stat_exit(void)
+{
+	debugfs_remove_recursive(zs_stat_root);
+}
+
+static int zs_stats_size_show(struct seq_file *s, void *v)
+{
+	int i;
+	struct zs_pool *pool = s->private;
+	struct size_class *class;
+	int objs_per_zspage;
+	unsigned long class_almost_full, class_almost_empty;
+	unsigned long obj_allocated, obj_used, pages_used;
+	unsigned long total_class_almost_full = 0, total_class_almost_empty = 0;
+	unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0;
+
+	seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s\n",
+			"class", "size", "almost_full", "almost_empty",
+			"obj_allocated", "obj_used", "pages_used",
+			"pages_per_zspage");
+
+	for (i = 0; i < zs_size_classes; i++) {
+		class = pool->size_class[i];
+
+		if (class->index != i)
+			continue;
+
+		spin_lock(&class->lock);
+		class_almost_full = zs_stat_get(class, CLASS_ALMOST_FULL);
+		class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY);
+		obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
+		obj_used = zs_stat_get(class, OBJ_USED);
+		spin_unlock(&class->lock);
+
+		objs_per_zspage = get_maxobj_per_zspage(class->size,
+				class->pages_per_zspage);
+		pages_used = obj_allocated / objs_per_zspage *
+				class->pages_per_zspage;
+
+		seq_printf(s, " %5u %5u %11lu %12lu %13lu %10lu %10lu %16d\n",
+			i, class->size, class_almost_full, class_almost_empty,
+			obj_allocated, obj_used, pages_used,
+			class->pages_per_zspage);
+
+		total_class_almost_full += class_almost_full;
+		total_class_almost_empty += class_almost_empty;
+		total_objs += obj_allocated;
+		total_used_objs += obj_used;
+		total_pages += pages_used;
+	}
+
+	seq_puts(s, "\n");
+	seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu\n",
+			"Total", "", total_class_almost_full,
+			total_class_almost_empty, total_objs,
+			total_used_objs, total_pages);
+
+	return 0;
+}
+
+static int zs_stats_size_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, zs_stats_size_show, inode->i_private);
+}
+
+static const struct file_operations zs_stat_size_ops = {
+	.open           = zs_stats_size_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+
+static int zs_pool_stat_create(char *name, struct zs_pool *pool)
+{
+	struct dentry *entry;
+
+	if (!zs_stat_root)
+		return -ENODEV;
+
+	entry = debugfs_create_dir(name, zs_stat_root);
+	if (!entry) {
+		pr_warn("debugfs dir <%s> creation failed\n", name);
+		return -ENOMEM;
+	}
+	pool->stat_dentry = entry;
+
+	entry = debugfs_create_file("classes", S_IFREG | S_IRUGO,
+			pool->stat_dentry, pool, &zs_stat_size_ops);
+	if (!entry) {
+		pr_warn("%s: debugfs file entry <%s> creation failed\n",
+				name, "classes");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void zs_pool_stat_destroy(struct zs_pool *pool)
+{
+	debugfs_remove_recursive(pool->stat_dentry);
+}
+
+#else /* CONFIG_ZSMALLOC_STAT */
+
+static inline void zs_stat_inc(struct size_class *class,
+				enum zs_stat_type type, unsigned long cnt)
+{
+}
+
+static inline void zs_stat_dec(struct size_class *class,
+				enum zs_stat_type type, unsigned long cnt)
+{
+}
+
+static inline unsigned long zs_stat_get(struct size_class *class,
+				enum zs_stat_type type)
+{
+	return 0;
+}
+
+static int __init zs_stat_init(void)
+{
+	return 0;
+}
+
+static void __exit zs_stat_exit(void)
+{
+}
+
+static inline int zs_pool_stat_create(char *name, struct zs_pool *pool)
+{
+	return 0;
+}
+
+static inline void zs_pool_stat_destroy(struct zs_pool *pool)
+{
+}
+
+#endif
+
+
 /*
  * For each size class, zspages are divided into different groups
  * depending on how "full" they are. This was done so that we could
@@ -506,6 +686,8 @@ static void insert_zspage(struct page *page, struct size_class *class,
 		list_add_tail(&page->lru, &(*head)->lru);
 
 	*head = page;
+	zs_stat_inc(class, fullness == ZS_ALMOST_EMPTY ?
+			CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1);
 }
 
 /*
@@ -531,6 +713,8 @@ static void remove_zspage(struct page *page, struct size_class *class,
 					struct page, lru);
 
 	list_del_init(&page->lru);
+	zs_stat_dec(class, fullness == ZS_ALMOST_EMPTY ?
+			CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1);
 }
 
 /*
@@ -1032,11 +1216,6 @@ static void init_zs_size_classes(void)
 	zs_size_classes = nr;
 }
 
-static unsigned int get_maxobj_per_zspage(int size, int pages_per_zspage)
-{
-	return pages_per_zspage * PAGE_SIZE / size;
-}
-
 static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
 {
 	if (prev->pages_per_zspage != pages_per_zspage)
@@ -1056,166 +1235,6 @@ static bool zspage_full(struct page *page)
 	return page->inuse == page->objects;
 }
 
-#ifdef CONFIG_ZSMALLOC_STAT
-
-static inline void zs_stat_inc(struct size_class *class,
-				enum zs_stat_type type, unsigned long cnt)
-{
-	class->stats.objs[type] += cnt;
-}
-
-static inline void zs_stat_dec(struct size_class *class,
-				enum zs_stat_type type, unsigned long cnt)
-{
-	class->stats.objs[type] -= cnt;
-}
-
-static inline unsigned long zs_stat_get(struct size_class *class,
-				enum zs_stat_type type)
-{
-	return class->stats.objs[type];
-}
-
-static int __init zs_stat_init(void)
-{
-	if (!debugfs_initialized())
-		return -ENODEV;
-
-	zs_stat_root = debugfs_create_dir("zsmalloc", NULL);
-	if (!zs_stat_root)
-		return -ENOMEM;
-
-	return 0;
-}
-
-static void __exit zs_stat_exit(void)
-{
-	debugfs_remove_recursive(zs_stat_root);
-}
-
-static int zs_stats_size_show(struct seq_file *s, void *v)
-{
-	int i;
-	struct zs_pool *pool = s->private;
-	struct size_class *class;
-	int objs_per_zspage;
-	unsigned long obj_allocated, obj_used, pages_used;
-	unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0;
-
-	seq_printf(s, " %5s %5s %13s %10s %10s\n", "class", "size",
-				"obj_allocated", "obj_used", "pages_used");
-
-	for (i = 0; i < zs_size_classes; i++) {
-		class = pool->size_class[i];
-
-		if (class->index != i)
-			continue;
-
-		spin_lock(&class->lock);
-		obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
-		obj_used = zs_stat_get(class, OBJ_USED);
-		spin_unlock(&class->lock);
-
-		objs_per_zspage = get_maxobj_per_zspage(class->size,
-				class->pages_per_zspage);
-		pages_used = obj_allocated / objs_per_zspage *
-				class->pages_per_zspage;
-
-		seq_printf(s, " %5u %5u    %10lu %10lu %10lu\n", i,
-			class->size, obj_allocated, obj_used, pages_used);
-
-		total_objs += obj_allocated;
-		total_used_objs += obj_used;
-		total_pages += pages_used;
-	}
-
-	seq_puts(s, "\n");
-	seq_printf(s, " %5s %5s    %10lu %10lu %10lu\n", "Total", "",
-			total_objs, total_used_objs, total_pages);
-
-	return 0;
-}
-
-static int zs_stats_size_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, zs_stats_size_show, inode->i_private);
-}
-
-static const struct file_operations zs_stat_size_ops = {
-	.open           = zs_stats_size_open,
-	.read           = seq_read,
-	.llseek         = seq_lseek,
-	.release        = single_release,
-};
-
-static int zs_pool_stat_create(char *name, struct zs_pool *pool)
-{
-	struct dentry *entry;
-
-	if (!zs_stat_root)
-		return -ENODEV;
-
-	entry = debugfs_create_dir(name, zs_stat_root);
-	if (!entry) {
-		pr_warn("debugfs dir <%s> creation failed\n", name);
-		return -ENOMEM;
-	}
-	pool->stat_dentry = entry;
-
-	entry = debugfs_create_file("obj_in_classes", S_IFREG | S_IRUGO,
-			pool->stat_dentry, pool, &zs_stat_size_ops);
-	if (!entry) {
-		pr_warn("%s: debugfs file entry <%s> creation failed\n",
-				name, "obj_in_classes");
-		return -ENOMEM;
-	}
-
-	return 0;
-}
-
-static void zs_pool_stat_destroy(struct zs_pool *pool)
-{
-	debugfs_remove_recursive(pool->stat_dentry);
-}
-
-#else /* CONFIG_ZSMALLOC_STAT */
-
-static inline void zs_stat_inc(struct size_class *class,
-				enum zs_stat_type type, unsigned long cnt)
-{
-}
-
-static inline void zs_stat_dec(struct size_class *class,
-				enum zs_stat_type type, unsigned long cnt)
-{
-}
-
-static inline unsigned long zs_stat_get(struct size_class *class,
-				enum zs_stat_type type)
-{
-	return 0;
-}
-
-static int __init zs_stat_init(void)
-{
-	return 0;
-}
-
-static void __exit zs_stat_exit(void)
-{
-}
-
-static inline int zs_pool_stat_create(char *name, struct zs_pool *pool)
-{
-	return 0;
-}
-
-static inline void zs_pool_stat_destroy(struct zs_pool *pool)
-{
-}
-
-#endif
-
 unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
 	return atomic_long_read(&pool->pages_allocated);
-- 
1.9.3



* [PATCH v1 10/10] zsmalloc: record handle in page->private for huge object
  2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
                   ` (8 preceding siblings ...)
  2015-01-21  6:14 ` [PATCH v1 09/10] zsmalloc: add fullness into stat Minchan Kim
@ 2015-01-21  6:14 ` Minchan Kim
  9 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-21  6:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Dan Streetman, Seth Jennings,
	Nitin Gupta, Juneho Choi, Gunho Lee, Luigi Semenzato,
	Jerome Marchand, Sergey Senozhatsky, Minchan Kim

We store the handle in the header of each allocated object, which
increases the size of every object by sizeof(unsigned long).

If zram stores 4096 bytes in zsmalloc (ie, bad compression),
zsmalloc needs the 4104B class to store the data together with
the handle.

However, with the 4104B class the size wasted to internal
fragmentation is 8192 - 4104, which is terrible.

So this patch records the handle in page->private for such huge
objects (ie, pages_per_zspage == 1 && maxobj_per_zspage == 1)
instead of in the header of each object, so we can use the
4096B class rather than the 4104B class.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 54 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 48b702e..c9f9230 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -57,6 +57,8 @@
  *
  *	page->private (union with page->first_page): refers to the
  *		component page after the first page
+ *		If the page is first_page for huge object, it stores handle.
+ *		Look at size_class->huge.
  *	page->freelist: points to the first free object in zspage.
  *		Free objects are linked together using in-place
  *		metadata.
@@ -160,7 +162,7 @@
 #define ZS_MIN_ALLOC_SIZE \
 	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
 /* each chunk includes extra space to keep handle */
-#define ZS_MAX_ALLOC_SIZE	(PAGE_SIZE + ZS_HANDLE_SIZE)
+#define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
 
 /*
  * On systems with 4K page size, this gives 255 size classes! There is a
@@ -238,6 +240,8 @@ struct size_class {
 
 	/* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */
 	int pages_per_zspage;
+	/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
+	bool huge;
 
 #ifdef CONFIG_ZSMALLOC_STAT
 	struct zs_size_stat stats;
@@ -299,6 +303,7 @@ struct mapping_area {
 #endif
 	char *vm_addr; /* address of kmap_atomic()'ed pages */
 	enum zs_mapmode vm_mm; /* mapping mode */
+	bool huge;
 };
 
 static int create_handle_cache(struct zs_pool *pool)
@@ -461,7 +466,7 @@ static int get_size_class_index(int size)
 		idx = DIV_ROUND_UP(size - ZS_MIN_ALLOC_SIZE,
 				ZS_SIZE_CLASS_DELTA);
 
-	return idx;
+	return min(zs_size_classes - 1, idx);
 }
 
 #ifdef CONFIG_ZSMALLOC_STAT
@@ -847,9 +852,14 @@ static unsigned long handle_to_obj(unsigned long handle)
 	return *(unsigned long *)handle;
 }
 
-unsigned long obj_to_head(void *obj)
+static unsigned long obj_to_head(struct size_class *class, struct page *page,
+			void *obj)
 {
-	return *(unsigned long *)obj;
+	if (class->huge) {
+		VM_BUG_ON(!is_first_page(page));
+		return *(unsigned long *)page_private(page);
+	} else
+		return *(unsigned long *)obj;
 }
 
 static unsigned long obj_idx_to_offset(struct page *page,
@@ -1126,9 +1136,12 @@ static void __zs_unmap_object(struct mapping_area *area,
 	if (area->vm_mm == ZS_MM_RO)
 		goto out;
 
-	buf = area->vm_buf + ZS_HANDLE_SIZE;
-	size -= ZS_HANDLE_SIZE;
-	off += ZS_HANDLE_SIZE;
+	buf = area->vm_buf;
+	if (!area->huge) {
+		buf = buf + ZS_HANDLE_SIZE;
+		size -= ZS_HANDLE_SIZE;
+		off += ZS_HANDLE_SIZE;
+	}
 
 	sizes[0] = PAGE_SIZE - off;
 	sizes[1] = size - sizes[0];
@@ -1316,7 +1329,10 @@ retry:
 
 	ret = __zs_map_object(area, pages, off, class->size);
 out:
-	return ret + ZS_HANDLE_SIZE;
+	if (!class->huge)
+		ret += ZS_HANDLE_SIZE;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
@@ -1375,8 +1391,12 @@ static unsigned long obj_malloc(struct page *first_page,
 	vaddr = kmap_atomic(m_page);
 	link = (struct link_free *)vaddr + m_offset / sizeof(*link);
 	first_page->freelist = link->next;
-	/* record handle in the header of allocated chunk */
-	link->handle = handle;
+	if (!class->huge)
+		/* record handle in the header of allocated chunk */
+		link->handle = handle;
+	else
+		/* record handle in first_page->private */
+		set_page_private(first_page, handle);
 	kunmap_atomic(vaddr);
 	first_page->inuse++;
 	zs_stat_inc(class, OBJ_USED, 1);
@@ -1400,7 +1420,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	struct size_class *class;
 	struct page *first_page;
 
-	if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE))
+	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
 		return 0;
 
 	handle = alloc_handle(pool);
@@ -1410,6 +1430,11 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 	/* extra space in chunk to keep the handle */
 	size += ZS_HANDLE_SIZE;
 	class = pool->size_class[get_size_class_index(size)];
+	/* In huge class size, we store the handle into first_page->private */
+	if (class->huge) {
+		size -= ZS_HANDLE_SIZE;
+		class = pool->size_class[get_size_class_index(size)];
+	}
 
 	spin_lock(&class->lock);
 	first_page = find_get_zspage(class);
@@ -1465,6 +1490,8 @@ static void obj_free(struct zs_pool *pool, struct size_class *class,
 	/* Insert this object in containing zspage's freelist */
 	link = (struct link_free *)(vaddr + f_offset);
 	link->next = first_page->freelist;
+	if (class->huge)
+		set_page_private(first_page, 0);
 	kunmap_atomic(vaddr);
 	first_page->freelist = (void *)obj;
 	first_page->inuse--;
@@ -1599,7 +1626,7 @@ static unsigned long find_alloced_obj(struct page *page, int index,
 	offset += class->size * index;
 
 	while (offset < PAGE_SIZE) {
-		head = obj_to_head(addr + offset);
+		head = obj_to_head(class, page, addr + offset);
 		if (head & OBJ_ALLOCATED_TAG) {
 			handle = head & ~OBJ_ALLOCATED_TAG;
 			if (!(*(unsigned long *)handle & HANDLE_PIN_TAG))
@@ -1863,6 +1890,9 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags)
 		class->size = size;
 		class->index = i;
 		class->pages_per_zspage = pages_per_zspage;
+		if (pages_per_zspage == 1 &&
+			get_maxobj_per_zspage(size, pages_per_zspage) == 1)
+			class->huge = true;
 		spin_lock_init(&class->lock);
 		pool->size_class[i] = class;
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock
  2015-01-21  6:14 ` [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock Minchan Kim
@ 2015-01-21 14:21   ` Sergey Senozhatsky
  2015-01-23  1:03     ` Minchan Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2015-01-21 14:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Dan Streetman,
	Seth Jennings, Nitin Gupta, Juneho Choi, Gunho Lee,
	Luigi Semenzato, Jerome Marchand, Sergey Senozhatsky

On (01/21/15 15:14), Minchan Kim wrote:
> We don't need to call zram_meta_free under init_lock.
> All we need to do under the lock to prevent the race is set
> zram->meta to NULL (ie, init_done). This patch does that.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zram_drv.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 9250b3f..7e03d86 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -719,6 +719,8 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	}
>  
>  	meta = zram->meta;
> +	zram->meta = NULL;
> +
>  	/* Free all pages that are still in this zram device */
>  	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
>  		unsigned long handle = meta->table[index].handle;
> @@ -731,8 +733,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	zcomp_destroy(zram->comp);
>  	zram->max_comp_streams = 1;
>  
> -	zram_meta_free(zram->meta);
> -	zram->meta = NULL;
>  	/* Reset stats */
>  	memset(&zram->stats, 0, sizeof(zram->stats));
>  
> @@ -741,6 +741,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		set_capacity(zram->disk, 0);
>  
>  	up_write(&zram->init_lock);
> +	zram_meta_free(meta);

Hello,

since we detached ->meta from zram, this one doesn't really need
->init_lock protection:

	/* Free all pages that are still in this zram device */
	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
		unsigned long handle = meta->table[index].handle;
		if (!handle)
			continue;

		zs_free(meta->mem_pool, handle);
	}


	-ss

>  	/*
>  	 * Revalidate disk out of the init_lock to avoid lockdep splat.
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock
  2015-01-21 14:21   ` Sergey Senozhatsky
@ 2015-01-23  1:03     ` Minchan Kim
  2015-01-23  1:15       ` Minchan Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2015-01-23  1:03 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, linux-mm, Dan Streetman,
	Seth Jennings, Nitin Gupta, Juneho Choi, Gunho Lee,
	Luigi Semenzato, Jerome Marchand

Hello,

On Wed, Jan 21, 2015 at 11:21:53PM +0900, Sergey Senozhatsky wrote:
> On (01/21/15 15:14), Minchan Kim wrote:
> > We don't need to call zram_meta_free under init_lock.
> > All we need to do under the lock to prevent the race is set
> > zram->meta to NULL (ie, init_done). This patch does that.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  drivers/block/zram/zram_drv.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index 9250b3f..7e03d86 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -719,6 +719,8 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  	}
> >  
> >  	meta = zram->meta;
> > +	zram->meta = NULL;
> > +
> >  	/* Free all pages that are still in this zram device */
> >  	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
> >  		unsigned long handle = meta->table[index].handle;
> > @@ -731,8 +733,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  	zcomp_destroy(zram->comp);
> >  	zram->max_comp_streams = 1;
> >  
> > -	zram_meta_free(zram->meta);
> > -	zram->meta = NULL;
> >  	/* Reset stats */
> >  	memset(&zram->stats, 0, sizeof(zram->stats));
> >  
> > @@ -741,6 +741,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  		set_capacity(zram->disk, 0);
> >  
> >  	up_write(&zram->init_lock);
> > +	zram_meta_free(meta);
> 
> Hello,
> 
> since we detached ->meta from zram, this one doesn't really need
> ->init_lock protection:
> 
> 	/* Free all pages that are still in this zram device */
> 	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
> 		unsigned long handle = meta->table[index].handle;
> 		if (!handle)
> 			continue;
> 
> 		zs_free(meta->mem_pool, handle);
> 	}
> 
> 
> 	-ss

Good catch.

As well, we could move zcomp_destroy and memset(&zram->stats)
out of the lock, but zram_rw_page, ZRAM_ATTR_RO, disksize_show
and orig_data_size_show have a race bug: they access stats
outside the lock, so they could show a stale value.
Although it's not significant, there is no reason to hesitate to fix it. :)

I will fix it. Thanks!


> 
> >  	/*
> >  	 * Revalidate disk out of the init_lock to avoid lockdep splat.
> > -- 
> > 1.9.3
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock
  2015-01-23  1:03     ` Minchan Kim
@ 2015-01-23  1:15       ` Minchan Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-23  1:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, linux-mm, Dan Streetman,
	Seth Jennings, Nitin Gupta, Juneho Choi, Gunho Lee,
	Luigi Semenzato, Jerome Marchand

On Fri, Jan 23, 2015 at 10:03:36AM +0900, Minchan Kim wrote:
> Hello,
> 
> On Wed, Jan 21, 2015 at 11:21:53PM +0900, Sergey Senozhatsky wrote:
> > On (01/21/15 15:14), Minchan Kim wrote:
> > > We don't need to call zram_meta_free under init_lock.
> > > All we need to do under the lock to prevent the race is set
> > > zram->meta to NULL (ie, init_done). This patch does that.
> > > 
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > ---
> > >  drivers/block/zram/zram_drv.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > > index 9250b3f..7e03d86 100644
> > > --- a/drivers/block/zram/zram_drv.c
> > > +++ b/drivers/block/zram/zram_drv.c
> > > @@ -719,6 +719,8 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> > >  	}
> > >  
> > >  	meta = zram->meta;
> > > +	zram->meta = NULL;
> > > +
> > >  	/* Free all pages that are still in this zram device */
> > >  	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
> > >  		unsigned long handle = meta->table[index].handle;
> > > @@ -731,8 +733,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> > >  	zcomp_destroy(zram->comp);
> > >  	zram->max_comp_streams = 1;
> > >  
> > > -	zram_meta_free(zram->meta);
> > > -	zram->meta = NULL;
> > >  	/* Reset stats */
> > >  	memset(&zram->stats, 0, sizeof(zram->stats));
> > >  
> > > @@ -741,6 +741,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> > >  		set_capacity(zram->disk, 0);
> > >  
> > >  	up_write(&zram->init_lock);
> > > +	zram_meta_free(meta);
> > 
> > Hello,
> > 
> > since we detached ->meta from zram, this one doesn't really need
> > ->init_lock protection:
> > 
> > 	/* Free all pages that are still in this zram device */
> > 	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
> > 		unsigned long handle = meta->table[index].handle;
> > 		if (!handle)
> > 			continue;
> > 
> > 		zs_free(meta->mem_pool, handle);
> > 	}
> > 
> > 
> > 	-ss
> 
> Good catch.
> 
> As well, we could move zcomp_destroy and memset(&zram->stats)
> out of the lock, but zram_rw_page, ZRAM_ATTR_RO, disksize_show
> and orig_data_size_show have a race bug: they access stats
> outside the lock, so they could show a stale value.
> Although it's not significant, there is no reason to hesitate to fix it. :)

Argh, I sent the wrong version.
zram->stats should be protected by the lock, but the other stat show
functions don't hold the lock, so it's racy.

> 
> I will fix it. Thanks!
> 
> 
> > 
> > >  	/*
> > >  	 * Revalidate disk out of the init_lock to avoid lockdep splat.
> > > -- 
> > > 1.9.3
> > > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 02/10] zsmalloc: decouple handle and object
  2015-01-21  6:14 ` [PATCH v1 02/10] zsmalloc: decouple handle and object Minchan Kim
@ 2015-01-26  2:53   ` Ganesh Mahendran
  2015-01-27  3:27     ` Minchan Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Ganesh Mahendran @ 2015-01-26  2:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, Linux-MM, Dan Streetman,
	Seth Jennings, Nitin Gupta, Juneho Choi, Gunho Lee,
	Luigi Semenzato, Jerome Marchand, Sergey Senozhatsky

Hello, Minchan

2015-01-21 14:14 GMT+08:00 Minchan Kim <minchan@kernel.org>:
> Currently, zram's handle encodes the object's location directly,
> which makes it hard to support migration/compaction.
>
> This patch decouples handle and object by adding an indirection layer.
> It allocates the handle dynamically and returns it to the user.
> The handle is an address from a slab allocation, so it is unique,
> and the allocated memory stores the object's position so that
> we can recover the object's position by dereferencing the handle.
>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/zsmalloc.c | 90 ++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 68 insertions(+), 22 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 0dec1fa..9436ee8 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -110,6 +110,8 @@
>  #define ZS_MAX_ZSPAGE_ORDER 2
>  #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
>
> +#define ZS_HANDLE_SIZE (sizeof(unsigned long))
> +
>  /*
>   * Object location (<PFN>, <obj_idx>) is encoded as
>   * as single (unsigned long) handle value.
> @@ -241,6 +243,7 @@ struct zs_pool {
>         char *name;
>
>         struct size_class **size_class;
> +       struct kmem_cache *handle_cachep;
>
>         gfp_t flags;    /* allocation flags used when growing pool */
>         atomic_long_t pages_allocated;
> @@ -269,6 +272,34 @@ struct mapping_area {
>         enum zs_mapmode vm_mm; /* mapping mode */
>  };
>
> +static int create_handle_cache(struct zs_pool *pool)
> +{
> +       pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
> +                                       0, 0, NULL);
> +       return pool->handle_cachep ? 0 : 1;
> +}
> +
> +static void destroy_handle_cache(struct zs_pool *pool)
> +{
> +       kmem_cache_destroy(pool->handle_cachep);
> +}
> +
> +static unsigned long alloc_handle(struct zs_pool *pool)
> +{
> +       return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
> +               pool->flags & ~__GFP_HIGHMEM);
> +}
> +
> +static void free_handle(struct zs_pool *pool, unsigned long handle)
> +{
> +       kmem_cache_free(pool->handle_cachep, (void *)handle);
> +}
> +
> +static void record_obj(unsigned long handle, unsigned long obj)
> +{
> +       *(unsigned long *)handle = obj;
> +}
> +
>  /* zpool driver */
>
>  #ifdef CONFIG_ZPOOL
> @@ -595,13 +626,18 @@ static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
>   * decoded obj_idx back to its original value since it was adjusted in
>   * obj_location_to_handle().
>   */
> -static void obj_handle_to_location(unsigned long handle, struct page **page,
> +static void obj_to_location(unsigned long handle, struct page **page,
>                                 unsigned long *obj_idx)
>  {
>         *page = pfn_to_page(handle >> OBJ_INDEX_BITS);
>         *obj_idx = (handle & OBJ_INDEX_MASK) - 1;
>  }
>
> +static unsigned long handle_to_obj(unsigned long handle)
> +{
> +       return *(unsigned long *)handle;
> +}
> +
>  static unsigned long obj_idx_to_offset(struct page *page,
>                                 unsigned long obj_idx, int class_size)
>  {
> @@ -1153,7 +1189,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
>                         enum zs_mapmode mm)
>  {
>         struct page *page;
> -       unsigned long obj_idx, off;
> +       unsigned long obj, obj_idx, off;
>
>         unsigned int class_idx;
>         enum fullness_group fg;
> @@ -1170,7 +1206,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
>          */
>         BUG_ON(in_interrupt());
>
> -       obj_handle_to_location(handle, &page, &obj_idx);
> +       obj = handle_to_obj(handle);
> +       obj_to_location(obj, &page, &obj_idx);
>         get_zspage_mapping(get_first_page(page), &class_idx, &fg);
>         class = pool->size_class[class_idx];
>         off = obj_idx_to_offset(page, obj_idx, class->size);
> @@ -1195,7 +1232,7 @@ EXPORT_SYMBOL_GPL(zs_map_object);
>  void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>  {
>         struct page *page;
> -       unsigned long obj_idx, off;
> +       unsigned long obj, obj_idx, off;
>
>         unsigned int class_idx;
>         enum fullness_group fg;
> @@ -1204,7 +1241,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>
>         BUG_ON(!handle);
>
> -       obj_handle_to_location(handle, &page, &obj_idx);
> +       obj = handle_to_obj(handle);
> +       obj_to_location(obj, &page, &obj_idx);
>         get_zspage_mapping(get_first_page(page), &class_idx, &fg);
>         class = pool->size_class[class_idx];
>         off = obj_idx_to_offset(page, obj_idx, class->size);
> @@ -1236,7 +1274,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
>   */
>  unsigned long zs_malloc(struct zs_pool *pool, size_t size)
>  {
> -       unsigned long obj;
> +       unsigned long handle, obj;
>         struct link_free *link;
>         struct size_class *class;
>         void *vaddr;
> @@ -1247,6 +1285,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
>         if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
>                 return 0;
>
> +       handle = alloc_handle(pool);
> +       if (!handle)
> +               return 0;
> +
>         class = pool->size_class[get_size_class_index(size)];
>
>         spin_lock(&class->lock);
> @@ -1255,8 +1297,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
>         if (!first_page) {
>                 spin_unlock(&class->lock);
>                 first_page = alloc_zspage(class, pool->flags);
> -               if (unlikely(!first_page))
> +               if (unlikely(!first_page)) {
> +                       free_handle(pool, handle);
>                         return 0;
> +               }
>
>                 set_zspage_mapping(first_page, class->index, ZS_EMPTY);
>                 atomic_long_add(class->pages_per_zspage,
> @@ -1268,7 +1312,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
>         }
>
>         obj = (unsigned long)first_page->freelist;
> -       obj_handle_to_location(obj, &m_page, &m_objidx);
> +       obj_to_location(obj, &m_page, &m_objidx);
>         m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
>
>         vaddr = kmap_atomic(m_page);
> @@ -1281,27 +1325,30 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
>         zs_stat_inc(class, OBJ_USED, 1);
>         /* Now move the zspage to another fullness group, if required */
>         fix_fullness_group(pool, first_page);
> +       record_obj(handle, obj);
>         spin_unlock(&class->lock);
>
> -       return obj;
> +       return handle;
>  }
>  EXPORT_SYMBOL_GPL(zs_malloc);
>
> -void zs_free(struct zs_pool *pool, unsigned long obj)
> +void zs_free(struct zs_pool *pool, unsigned long handle)
>  {
>         struct link_free *link;
>         struct page *first_page, *f_page;
> -       unsigned long f_objidx, f_offset;
> +       unsigned long obj, f_objidx, f_offset;
>         void *vaddr;
>
>         int class_idx;
>         struct size_class *class;
>         enum fullness_group fullness;
>
> -       if (unlikely(!obj))
> +       if (unlikely(!handle))
>                 return;
>
> -       obj_handle_to_location(obj, &f_page, &f_objidx);
> +       obj = handle_to_obj(handle);
> +       free_handle(pool, handle);
> +       obj_to_location(obj, &f_page, &f_objidx);
>         first_page = get_first_page(f_page);
>
>         get_zspage_mapping(first_page, &class_idx, &fullness);
> @@ -1356,18 +1403,16 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags)
>                 return NULL;
>
>         pool->name = kstrdup(name, GFP_KERNEL);
> -       if (!pool->name) {
> -               kfree(pool);
> -               return NULL;
> -       }
> +       if (!pool->name)
> +               goto err;

We cannot goto err here, since zs_destroy_pool() will touch
pool->size_class[x], which has not been allocated yet.

> +
> +       if (create_handle_cache(pool))
> +               goto err;
>
>         pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *),
>                         GFP_KERNEL);
> -       if (!pool->size_class) {
> -               kfree(pool->name);
> -               kfree(pool);
> -               return NULL;
> -       }
> +       if (!pool->size_class)
> +               goto err;
>
>         /*
>          * Iterate reversly, because, size of size_class that we want to use
> @@ -1450,6 +1495,7 @@ void zs_destroy_pool(struct zs_pool *pool)
>                 kfree(class);
>         }
>
> +       destroy_handle_cache(pool);
>         kfree(pool->size_class);
>         kfree(pool->name);
>         kfree(pool);
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 02/10] zsmalloc: decouple handle and object
  2015-01-26  2:53   ` Ganesh Mahendran
@ 2015-01-27  3:27     ` Minchan Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2015-01-27  3:27 UTC (permalink / raw)
  To: Ganesh Mahendran
  Cc: Andrew Morton, linux-kernel, Linux-MM, Dan Streetman,
	Seth Jennings, Nitin Gupta, Juneho Choi, Gunho Lee,
	Luigi Semenzato, Jerome Marchand, Sergey Senozhatsky

Hello Ganesh,

On Mon, Jan 26, 2015 at 10:53:57AM +0800, Ganesh Mahendran wrote:
> Hello, Minchan
> 
> 2015-01-21 14:14 GMT+08:00 Minchan Kim <minchan@kernel.org>:
> > Currently, zram's handle encodes the object's location directly,
> > which makes it hard to support migration/compaction.
> >
> > This patch decouples handle and object by adding an indirection layer.
> > It allocates the handle dynamically and returns it to the user.
> > The handle is an address from a slab allocation, so it is unique,
> > and the allocated memory stores the object's position so that
> > we can recover the object's position by dereferencing the handle.
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  mm/zsmalloc.c | 90 ++++++++++++++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 68 insertions(+), 22 deletions(-)
> >
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index 0dec1fa..9436ee8 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -110,6 +110,8 @@
> >  #define ZS_MAX_ZSPAGE_ORDER 2
> >  #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
> >
> > +#define ZS_HANDLE_SIZE (sizeof(unsigned long))
> > +
> >  /*
> >   * Object location (<PFN>, <obj_idx>) is encoded as
> >   * as single (unsigned long) handle value.
> > @@ -241,6 +243,7 @@ struct zs_pool {
> >         char *name;
> >
> >         struct size_class **size_class;
> > +       struct kmem_cache *handle_cachep;
> >
> >         gfp_t flags;    /* allocation flags used when growing pool */
> >         atomic_long_t pages_allocated;
> > @@ -269,6 +272,34 @@ struct mapping_area {
> >         enum zs_mapmode vm_mm; /* mapping mode */
> >  };
> >
> > +static int create_handle_cache(struct zs_pool *pool)
> > +{
> > +       pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
> > +                                       0, 0, NULL);
> > +       return pool->handle_cachep ? 0 : 1;
> > +}
> > +
> > +static void destroy_handle_cache(struct zs_pool *pool)
> > +{
> > +       kmem_cache_destroy(pool->handle_cachep);
> > +}
> > +
> > +static unsigned long alloc_handle(struct zs_pool *pool)
> > +{
> > +       return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
> > +               pool->flags & ~__GFP_HIGHMEM);
> > +}
> > +
> > +static void free_handle(struct zs_pool *pool, unsigned long handle)
> > +{
> > +       kmem_cache_free(pool->handle_cachep, (void *)handle);
> > +}
> > +
> > +static void record_obj(unsigned long handle, unsigned long obj)
> > +{
> > +       *(unsigned long *)handle = obj;
> > +}
> > +
> >  /* zpool driver */
> >
> >  #ifdef CONFIG_ZPOOL
> > @@ -595,13 +626,18 @@ static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
> >   * decoded obj_idx back to its original value since it was adjusted in
> >   * obj_location_to_handle().
> >   */
> > -static void obj_handle_to_location(unsigned long handle, struct page **page,
> > +static void obj_to_location(unsigned long handle, struct page **page,
> >                                 unsigned long *obj_idx)
> >  {
> >         *page = pfn_to_page(handle >> OBJ_INDEX_BITS);
> >         *obj_idx = (handle & OBJ_INDEX_MASK) - 1;
> >  }
> >
> > +static unsigned long handle_to_obj(unsigned long handle)
> > +{
> > +       return *(unsigned long *)handle;
> > +}
> > +
> >  static unsigned long obj_idx_to_offset(struct page *page,
> >                                 unsigned long obj_idx, int class_size)
> >  {
> > @@ -1153,7 +1189,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> >                         enum zs_mapmode mm)
> >  {
> >         struct page *page;
> > -       unsigned long obj_idx, off;
> > +       unsigned long obj, obj_idx, off;
> >
> >         unsigned int class_idx;
> >         enum fullness_group fg;
> > @@ -1170,7 +1206,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> >          */
> >         BUG_ON(in_interrupt());
> >
> > -       obj_handle_to_location(handle, &page, &obj_idx);
> > +       obj = handle_to_obj(handle);
> > +       obj_to_location(obj, &page, &obj_idx);
> >         get_zspage_mapping(get_first_page(page), &class_idx, &fg);
> >         class = pool->size_class[class_idx];
> >         off = obj_idx_to_offset(page, obj_idx, class->size);
> > @@ -1195,7 +1232,7 @@ EXPORT_SYMBOL_GPL(zs_map_object);
> >  void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
> >  {
> >         struct page *page;
> > -       unsigned long obj_idx, off;
> > +       unsigned long obj, obj_idx, off;
> >
> >         unsigned int class_idx;
> >         enum fullness_group fg;
> > @@ -1204,7 +1241,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
> >
> >         BUG_ON(!handle);
> >
> > -       obj_handle_to_location(handle, &page, &obj_idx);
> > +       obj = handle_to_obj(handle);
> > +       obj_to_location(obj, &page, &obj_idx);
> >         get_zspage_mapping(get_first_page(page), &class_idx, &fg);
> >         class = pool->size_class[class_idx];
> >         off = obj_idx_to_offset(page, obj_idx, class->size);
> > @@ -1236,7 +1274,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
> >   */
> >  unsigned long zs_malloc(struct zs_pool *pool, size_t size)
> >  {
> > -       unsigned long obj;
> > +       unsigned long handle, obj;
> >         struct link_free *link;
> >         struct size_class *class;
> >         void *vaddr;
> > @@ -1247,6 +1285,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
> >         if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
> >                 return 0;
> >
> > +       handle = alloc_handle(pool);
> > +       if (!handle)
> > +               return 0;
> > +
> >         class = pool->size_class[get_size_class_index(size)];
> >
> >         spin_lock(&class->lock);
> > @@ -1255,8 +1297,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
> >         if (!first_page) {
> >                 spin_unlock(&class->lock);
> >                 first_page = alloc_zspage(class, pool->flags);
> > -               if (unlikely(!first_page))
> > +               if (unlikely(!first_page)) {
> > +                       free_handle(pool, handle);
> >                         return 0;
> > +               }
> >
> >                 set_zspage_mapping(first_page, class->index, ZS_EMPTY);
> >                 atomic_long_add(class->pages_per_zspage,
> > @@ -1268,7 +1312,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
> >         }
> >
> >         obj = (unsigned long)first_page->freelist;
> > -       obj_handle_to_location(obj, &m_page, &m_objidx);
> > +       obj_to_location(obj, &m_page, &m_objidx);
> >         m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
> >
> >         vaddr = kmap_atomic(m_page);
> > @@ -1281,27 +1325,30 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
> >         zs_stat_inc(class, OBJ_USED, 1);
> >         /* Now move the zspage to another fullness group, if required */
> >         fix_fullness_group(pool, first_page);
> > +       record_obj(handle, obj);
> >         spin_unlock(&class->lock);
> >
> > -       return obj;
> > +       return handle;
> >  }
> >  EXPORT_SYMBOL_GPL(zs_malloc);
> >
> > -void zs_free(struct zs_pool *pool, unsigned long obj)
> > +void zs_free(struct zs_pool *pool, unsigned long handle)
> >  {
> >         struct link_free *link;
> >         struct page *first_page, *f_page;
> > -       unsigned long f_objidx, f_offset;
> > +       unsigned long obj, f_objidx, f_offset;
> >         void *vaddr;
> >
> >         int class_idx;
> >         struct size_class *class;
> >         enum fullness_group fullness;
> >
> > -       if (unlikely(!obj))
> > +       if (unlikely(!handle))
> >                 return;
> >
> > -       obj_handle_to_location(obj, &f_page, &f_objidx);
> > +       obj = handle_to_obj(handle);
> > +       free_handle(pool, handle);
> > +       obj_to_location(obj, &f_page, &f_objidx);
> >         first_page = get_first_page(f_page);
> >
> >         get_zspage_mapping(first_page, &class_idx, &fullness);
> > @@ -1356,18 +1403,16 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags)
> >                 return NULL;
> >
> >         pool->name = kstrdup(name, GFP_KERNEL);
> > -       if (!pool->name) {
> > -               kfree(pool);
> > -               return NULL;
> > -       }
> > +       if (!pool->name)
> > +               goto err;
> 
> We cannot goto err here, since zs_destroy_pool() will touch
> pool->size_class[x], which has not been allocated yet.

First of all, I'm sorry about that. I forgot you had an interest
in this work. I should have Cc'ed you; I will in the respin.

And thanks for the review.
I will fix it.

Thanks!

> 
> > +
> > +       if (create_handle_cache(pool))
> > +               goto err;
> >
> >         pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *),
> >                         GFP_KERNEL);
> > -       if (!pool->size_class) {
> > -               kfree(pool->name);
> > -               kfree(pool);
> > -               return NULL;
> > -       }
> > +       if (!pool->size_class)
> > +               goto err;
> >
> >         /*
> >          * Iterate reversly, because, size of size_class that we want to use
> > @@ -1450,6 +1495,7 @@ void zs_destroy_pool(struct zs_pool *pool)
> >                 kfree(class);
> >         }
> >
> > +       destroy_handle_cache(pool);
> >         kfree(pool->size_class);
> >         kfree(pool->name);
> >         kfree(pool);
> > --
> > 1.9.3
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-01-27  3:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-21  6:14 [PATCH v1 00/10] zsmalloc compaction support Minchan Kim
2015-01-21  6:14 ` [PATCH v1 01/10] zram: avoid calling of zram_meta_free under init_lock Minchan Kim
2015-01-21 14:21   ` Sergey Senozhatsky
2015-01-23  1:03     ` Minchan Kim
2015-01-23  1:15       ` Minchan Kim
2015-01-21  6:14 ` [PATCH v1 02/10] zsmalloc: decouple handle and object Minchan Kim
2015-01-26  2:53   ` Ganesh Mahendran
2015-01-27  3:27     ` Minchan Kim
2015-01-21  6:14 ` [PATCH v1 03/10] zsmalloc: implement reverse mapping Minchan Kim
2015-01-21  6:14 ` [PATCH v1 04/10] zsmalloc: factor out obj_[malloc|free] Minchan Kim
2015-01-21  6:14 ` [PATCH v1 05/10] zsmalloc: add status bit Minchan Kim
2015-01-21  6:14 ` [PATCH v1 06/10] zsmalloc: support compaction Minchan Kim
2015-01-21  6:14 ` [PATCH v1 07/10] zsmalloc: adjust ZS_ALMOST_FULL Minchan Kim
2015-01-21  6:14 ` [PATCH v1 08/10] zram: support compaction Minchan Kim
2015-01-21  6:14 ` [PATCH v1 09/10] zsmalloc: add fullness into stat Minchan Kim
2015-01-21  6:14 ` [PATCH v1 10/10] zsmalloc: record handle in page->private for huge object Minchan Kim
