LKML Archive on lore.kernel.org
* [PATCH 0/4] fix THP and memcg issues v3
@ 2011-01-18 2:35 KAMEZAWA Hiroyuki
2011-01-18 2:36 ` [PATCH 1/4] memcg: modify accounting function for supporting THP better KAMEZAWA Hiroyuki
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-18 2:35 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, akpm, nishimura, balbir, hannes
I found that patch 2/4 copied PCG_ACCT_LRU at splitting, which will trigger a
VM_BUG_ON(). The fix for it was in patch 3/4... This set is the corrected one.
==
Now, when THP is enabled, memcg's counters go wrong. Moreover, rmdir()
may never finish. I have fixed some races since v1.
This series fixes obvious counter breakage. When you test, enabling
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
is appreciated. The tests should be done as follows:
# mount -t cgroup none /cgroup/memory -omemory
# mkdir /cgroup/memory/A
# mkdir /cgroup/memory/A/B
# run some programs under B.
# echo 0 > /cgroup/memory/A/B/memory.force_empty
and check that B's memory.stat shows RSS/CACHE/LRU are all 0.
Moving tasks while programs are running is another good test.
I know there is another problem when a memory cgroup hits its limit and
reclaim is busy, but I will fix that in another patch.
Thanks,
-Kame
* [PATCH 1/4] memcg: modify accounting function for supporting THP better
2011-01-18 2:35 [PATCH 0/4] fix THP and memcg issues v3 KAMEZAWA Hiroyuki
@ 2011-01-18 2:36 ` KAMEZAWA Hiroyuki
2011-01-18 2:40 ` [PATCH 2/4] memcg: fix USED bit handling at uncharge in THP KAMEZAWA Hiroyuki
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-18 2:36 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir, hannes
mem_cgroup_charge_statistics() was designed for charging a single page,
but now we have transparent hugepage. To fix problems (in the following
patches), the function must be changed to take the number of pages as an
argument.
The new function takes the following arguments:
- the type of page, rather than 'pc'
- the number of pages to be accounted (signed; negative for uncharge)
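For illustration, the call sites change roughly as follows (a sketch based on
the hunks below; the multi-page form is only used once later patches in this
series pass a real page count):

    /* before: one page per call, direction as a bool, page type read from pc */
    mem_cgroup_charge_statistics(mem, pc, true);   /* charge one page */
    mem_cgroup_charge_statistics(mem, pc, false);  /* uncharge one page */

    /* after: the caller passes the page type and a signed page count */
    mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), 1);   /* charge */
    mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -1);  /* uncharge */
    mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), nr_pages); /* THP, patch 2/4 */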
Changelog:
- use usual variable names for args.
- removed unnecessary variable.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
Index: mmotm-0107/mm/memcontrol.c
===================================================================
--- mmotm-0107.orig/mm/memcontrol.c
+++ mmotm-0107/mm/memcontrol.c
@@ -600,23 +600,22 @@ static void mem_cgroup_swap_statistics(s
}
static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
- struct page_cgroup *pc,
- bool charge)
+ bool file, int nr_pages)
{
- int val = (charge) ? 1 : -1;
-
preempt_disable();
- if (PageCgroupCache(pc))
- __this_cpu_add(mem->stat->count[MEM_CGROUP_STAT_CACHE], val);
+ if (file)
+ __this_cpu_add(mem->stat->count[MEM_CGROUP_STAT_CACHE], nr_pages);
else
- __this_cpu_add(mem->stat->count[MEM_CGROUP_STAT_RSS], val);
+ __this_cpu_add(mem->stat->count[MEM_CGROUP_STAT_RSS], nr_pages);
- if (charge)
+ /* pagein of a big page is an event. So, ignore page size */
+ if (nr_pages > 0)
__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGIN_COUNT]);
else
__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_PGPGOUT_COUNT]);
- __this_cpu_inc(mem->stat->count[MEM_CGROUP_EVENTS]);
+
+ __this_cpu_add(mem->stat->count[MEM_CGROUP_EVENTS], nr_pages);
preempt_enable();
}
@@ -2115,7 +2114,7 @@ static void ____mem_cgroup_commit_charge
break;
}
- mem_cgroup_charge_statistics(mem, pc, true);
+ mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), 1);
}
static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
@@ -2186,14 +2185,14 @@ static void __mem_cgroup_move_account(st
__this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
preempt_enable();
}
- mem_cgroup_charge_statistics(from, pc, false);
+ mem_cgroup_charge_statistics(from, PageCgroupCache(pc), -1);
if (uncharge)
/* This is not "cancel", but cancel_charge does all we need. */
mem_cgroup_cancel_charge(from, PAGE_SIZE);
/* caller should have done css_get */
pc->mem_cgroup = to;
- mem_cgroup_charge_statistics(to, pc, true);
+ mem_cgroup_charge_statistics(to, PageCgroupCache(pc), 1);
/*
* We charges against "to" which may not have any tasks. Then, "to"
* can be under rmdir(). But in current implementation, caller of
@@ -2597,7 +2596,7 @@ __mem_cgroup_uncharge_common(struct page
}
for (i = 0; i < count; i++)
- mem_cgroup_charge_statistics(mem, pc + i, false);
+ mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -1);
ClearPageCgroupUsed(pc);
/*
* [PATCH 2/4] memcg: fix USED bit handling at uncharge in THP
2011-01-18 2:35 [PATCH 0/4] fix THP and memcg issues v3 KAMEZAWA Hiroyuki
2011-01-18 2:36 ` [PATCH 1/4] memcg: modify accounting function for supporting THP better KAMEZAWA Hiroyuki
@ 2011-01-18 2:40 ` KAMEZAWA Hiroyuki
2011-01-19 12:10 ` Johannes Weiner
2011-01-18 2:42 ` [PATCH 3/4] memcg: fix LRU accounting with THP KAMEZAWA Hiroyuki
2011-01-18 2:43 ` [PATCH 4/4] memcg: fix rmdir, force_empty " KAMEZAWA Hiroyuki
3 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-18 2:40 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir, hannes
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Now, under THP:
at charge:
- the PageCgroupUsed bit is set on every page_cgroup of a hugepage,
  i.e. on all 512 pages.
at uncharge:
- the PageCgroupUsed bit is unset only on the head page.
So, some pages will remain with the "Used" bit set.
This patch fixes this so that the Used bit is set only on the head page;
Used bits for tail pages will be set at splitting, if necessary.
This patch adds the following lock order:
compound_lock() -> page_cgroup_move_lock().
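Sketched, the nesting looks like this (illustrative only; both helpers appear
in the diffs of this series, and each saves its own irq flags):

    unsigned long cflags, mflags;

    cflags = compound_lock_irqsave(page);    /* outer: pins the compound page */
    move_lock_page_cgroup(pc, &mflags);      /* inner: serializes page_cgroup updates */
    /* ... update page_cgroup state ... */
    move_unlock_page_cgroup(pc, &mflags);
    compound_unlock_irqrestore(page, cflags);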
Changelog: v2->v3
- added "don't copy PCG_ACCT_LRU"
Changelog:
- fixed some styles.
- removed BUG_ON()
- fixed mem_cgroup_update_page_stat() vs. splitting races.
To avoid races, PCG_MOVE_LOCK is used.
- copy all required flags at splitting. (ACCT_LRU etc..should be copied)
- add a dummy function for compile.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/memcontrol.h | 8 +++
mm/huge_memory.c | 2
mm/memcontrol.c | 91 +++++++++++++++++++++++++--------------------
3 files changed, 61 insertions(+), 40 deletions(-)
Index: mmotm-0107/mm/memcontrol.c
===================================================================
--- mmotm-0107.orig/mm/memcontrol.c
+++ mmotm-0107/mm/memcontrol.c
@@ -1614,7 +1614,7 @@ void mem_cgroup_update_page_stat(struct
if (unlikely(!mem || !PageCgroupUsed(pc)))
goto out;
/* pc->mem_cgroup is unstable ? */
- if (unlikely(mem_cgroup_stealed(mem))) {
+ if (unlikely(mem_cgroup_stealed(mem)) || PageTransHuge(page)) {
/* take a lock against to access pc->mem_cgroup */
move_lock_page_cgroup(pc, &flags);
need_unlock = true;
@@ -2083,14 +2083,27 @@ struct mem_cgroup *try_get_mem_cgroup_fr
return mem;
}
-/*
- * commit a charge got by __mem_cgroup_try_charge() and makes page_cgroup to be
- * USED state. If already USED, uncharge and return.
- */
-static void ____mem_cgroup_commit_charge(struct mem_cgroup *mem,
- struct page_cgroup *pc,
- enum charge_type ctype)
+static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
+ struct page_cgroup *pc,
+ enum charge_type ctype,
+ int page_size)
{
+ int nr_pages = page_size >> PAGE_SHIFT;
+
+ /* try_charge() can return NULL to *memcg, taking care of it. */
+ if (!mem)
+ return;
+
+ lock_page_cgroup(pc);
+ if (unlikely(PageCgroupUsed(pc))) {
+ unlock_page_cgroup(pc);
+ mem_cgroup_cancel_charge(mem, page_size);
+ return;
+ }
+ /*
+ * we don't need page_cgroup_lock about tail pages, because they are not
+ * accessed by any other context at this point.
+ */
pc->mem_cgroup = mem;
/*
* We access a page_cgroup asynchronously without lock_page_cgroup().
@@ -2114,35 +2127,7 @@ static void ____mem_cgroup_commit_charge
break;
}
- mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), 1);
-}
-
-static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
- struct page_cgroup *pc,
- enum charge_type ctype,
- int page_size)
-{
- int i;
- int count = page_size >> PAGE_SHIFT;
-
- /* try_charge() can return NULL to *memcg, taking care of it. */
- if (!mem)
- return;
-
- lock_page_cgroup(pc);
- if (unlikely(PageCgroupUsed(pc))) {
- unlock_page_cgroup(pc);
- mem_cgroup_cancel_charge(mem, page_size);
- return;
- }
-
- /*
- * we don't need page_cgroup_lock about tail pages, becase they are not
- * accessed by any other context at this point.
- */
- for (i = 0; i < count; i++)
- ____mem_cgroup_commit_charge(mem, pc + i, ctype);
-
+ mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), nr_pages);
unlock_page_cgroup(pc);
/*
* "charge_statistics" updated event counter. Then, check it.
@@ -2152,6 +2137,34 @@ static void __mem_cgroup_commit_charge(s
memcg_check_events(mem, pc->page);
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+
+#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MOVE_LOCK) |\
+ (1 << PCG_ACCT_LRU) | (1 << PCG_MIGRATION))
+/*
+ * Because tail pages are not marked as "used", set it. We're under
+ * zone->lru_lock, 'splitting on pmd' and compund_lock.
+ */
+void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail)
+{
+ struct page_cgroup *head_pc = lookup_page_cgroup(head);
+ struct page_cgroup *tail_pc = lookup_page_cgroup(tail);
+ unsigned long flags;
+
+ /*
+ * We have no races witch charge/uncharge but will have races with
+ * page state accounting.
+ */
+ move_lock_page_cgroup(head_pc, &flags);
+
+ tail_pc->mem_cgroup = head_pc->mem_cgroup;
+ smp_wmb(); /* see __commit_charge() */
+ /* we don't need to copy all flags...*/
+ tail_pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
+ move_unlock_page_cgroup(head_pc, &flags);
+}
+#endif
+
/**
* __mem_cgroup_move_account - move account of the page
* @pc: page_cgroup of the page.
@@ -2545,7 +2558,6 @@ direct_uncharge:
static struct mem_cgroup *
__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
{
- int i;
int count;
struct page_cgroup *pc;
struct mem_cgroup *mem = NULL;
@@ -2595,8 +2607,7 @@ __mem_cgroup_uncharge_common(struct page
break;
}
- for (i = 0; i < count; i++)
- mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -1);
+ mem_cgroup_charge_statistics(mem, PageCgroupCache(pc), -count);
ClearPageCgroupUsed(pc);
/*
Index: mmotm-0107/include/linux/memcontrol.h
===================================================================
--- mmotm-0107.orig/include/linux/memcontrol.h
+++ mmotm-0107/include/linux/memcontrol.h
@@ -146,6 +146,10 @@ unsigned long mem_cgroup_soft_limit_recl
gfp_t gfp_mask);
u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail);
+#endif
+
#else /* CONFIG_CGROUP_MEM_RES_CTLR */
struct mem_cgroup;
@@ -335,6 +339,10 @@ u64 mem_cgroup_get_limit(struct mem_cgro
return 0;
}
+static inline void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail)
+{
+}
+
#endif /* CONFIG_CGROUP_MEM_CONT */
#endif /* _LINUX_MEMCONTROL_H */
Index: mmotm-0107/mm/huge_memory.c
===================================================================
--- mmotm-0107.orig/mm/huge_memory.c
+++ mmotm-0107/mm/huge_memory.c
@@ -1203,6 +1203,8 @@ static void __split_huge_page_refcount(s
BUG_ON(!PageDirty(page_tail));
BUG_ON(!PageSwapBacked(page_tail));
+ mem_cgroup_split_huge_fixup(page, page_tail);
+
lru_add_page_tail(zone, page, page_tail);
}
* [PATCH 3/4] memcg: fix LRU accounting with THP
2011-01-18 2:35 [PATCH 0/4] fix THP and memcg issues v3 KAMEZAWA Hiroyuki
2011-01-18 2:36 ` [PATCH 1/4] memcg: modify accounting function for supporting THP better KAMEZAWA Hiroyuki
2011-01-18 2:40 ` [PATCH 2/4] memcg: fix USED bit handling at uncharge in THP KAMEZAWA Hiroyuki
@ 2011-01-18 2:42 ` KAMEZAWA Hiroyuki
2011-01-18 2:43 ` [PATCH 4/4] memcg: fix rmdir, force_empty " KAMEZAWA Hiroyuki
3 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-18 2:42 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir, hannes
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
The memory cgroup's LRU stat should take page size into account because
Transparent Hugepage inserts hugepages into the LRU.
If this value is wrong, memory reclaim will not work well.
Note: only the head page of a THP hugepage is linked into the LRU.
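Sketched, the rule the hunks below apply is the following
(1 << compound_order(page) is 512 for a 2MB hugepage on x86-64, and 1 for a
normal page):

    /* adjust the per-zone LRU stat by the number of base pages */
    MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);  /* add to LRU */
    MEM_CGROUP_ZSTAT(mz, lru) -= 1 << compound_order(page);  /* remove from LRU */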
Changelog: v2->v3
- removed PCG_ACCT_LRU fix.
Changelog:
- fixed to use compound_order() directly.
- added a counter fix for splitting.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
Index: mmotm-0107/mm/memcontrol.c
===================================================================
--- mmotm-0107.orig/mm/memcontrol.c
+++ mmotm-0107/mm/memcontrol.c
@@ -814,7 +814,8 @@ void mem_cgroup_del_lru_list(struct page
* removed from global LRU.
*/
mz = page_cgroup_zoneinfo(pc);
- MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+ /* huge page split is done under lru_lock. so, we have no races. */
+ MEM_CGROUP_ZSTAT(mz, lru) -= 1 << compound_order(page);
if (mem_cgroup_is_root(pc->mem_cgroup))
return;
VM_BUG_ON(list_empty(&pc->lru));
@@ -865,7 +866,8 @@ void mem_cgroup_add_lru_list(struct page
return;
mz = page_cgroup_zoneinfo(pc);
- MEM_CGROUP_ZSTAT(mz, lru) += 1;
+ /* huge page split is done under lru_lock. so, we have no races. */
+ MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
SetPageCgroupAcctLRU(pc);
if (mem_cgroup_is_root(pc->mem_cgroup))
return;
@@ -2152,14 +2154,26 @@ void mem_cgroup_split_huge_fixup(struct
unsigned long flags;
/*
- * We have no races witch charge/uncharge but will have races with
+ * We have no races with charge/uncharge but will have races with
* page state accounting.
*/
move_lock_page_cgroup(head_pc, &flags);
tail_pc->mem_cgroup = head_pc->mem_cgroup;
smp_wmb(); /* see __commit_charge() */
- /* we don't need to copy all flags...*/
+ if (PageCgroupAcctLRU(head_pc)) {
+ enum lru_list lru;
+ struct mem_cgroup_per_zone *mz;
+
+ /*
+ * LRU flags cannot be copied because we need to add tail
+ * page to LRU by generic call and our hook will be called.
+ * We hold lru_lock, then, reduce counter directly.
+ */
+ lru = page_lru(head);
+ mz = page_cgroup_zoneinfo(head_pc);
+ MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+ }
tail_pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
move_unlock_page_cgroup(head_pc, &flags);
}
* [PATCH 4/4] memcg: fix rmdir, force_empty with THP
2011-01-18 2:35 [PATCH 0/4] fix THP and memcg issues v3 KAMEZAWA Hiroyuki
` (2 preceding siblings ...)
2011-01-18 2:42 ` [PATCH 3/4] memcg: fix LRU accounting with THP KAMEZAWA Hiroyuki
@ 2011-01-18 2:43 ` KAMEZAWA Hiroyuki
2011-01-20 13:41 ` Johannes Weiner
3 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-18 2:43 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir, hannes
Now, when THP is enabled, memcg's rmdir() is broken
because move_account() for THP pages is not supported.
This will cause an accounting leak or an -EBUSY failure at rmdir().
This patch fixes the issue by supporting move_account() for THP pages.
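Condensed, the mem_cgroup_move_parent() flow after this patch is as follows
(a sketch assembled from the diff below, with error paths trimmed):

    /* the page has already been isolated from the LRU here */
    charge = PAGE_SIZE << compound_order(page);
    ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge);
    if (charge > PAGE_SIZE)          /* THP: take compound_lock against splitting */
        flags = compound_lock_irqsave(page);
    ret = mem_cgroup_move_account(pc, child, parent, true, charge);
    if (ret)
        mem_cgroup_cancel_charge(parent, charge);
    if (charge > PAGE_SIZE)
        compound_unlock_irqrestore(page, flags);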
Changelog:
- style fix.
- add compound_lock for avoiding races.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
Index: mmotm-0107/mm/memcontrol.c
===================================================================
--- mmotm-0107.orig/mm/memcontrol.c
+++ mmotm-0107/mm/memcontrol.c
@@ -2197,8 +2197,11 @@ void mem_cgroup_split_huge_fixup(struct
*/
static void __mem_cgroup_move_account(struct page_cgroup *pc,
- struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
+ struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge,
+ int charge_size)
{
+ int nr_pages = charge_size >> PAGE_SHIFT;
+
VM_BUG_ON(from == to);
VM_BUG_ON(PageLRU(pc->page));
VM_BUG_ON(!page_is_cgroup_locked(pc));
@@ -2212,14 +2215,14 @@ static void __mem_cgroup_move_account(st
__this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
preempt_enable();
}
- mem_cgroup_charge_statistics(from, PageCgroupCache(pc), -1);
+ mem_cgroup_charge_statistics(from, PageCgroupCache(pc), -nr_pages);
if (uncharge)
/* This is not "cancel", but cancel_charge does all we need. */
- mem_cgroup_cancel_charge(from, PAGE_SIZE);
+ mem_cgroup_cancel_charge(from, charge_size);
/* caller should have done css_get */
pc->mem_cgroup = to;
- mem_cgroup_charge_statistics(to, PageCgroupCache(pc), 1);
+ mem_cgroup_charge_statistics(to, PageCgroupCache(pc), nr_pages);
/*
* We charges against "to" which may not have any tasks. Then, "to"
* can be under rmdir(). But in current implementation, caller of
@@ -2234,15 +2237,19 @@ static void __mem_cgroup_move_account(st
* __mem_cgroup_move_account()
*/
static int mem_cgroup_move_account(struct page_cgroup *pc,
- struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
+ struct mem_cgroup *from, struct mem_cgroup *to,
+ bool uncharge, int charge_size)
{
int ret = -EINVAL;
unsigned long flags;
+ if ((charge_size > PAGE_SIZE) && !PageTransHuge(pc->page))
+ return -EBUSY;
+
lock_page_cgroup(pc);
if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
move_lock_page_cgroup(pc, &flags);
- __mem_cgroup_move_account(pc, from, to, uncharge);
+ __mem_cgroup_move_account(pc, from, to, uncharge, charge_size);
move_unlock_page_cgroup(pc, &flags);
ret = 0;
}
@@ -2267,6 +2274,8 @@ static int mem_cgroup_move_parent(struct
struct cgroup *cg = child->css.cgroup;
struct cgroup *pcg = cg->parent;
struct mem_cgroup *parent;
+ int charge = PAGE_SIZE;
+ unsigned long flags;
int ret;
/* Is ROOT ? */
@@ -2278,17 +2287,23 @@ static int mem_cgroup_move_parent(struct
goto out;
if (isolate_lru_page(page))
goto put;
+ /* The page is isolated from LRU and we have no race with splitting */
+ charge = PAGE_SIZE << compound_order(page);
parent = mem_cgroup_from_cont(pcg);
- ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false,
- PAGE_SIZE);
+ ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge);
if (ret || !parent)
goto put_back;
- ret = mem_cgroup_move_account(pc, child, parent, true);
+ if (charge > PAGE_SIZE)
+ flags = compound_lock_irqsave(page);
+
+ ret = mem_cgroup_move_account(pc, child, parent, true, charge);
if (ret)
- mem_cgroup_cancel_charge(parent, PAGE_SIZE);
+ mem_cgroup_cancel_charge(parent, charge);
put_back:
+ if (charge > PAGE_SIZE)
+ compound_unlock_irqrestore(page, flags);
putback_lru_page(page);
put:
put_page(page);
@@ -4868,7 +4883,7 @@ retry:
goto put;
pc = lookup_page_cgroup(page);
if (!mem_cgroup_move_account(pc,
- mc.from, mc.to, false)) {
+ mc.from, mc.to, false, PAGE_SIZE)) {
mc.precharge--;
/* we uncharge from mc.from later. */
mc.moved_charge++;
* Re: [PATCH 2/4] memcg: fix USED bit handling at uncharge in THP
2011-01-18 2:40 ` [PATCH 2/4] memcg: fix USED bit handling at uncharge in THP KAMEZAWA Hiroyuki
@ 2011-01-19 12:10 ` Johannes Weiner
2011-01-20 0:55 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2011-01-19 12:10 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir
Hello KAMEZAWA-san,
On Tue, Jan 18, 2011 at 11:40:49AM +0900, KAMEZAWA Hiroyuki wrote:
> +void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail)
> +{
> + struct page_cgroup *head_pc = lookup_page_cgroup(head);
> + struct page_cgroup *tail_pc = lookup_page_cgroup(tail);
> + unsigned long flags;
> +
> + /*
> + * We have no races witch charge/uncharge but will have races with
> + * page state accounting.
> + */
> + move_lock_page_cgroup(head_pc, &flags);
> +
> + tail_pc->mem_cgroup = head_pc->mem_cgroup;
> + smp_wmb(); /* see __commit_charge() */
I thought the barriers were needed because charging does not hold the
lru lock. But here we do, and all the 'lockless' read-sides do so as
well. Am I missing something or can this barrier be removed?
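The pattern in question, sketched (my reading of the write side above, paired
with the assumed lockless read side as in __commit_charge()):

    /* writer: publish mem_cgroup before the flags that mark it valid */
    tail_pc->mem_cgroup = head_pc->mem_cgroup;
    smp_wmb();
    tail_pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;

    /* lockless reader: check the flag, then read the pointer */
    if (!PageCgroupUsed(pc))
        return;
    smp_rmb();   /* pairs with the smp_wmb() above */
    mem = pc->mem_cgroup;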
Hannes
* Re: [PATCH 2/4] memcg: fix USED bit handling at uncharge in THP
2011-01-19 12:10 ` Johannes Weiner
@ 2011-01-20 0:55 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-20 0:55 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir
On Wed, 19 Jan 2011 13:10:43 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:
> Hello KAMEZAWA-san,
>
> On Tue, Jan 18, 2011 at 11:40:49AM +0900, KAMEZAWA Hiroyuki wrote:
> > +void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail)
> > +{
> > + struct page_cgroup *head_pc = lookup_page_cgroup(head);
> > + struct page_cgroup *tail_pc = lookup_page_cgroup(tail);
> > + unsigned long flags;
> > +
> > + /*
> > + * We have no races witch charge/uncharge but will have races with
> > + * page state accounting.
> > + */
> > + move_lock_page_cgroup(head_pc, &flags);
> > +
> > + tail_pc->mem_cgroup = head_pc->mem_cgroup;
> > + smp_wmb(); /* see __commit_charge() */
>
> I thought the barriers were needed because charging does not hold the
> lru lock. But here we do, and all the 'lockless' read-sides do so as
> well. Am I missing something or can this barrier be removed?
>
Hmm. I think this can be removed, but only because there is currently a single
lockless reader of pc->mem_cgroup. I think it's OK to remove it, but that
should be done in an independent patch with enough clarification in the
changelog. IOW, I'll do it later in another patch.
Thanks,
-Kame
* Re: [PATCH 4/4] memcg: fix rmdir, force_empty with THP
2011-01-18 2:43 ` [PATCH 4/4] memcg: fix rmdir, force_empty " KAMEZAWA Hiroyuki
@ 2011-01-20 13:41 ` Johannes Weiner
2011-01-20 23:39 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2011-01-20 13:41 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir
On Tue, Jan 18, 2011 at 11:43:48AM +0900, KAMEZAWA Hiroyuki wrote:
>
> Now, when THP is enabled, memcg's rmdir() is broken
> because move_account() for THP pages is not supported.
>
> This will cause an accounting leak or an -EBUSY failure at rmdir().
> This patch fixes the issue by supporting move_account() for THP pages.
>
> Changelog:
> - style fix.
> - add compound_lock for avoiding races.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 37 ++++++++++++++++++++++++++-----------
> 1 file changed, 26 insertions(+), 11 deletions(-)
>
> Index: mmotm-0107/mm/memcontrol.c
> ===================================================================
> --- mmotm-0107.orig/mm/memcontrol.c
> +++ mmotm-0107/mm/memcontrol.c
> @@ -2267,6 +2274,8 @@ static int mem_cgroup_move_parent(struct
> struct cgroup *cg = child->css.cgroup;
> struct cgroup *pcg = cg->parent;
> struct mem_cgroup *parent;
> + int charge = PAGE_SIZE;
No need to initialize, you assign it unconditionally below.
It's also a bit unfortunate that the parameter/variable with this
meaning appears under a whole bunch of different names. page_size,
charge_size, and now charge. Could you stick with page_size?
> @@ -2278,17 +2287,23 @@ static int mem_cgroup_move_parent(struct
> goto out;
> if (isolate_lru_page(page))
> goto put;
> + /* The page is isolated from LRU and we have no race with splitting */
> + charge = PAGE_SIZE << compound_order(page);
Why is LRU isolation preventing the splitting?
I think we need the compound lock to get a stable read, like
compound_trans_order() does.
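Roughly (a sketch of that approach; compound_trans_order() takes the compound
lock around the order read, so the result cannot change under us):

    /* stable read of the compound order under compound_lock (sketch) */
    flags = compound_lock_irqsave(page);
    order = compound_order(page);
    compound_unlock_irqrestore(page, flags);
    charge = PAGE_SIZE << order;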
Hannes
> parent = mem_cgroup_from_cont(pcg);
> - ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false,
> - PAGE_SIZE);
> + ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge);
> if (ret || !parent)
> goto put_back;
>
> - ret = mem_cgroup_move_account(pc, child, parent, true);
> + if (charge > PAGE_SIZE)
> + flags = compound_lock_irqsave(page);
> +
> + ret = mem_cgroup_move_account(pc, child, parent, true, charge);
> if (ret)
> - mem_cgroup_cancel_charge(parent, PAGE_SIZE);
> + mem_cgroup_cancel_charge(parent, charge);
> put_back:
> + if (charge > PAGE_SIZE)
> + compound_unlock_irqrestore(page, flags);
> putback_lru_page(page);
> put:
> put_page(page);
* Re: [PATCH 4/4] memcg: fix rmdir, force_empty with THP
2011-01-20 13:41 ` Johannes Weiner
@ 2011-01-20 23:39 ` KAMEZAWA Hiroyuki
2011-01-20 23:41 ` KAMEZAWA Hiroyuki
2011-01-21 0:21 ` nishimura
0 siblings, 2 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-20 23:39 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel, akpm, nishimura, balbir
On Thu, 20 Jan 2011 14:41:08 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Jan 18, 2011 at 11:43:48AM +0900, KAMEZAWA Hiroyuki wrote:
> >
> > Now, when THP is enabled, memcg's rmdir() is broken
> > because move_account() for THP pages is not supported.
> >
> > This will cause an accounting leak or an -EBUSY failure at rmdir().
> > This patch fixes the issue by supporting move_account() for THP pages.
> >
> > Changelog:
> > - style fix.
> > - add compound_lock for avoiding races.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 37 ++++++++++++++++++++++++++-----------
> > 1 file changed, 26 insertions(+), 11 deletions(-)
> >
> > Index: mmotm-0107/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-0107.orig/mm/memcontrol.c
> > +++ mmotm-0107/mm/memcontrol.c
>
> > @@ -2267,6 +2274,8 @@ static int mem_cgroup_move_parent(struct
> > struct cgroup *cg = child->css.cgroup;
> > struct cgroup *pcg = cg->parent;
> > struct mem_cgroup *parent;
> > + int charge = PAGE_SIZE;
>
> No need to initialize, you assign it unconditionally below.
>
> It's also a bit unfortunate that the parameter/variable with this
> meaning appears under a whole bunch of different names. page_size,
> charge_size, and now charge. Could you stick with page_size?
>
charge_size != page_size.
Clean it up as you like, later; I'll Ack.
> > @@ -2278,17 +2287,23 @@ static int mem_cgroup_move_parent(struct
> > goto out;
> > if (isolate_lru_page(page))
> > goto put;
> > + /* The page is isolated from LRU and we have no race with splitting */
> > + charge = PAGE_SIZE << compound_order(page);
>
> Why is LRU isolation preventing the splitting?
>
That comment is my mistake; it was left over from an older patch.
I use compound_lock now. I'll post a cleanup.
> I think we need the compound lock to get a stable read, like
> compound_trans_order() does.
>
Thanks,
-Kame
> Hannes
>
> > parent = mem_cgroup_from_cont(pcg);
> > - ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false,
> > - PAGE_SIZE);
> > + ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge);
> > if (ret || !parent)
> > goto put_back;
> >
> > - ret = mem_cgroup_move_account(pc, child, parent, true);
> > + if (charge > PAGE_SIZE)
> > + flags = compound_lock_irqsave(page);
> > +
> > + ret = mem_cgroup_move_account(pc, child, parent, true, charge);
> > if (ret)
> > - mem_cgroup_cancel_charge(parent, PAGE_SIZE);
> > + mem_cgroup_cancel_charge(parent, charge);
> > put_back:
> > + if (charge > PAGE_SIZE)
> > + compound_unlock_irqrestore(page, flags);
> > putback_lru_page(page);
> > put:
> > put_page(page);
* Re: [PATCH 4/4] memcg: fix rmdir, force_empty with THP
2011-01-20 23:39 ` KAMEZAWA Hiroyuki
@ 2011-01-20 23:41 ` KAMEZAWA Hiroyuki
2011-01-21 0:21 ` nishimura
1 sibling, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-20 23:41 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Johannes Weiner, linux-mm, linux-kernel, akpm, nishimura, balbir
On Fri, 21 Jan 2011 08:39:30 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 20 Jan 2011 14:41:08 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > On Tue, Jan 18, 2011 at 11:43:48AM +0900, KAMEZAWA Hiroyuki wrote:
> > >
> > > Now, when THP is enabled, memcg's rmdir() is broken
> > > because move_account() for THP pages is not supported.
> > >
> > > This will cause an accounting leak or an -EBUSY failure at rmdir().
> > > This patch fixes the issue by supporting move_account() for THP pages.
> > >
> > > Changelog:
> > > - style fix.
> > > - add compound_lock for avoiding races.
> > >
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > ---
> > > mm/memcontrol.c | 37 ++++++++++++++++++++++++++-----------
> > > 1 file changed, 26 insertions(+), 11 deletions(-)
> > >
> > > Index: mmotm-0107/mm/memcontrol.c
> > > ===================================================================
> > > --- mmotm-0107.orig/mm/memcontrol.c
> > > +++ mmotm-0107/mm/memcontrol.c
> >
> > > @@ -2267,6 +2274,8 @@ static int mem_cgroup_move_parent(struct
> > > struct cgroup *cg = child->css.cgroup;
> > > struct cgroup *pcg = cg->parent;
> > > struct mem_cgroup *parent;
> > > + int charge = PAGE_SIZE;
> >
> > No need to initialize, you assign it unconditionally below.
> >
> > It's also a bit unfortunate that the parameter/variable with this
> > meaning appears under a whole bunch of different names. page_size,
> > charge_size, and now charge. Could you stick with page_size?
> >
>
> charge_size != page_size.
>
> Clean up as you like, later. I'll Ack.
>
> > > @@ -2278,17 +2287,23 @@ static int mem_cgroup_move_parent(struct
> > > goto out;
> > > if (isolate_lru_page(page))
> > > goto put;
> > > + /* The page is isolated from LRU and we have no race with splitting */
> > > + charge = PAGE_SIZE << compound_order(page);
> >
> > Why is LRU isolation preventing the splitting?
> >
>
> That's my mistake of comment, which was in the older patch.
> I use compound_lock now. I'll post clean up.
>
It seems the patches have been sent to Linus's tree.
I'll post some cleanup patches for memcontrol.c.
I hope that will not break other people's work.
Thanks,
-kame
* Re: [PATCH 4/4] memcg: fix rmdir, force_empty with THP
2011-01-20 23:39 ` KAMEZAWA Hiroyuki
2011-01-20 23:41 ` KAMEZAWA Hiroyuki
@ 2011-01-21 0:21 ` nishimura
1 sibling, 0 replies; 11+ messages in thread
From: nishimura @ 2011-01-21 0:21 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki, Johannes Weiner
Cc: linux-mm, linux-kernel, akpm, nishimura, balbir
>> > @@ -2278,17 +2287,23 @@ static int mem_cgroup_move_parent(struct
>> > goto out;
>> > if (isolate_lru_page(page))
>> > goto put;
>> > + /* The page is isolated from LRU and we have no race with splitting */
>> > + charge = PAGE_SIZE << compound_order(page);
>>
>> Why is LRU isolation preventing the splitting?
>>
Oops! It seems that this comment made me confuse 'split' and 'collapse'.
Yes, it's 'collapse', not 'split', that is prevented by isolation.
> I use compound_lock now. I'll post clean up.
>
I'll wait for your patch.
Thanks,
Daisuke Nishimura.