LKML Archive on lore.kernel.org
* (no subject)

From: Johannes Weiner @ 2011-01-31 14:03 UTC
To: akpm
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Hi,

these three patches are small fixes for urgent issues with memory cgroups and transparent huge pages. They are inspired by 4/7 of KAMEZAWA-san's recent series 'memcg : more fixes and clean up for 2.6.28-rc' to make memory cgroups work with THP.

2/3 fixes a bug that is first uncovered by 1/3. Minchan suggested this order for review purposes, but they should probably be applied the other way round (2, then 1) for better bisectability.

If everybody agrees, I would prefer these to go into -mmotm now, with further cleanups and optimizations based on them.

	Hannes

* [patch 1/3] memcg: prevent endless loop when charging huge pages

From: Johannes Weiner @ 2011-01-31 14:03 UTC
To: akpm
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

The charging code can encounter a charge size that is bigger than a
regular page in two situations: one is a batched charge to fill the
per-cpu stocks, the other is a huge page charge.

This code is distributed over two functions, however, and only the
outer one is aware of huge pages.  In case the charging fails, the
inner function will tell the outer function to retry if the charge
size is bigger than regular pages--assuming batched charging is the
only case.  And the outer function will retry forever charging a huge
page.

This patch makes sure the inner function can distinguish between batch
charging and a single huge page charge.  It will only signal another
attempt if batch charging failed, and go into regular reclaim when it
is called on behalf of a huge page.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d572102..73ea323 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1837,8 +1837,15 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 		flags |= MEM_CGROUP_RECLAIM_NOSWAP;
 	} else
 		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
-
-	if (csize > PAGE_SIZE) /* change csize and retry */
+	/*
+	 * csize can be either a huge page (HPAGE_SIZE), a batch of
+	 * regular pages (CHARGE_SIZE), or a single regular page
+	 * (PAGE_SIZE).
+	 *
+	 * Never reclaim on behalf of optional batching, retry with a
+	 * single page instead.
+	 */
+	if (csize == CHARGE_SIZE)
 		return CHARGE_RETRY;
 
 	if (!(gfp_mask & __GFP_WAIT))
-- 
1.7.3.5

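The failure mode is easy to reproduce in a standalone model. Here is a minimal sketch of the old retry condition and the outer loop (illustrative constants and a stubbed-out charge path, not the actual memcontrol.c code); with the fix, the inner function signals a cheap retry only for csize == CHARGE_SIZE, so a huge page charge proceeds to reclaim and can fail properly:

#include <stdio.h>

#define PAGE_SIZE   4096UL
#define CHARGE_SIZE (32 * PAGE_SIZE)    /* per-cpu stock batch, illustrative */
#define HPAGE_SIZE  (512 * PAGE_SIZE)   /* 2M huge page on x86 */

enum { CHARGE_OK, CHARGE_RETRY, CHARGE_NOMEM };

/* inner function, old condition: any csize > PAGE_SIZE means "batch" */
static int do_charge_old(unsigned long csize)
{
    if (csize > PAGE_SIZE)
        return CHARGE_RETRY;    /* also hit for HPAGE_SIZE! */
    return CHARGE_NOMEM;        /* would reclaim here, may still fail */
}

/* outer loop as it behaved before the fix */
int main(void)
{
    unsigned long page_size = HPAGE_SIZE;
    unsigned long csize = CHARGE_SIZE > page_size ? CHARGE_SIZE : page_size;
    int spins = 0;

    while (do_charge_old(csize) == CHARGE_RETRY) {
        csize = page_size;      /* "drop batching" still leaves HPAGE_SIZE */
        if (++spins > 5) {      /* in the kernel this loops forever */
            printf("endless loop after %d retries\n", spins);
            break;
        }
    }
    return 0;
}
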
* Re: [patch 1/3] memcg: prevent endless loop when charging huge pages

From: Minchan Kim @ 2011-01-31 22:27 UTC
To: Johannes Weiner
Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> The charging code can encounter a charge size that is bigger than a
> regular page in two situations: one is a batched charge to fill the
> per-cpu stocks, the other is a huge page charge.
>
> This code is distributed over two functions, however, and only the
> outer one is aware of huge pages.  In case the charging fails, the
> inner function will tell the outer function to retry if the charge
> size is bigger than regular pages--assuming batched charging is the
> only case.  And the outer function will retry forever charging a huge
> page.
>
> This patch makes sure the inner function can distinguish between batch
> charging and a single huge page charge.  It will only signal another
> attempt if batch charging failed, and go into regular reclaim when it
> is called on behalf of a huge page.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

* Re: [patch 1/3] memcg: prevent endless loop when charging huge pages

From: KAMEZAWA Hiroyuki @ 2011-01-31 23:48 UTC
To: Johannes Weiner
Cc: akpm, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 15:03:53 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> The charging code can encounter a charge size that is bigger than a
> regular page in two situations: one is a batched charge to fill the
> per-cpu stocks, the other is a huge page charge.
>
> This code is distributed over two functions, however, and only the
> outer one is aware of huge pages.  In case the charging fails, the
> inner function will tell the outer function to retry if the charge
> size is bigger than regular pages--assuming batched charging is the
> only case.  And the outer function will retry forever charging a huge
> page.
>
> This patch makes sure the inner function can distinguish between batch
> charging and a single huge page charge.  It will only signal another
> attempt if batch charging failed, and go into regular reclaim when it
> is called on behalf of a huge page.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Thank you very much.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

> ---
>  mm/memcontrol.c |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d572102..73ea323 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1837,8 +1837,15 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
>  		flags |= MEM_CGROUP_RECLAIM_NOSWAP;
>  	} else
>  		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
> -
> -	if (csize > PAGE_SIZE) /* change csize and retry */
> +	/*
> +	 * csize can be either a huge page (HPAGE_SIZE), a batch of
> +	 * regular pages (CHARGE_SIZE), or a single regular page
> +	 * (PAGE_SIZE).
> +	 *
> +	 * Never reclaim on behalf of optional batching, retry with a
> +	 * single page instead.
> +	 */
> +	if (csize == CHARGE_SIZE)
>  		return CHARGE_RETRY;
>
>  	if (!(gfp_mask & __GFP_WAIT))
> --
> 1.7.3.5

* [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group

From: Johannes Weiner @ 2011-01-31 14:03 UTC
To: akpm
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

If reclaim after a failed charging was unsuccessful, the limits are
checked again, just in case they settled by means of other tasks.

This is all fine as long as every charge is of size PAGE_SIZE, because
in that case, being below the limit means having at least PAGE_SIZE
bytes available.

But with transparent huge pages, we may end up in an endless loop
where charging and reclaim fail, but we keep going because the limits
are not yet exceeded, although not allowing for a huge page.

Fix this up by explicitly checking for enough room, not just whether
we are within limits.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/res_counter.h |   12 ++++++++++++
 mm/memcontrol.c             |   27 ++++++++++++++++++++-------
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index fcb9884..5cfd78a 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -182,6 +182,18 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
 	return ret;
 }
 
+static inline bool res_counter_check_margin(struct res_counter *cnt,
+					    unsigned long bytes)
+{
+	bool ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	ret = cnt->limit - cnt->usage >= bytes;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
 static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
 {
 	bool ret;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 73ea323..c28072f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
 	return false;
 }
 
+static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
+{
+	if (!res_counter_check_margin(&mem->res, bytes))
+		return false;
+	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
+		return false;
+	return true;
+}
+
 static unsigned int get_swappiness(struct mem_cgroup *memcg)
 {
 	struct cgroup *cgrp = memcg->css.cgroup;
@@ -1852,15 +1861,19 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 		return CHARGE_WOULDBLOCK;
 
 	ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
-					      gfp_mask, flags);
+					      gfp_mask, flags);
+	if (mem_cgroup_check_margin(mem_over_limit, csize))
+		return CHARGE_RETRY;
 	/*
-	 * try_to_free_mem_cgroup_pages() might not give us a full
-	 * picture of reclaim. Some pages are reclaimed and might be
-	 * moved to swap cache or just unmapped from the cgroup.
-	 * Check the limit again to see if the reclaim reduced the
-	 * current usage of the cgroup before giving up
+	 * Even though the limit is exceeded at this point, reclaim
+	 * may have been able to free some pages.  Retry the charge
+	 * before killing the task.
+	 *
+	 * Only for regular pages, though: huge pages are rather
+	 * unlikely to succeed so close to the limit, and we fall back
+	 * to regular pages anyway in case of failure.
 	 */
-	if (ret || mem_cgroup_check_under_limit(mem_over_limit))
+	if (csize == PAGE_SIZE && ret)
 		return CHARGE_RETRY;
 
 	/*
-- 
1.7.3.5

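The distinction the patch draws between "under the limit" and "enough margin" is easiest to see with concrete numbers. A standalone sketch (illustrative values, locking simplified away; not the kernel functions themselves):

#include <stdbool.h>
#include <stdio.h>

struct counter {
    unsigned long long usage;
    unsigned long long limit;
};

/* old check: true as long as usage is below the limit at all */
static bool check_under_limit(struct counter *cnt)
{
    return cnt->usage < cnt->limit;
}

/* new check: true only if a charge of @bytes would still fit */
static bool check_margin(struct counter *cnt, unsigned long long bytes)
{
    return cnt->limit - cnt->usage >= bytes;
}

int main(void)
{
    /* 1MB of room left, but a 2MB huge page is requested */
    struct counter cnt = { .usage = 127 << 20, .limit = 128 << 20 };

    printf("under limit: %d\n", check_under_limit(&cnt));     /* 1: keep looping */
    printf("2MB margin:  %d\n", check_margin(&cnt, 2 << 20)); /* 0: give up */
    return 0;
}
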
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group

From: Andrew Morton @ 2011-01-31 22:41 UTC
To: Johannes Weiner
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 15:03:54 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> +static inline bool res_counter_check_margin(struct res_counter *cnt,
> +					    unsigned long bytes)
> +{
> +	bool ret;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	ret = cnt->limit - cnt->usage >= bytes;
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return ret;
> +}
> +
>  static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
>  {
>  	bool ret;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 73ea323..c28072f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
>  	return false;
>  }
>
> +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
> +{
> +	if (!res_counter_check_margin(&mem->res, bytes))
> +		return false;
> +	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
> +		return false;
> +	return true;
> +}

argh.

If you ever have a function with the string "check" in its name, it's a
good sign that you did something wrong.

Check what?  Against what?  Returning what?

mem_cgroup_check_under_limit() isn't toooo bad - the name tells you
what's being checked and tells you what to expect the return value to
mean.

But "res_counter_check_margin" and "mem_cgroup_check_margin" are just
awful.  Something like

	bool res_counter_may_charge(counter, bytes)

would be much clearer.

If we really want to stick with the "check" names (perhaps as an ironic
reference to res_counter's past mistakes) then please at least document
the sorry things?

* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group

From: KAMEZAWA Hiroyuki @ 2011-01-31 23:50 UTC
To: Andrew Morton
Cc: Johannes Weiner, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 14:41:31 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 31 Jan 2011 15:03:54 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > +static inline bool res_counter_check_margin(struct res_counter *cnt,
> > +					    unsigned long bytes)
> > +{
> > +	bool ret;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cnt->lock, flags);
> > +	ret = cnt->limit - cnt->usage >= bytes;
> > +	spin_unlock_irqrestore(&cnt->lock, flags);
> > +	return ret;
> > +}
> > +
> >  static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
> >  {
> >  	bool ret;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 73ea323..c28072f 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
> >  	return false;
> >  }
> >
> > +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
> > +{
> > +	if (!res_counter_check_margin(&mem->res, bytes))
> > +		return false;
> > +	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
> > +		return false;
> > +	return true;
> > +}
>
> argh.
>
> If you ever have a function with the string "check" in its name, it's a
> good sign that you did something wrong.
>
> Check what?  Against what?  Returning what?
>
> mem_cgroup_check_under_limit() isn't toooo bad - the name tells you
> what's being checked and tells you what to expect the return value to
> mean.
>
> But "res_counter_check_margin" and "mem_cgroup_check_margin" are just
> awful.  Something like
>
> 	bool res_counter_may_charge(counter, bytes)
>
> would be much clearer.
>
> If we really want to stick with the "check" names (perhaps as an ironic
> reference to res_counter's past mistakes) then please at least document
> the sorry things?

Ah, I ack the concept of the patch.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Johannes, could you change the name? I'm sorry.

* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group

From: Johannes Weiner @ 2011-02-01 0:04 UTC
To: Andrew Morton
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 02:41:31PM -0800, Andrew Morton wrote:
> On Mon, 31 Jan 2011 15:03:54 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
> >  	return false;
> >  }
> >
> > +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
> > +{
> > +	if (!res_counter_check_margin(&mem->res, bytes))
> > +		return false;
> > +	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
> > +		return false;
> > +	return true;
> > +}
>
> argh.
>
> If you ever have a function with the string "check" in its name, it's a
> good sign that you did something wrong.
>
> Check what?  Against what?  Returning what?
>
> mem_cgroup_check_under_limit() isn't toooo bad - the name tells you
> what's being checked and tells you what to expect the return value to
> mean.
>
> But "res_counter_check_margin" and "mem_cgroup_check_margin" are just
> awful.  Something like
>
> 	bool res_counter_may_charge(counter, bytes)
>
> would be much clearer.

That makes sense for the hard limit.  But the oh-so-generic resource
counters also have a soft limit, and you don't ask for that when you
want to charge.  Right now, I do not feel creative enough to come up
with a symmetric-sounding counterpart.

> If we really want to stick with the "check" names (perhaps as an ironic
> reference to res_counter's past mistakes) then please at least document
> the sorry things?

I cowardly went with this option and have a patch below to fold into
this fix.

Maybe it would be better to use res_counter_margin(cnt) >= wanted
throughout the code.  Or still better, make memcg work on pages and
res_counters on unsigned longs so the locking is no longer needed,
together with an API for most obvious maths.  I will work something
out and submit it separately.

---
Subject: [patch fixup] res_counter: document res_counter_check_margin()

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 5cfd78a..a5930cb 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
 	return ret;
 }
 
+/**
+ * res_counter_check_margin - check if the counter allows charging
+ * @cnt: the resource counter to check
+ * @bytes: the number of bytes to check the remaining space against
+ *
+ * Returns a boolean value on whether the counter can be charged
+ * @bytes or whether this would exceed the limit.
+ */
 static inline bool res_counter_check_margin(struct res_counter *cnt,
 					    unsigned long bytes)
 {

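The locking remark above is about the counter fields being 64-bit values: reading limit and usage without cnt->lock can tear on 32-bit hosts. If the counters were native words counted in pages, plain word-sized loads would be atomic and the margin read could drop the lock. A sketch of that idea (the struct and function here are assumptions drawn from the remark, not code from the thread; the volatile casts stand in for what the kernel would use):

#include <stdio.h>

struct page_counter {
    unsigned long usage;    /* pages, written with word-sized stores */
    unsigned long limit;    /* pages */
};

static unsigned long page_counter_margin(const struct page_counter *cnt)
{
    unsigned long usage = *(volatile const unsigned long *)&cnt->usage;
    unsigned long limit = *(volatile const unsigned long *)&cnt->limit;

    /* clamp: a racing charge may push usage past limit */
    return usage < limit ? limit - usage : 0;
}

int main(void)
{
    struct page_counter c = { .usage = 1000, .limit = 1024 };

    printf("margin: %lu pages\n", page_counter_margin(&c));
    return 0;
}
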
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group

From: Andrew Morton @ 2011-02-01 0:24 UTC
To: Johannes Weiner
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Tue, 1 Feb 2011 01:04:55 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Maybe it would be better to use res_counter_margin(cnt) >= wanted
> throughout the code.

yup.

> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
>  	return ret;
>  }
>
> +/**
> + * res_counter_check_margin - check if the counter allows charging
> + * @cnt: the resource counter to check
> + * @bytes: the number of bytes to check the remaining space against
> + *
> + * Returns a boolean value on whether the counter can be charged
> + * @bytes or whether this would exceed the limit.
> + */
>  static inline bool res_counter_check_margin(struct res_counter *cnt,
>  					    unsigned long bytes)
>  {

mem_cgroup_check_margin() needs some lipstick too.

* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group

From: Johannes Weiner @ 2011-02-01 0:34 UTC
To: Andrew Morton
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 04:24:48PM -0800, Andrew Morton wrote:
> On Tue, 1 Feb 2011 01:04:55 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > @@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> >  	return ret;
> >  }
> >
> > +/**
> > + * res_counter_check_margin - check if the counter allows charging
> > + * @cnt: the resource counter to check
> > + * @bytes: the number of bytes to check the remaining space against
> > + *
> > + * Returns a boolean value on whether the counter can be charged
> > + * @bytes or whether this would exceed the limit.
> > + */
> >  static inline bool res_counter_check_margin(struct res_counter *cnt,
> >  					    unsigned long bytes)
> >  {
>
> mem_cgroup_check_margin() needs some lipstick too.

*oink*

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9e5de7c..6c07554 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1111,6 +1111,14 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
 	return false;
 }
 
+/**
+ * mem_cgroup_check_margin - check if the memory cgroup allows charging
+ * @mem: memory cgroup to check
+ * @bytes: the number of bytes the caller intends to charge
+ *
+ * Returns a boolean value on whether @mem can be charged @bytes or
+ * whether this would exceed the limit.
+ */
 static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
 {
 	if (!res_counter_check_margin(&mem->res, bytes))

* [patch 0/2] memcg: clean up limit checking

From: Johannes Weiner @ 2011-02-03 12:53 UTC
To: Andrew Morton
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 04:24:48PM -0800, Andrew Morton wrote:
> On Tue, 1 Feb 2011 01:04:55 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > Maybe it would be better to use res_counter_margin(cnt) >= wanted
> > throughout the code.
>
> yup.

Okay, I cleaned it all up a bit for .39.

While doing so, I also found that we are reclaiming one page too much
when pushing back on behalf of soft limits.

So 1/2 fixes the soft limit reclaim off-by-one-page, and 2/2 reduces
all the limit checks to two functions.

* [patch 1/2] memcg: soft limit reclaim should end at limit not below

From: Johannes Weiner @ 2011-02-03 12:54 UTC
To: Andrew Morton
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Soft limit reclaim continues until the usage is below the current soft
limit, but the documented semantics are actually that soft limit
reclaim will push usage back until the soft limits are met again.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/res_counter.h |    4 ++--
 mm/memcontrol.c             |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index a5930cb..bf1f01b 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -139,7 +139,7 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
 
 static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
 {
-	if (cnt->usage < cnt->soft_limit)
+	if (cnt->usage <= cnt->soft_limit)
 		return true;
 
 	return false;
@@ -202,7 +202,7 @@ static inline bool res_counter_check_margin(struct res_counter *cnt,
 	return ret;
 }
 
-static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
+static inline bool res_counter_check_within_soft_limit(struct res_counter *cnt)
 {
 	bool ret;
 	unsigned long flags;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 350bfd2..23b14188 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1451,7 +1451,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 			return ret;
 		total += ret;
 		if (check_soft) {
-			if (res_counter_check_under_soft_limit(&root_mem->res))
+			if (res_counter_check_within_soft_limit(&root_mem->res))
 				return total;
 		} else if (mem_cgroup_check_under_limit(root_mem))
 			return 1 + total;
-- 
1.7.4

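The off-by-one is subtle: with usage < soft_limit as the stop test, reclaim only stops once usage has dropped strictly below the soft limit, one page further than the documented "push back to the soft limit". A standalone model of the two stop conditions (illustrative page counts, one reclaimed page per iteration):

#include <stdio.h>

static unsigned long reclaim_until(unsigned long usage, unsigned long soft,
                                   int inclusive)
{
    /* keep reclaiming while usage is still over the stop point */
    while (inclusive ? usage > soft : usage >= soft)
        usage--;
    return usage;
}

int main(void)
{
    /* usage at 105 pages, soft limit at 100 pages */
    printf("old stop: %lu\n", reclaim_until(105, 100, 0)); /* 99: one page too far */
    printf("new stop: %lu\n", reclaim_until(105, 100, 1)); /* 100: at the limit */
    return 0;
}
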
* Re: [patch 1/2] memcg: soft limit reclaim should end at limit not below

From: KAMEZAWA Hiroyuki @ 2011-02-03 23:41 UTC
To: Johannes Weiner
Cc: Andrew Morton, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Thu, 3 Feb 2011 13:54:54 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Soft limit reclaim continues until the usage is below the current soft
> limit, but the documented semantics are actually that soft limit
> reclaim will push usage back until the soft limits are met again.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

* Re: [patch 1/2] memcg: soft limit reclaim should end at limit not below

From: Balbir Singh @ 2011-02-04 4:10 UTC
To: Johannes Weiner
Cc: Andrew Morton, kamezawa.hiroyu, nishimura, minchan.kim, linux-mm, linux-kernel

On Thu, Feb 3, 2011 at 6:24 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Soft limit reclaim continues until the usage is below the current soft
> limit, but the documented semantics are actually that soft limit
> reclaim will push usage back until the soft limits are met again.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

* [patch 2/2] memcg: simplify the way memory limits are checked

From: Johannes Weiner @ 2011-02-03 12:56 UTC
To: Andrew Morton
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Since transparent huge pages, checking whether memory cgroups are
below their limits is no longer enough, but the actual amount of
chargeable space is important.

To not have more than one limit-checking interface, replace
memory_cgroup_check_under_limit() and memory_cgroup_check_margin()
with a single memory_cgroup_margin() that returns the chargeable space
and leaves the comparison to the callsite.

Soft limits are now checked the other way round, by using the already
existing function that returns the amount by which soft limits are
exceeded: res_counter_soft_limit_excess().

Also remove all the corresponding functions on the res_counter side
that are now no longer used.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/res_counter.h |   72 ++++++++----------------------------------
 mm/memcontrol.c             |   49 ++++++++++------------------
 2 files changed, 31 insertions(+), 90 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index bf1f01b..c9d625c 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -129,20 +129,22 @@ int __must_check res_counter_charge(struct res_counter *counter,
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
 
-static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
-{
-	if (cnt->usage < cnt->limit)
-		return true;
-
-	return false;
-}
-
-static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
+/**
+ * res_counter_margin - calculate chargeable space of a counter
+ * @cnt: the counter
+ *
+ * Returns the difference between the hard limit and the current usage
+ * of resource counter @cnt.
+ */
+static inline unsigned long long res_counter_margin(struct res_counter *cnt)
 {
-	if (cnt->usage <= cnt->soft_limit)
-		return true;
+	unsigned long long margin;
+	unsigned long flags;
 
-	return false;
+	spin_lock_irqsave(&cnt->lock, flags);
+	margin = cnt->limit - cnt->usage;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return margin;
 }
 
 /**
@@ -167,52 +169,6 @@ res_counter_soft_limit_excess(struct res_counter *cnt)
 	return excess;
 }
 
-/*
- * Helper function to detect if the cgroup is within it's limit or
- * not. It's currently called from cgroup_rss_prepare()
- */
-static inline bool res_counter_check_under_limit(struct res_counter *cnt)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = res_counter_limit_check_locked(cnt);
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
-/**
- * res_counter_check_margin - check if the counter allows charging
- * @cnt: the resource counter to check
- * @bytes: the number of bytes to check the remaining space against
- *
- * Returns a boolean value on whether the counter can be charged
- * @bytes or whether this would exceed the limit.
- */
-static inline bool res_counter_check_margin(struct res_counter *cnt,
-					    unsigned long bytes)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = cnt->limit - cnt->usage >= bytes;
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
-static inline bool res_counter_check_within_soft_limit(struct res_counter *cnt)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = res_counter_soft_limit_check_locked(cnt);
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
 static inline void res_counter_reset_max(struct res_counter *cnt)
 {
 	unsigned long flags;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 23b14188..e1ab9c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -504,11 +504,6 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *mem)
 	}
 }
 
-static inline unsigned long mem_cgroup_get_excess(struct mem_cgroup *mem)
-{
-	return res_counter_soft_limit_excess(&mem->res) >> PAGE_SHIFT;
-}
-
 static struct mem_cgroup_per_zone *
 __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
 {
@@ -1101,33 +1096,21 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 #define mem_cgroup_from_res_counter(counter, member)	\
 	container_of(counter, struct mem_cgroup, member)
 
-static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
-{
-	if (do_swap_account) {
-		if (res_counter_check_under_limit(&mem->res) &&
-		    res_counter_check_under_limit(&mem->memsw))
-			return true;
-	} else
-		if (res_counter_check_under_limit(&mem->res))
-			return true;
-	return false;
-}
-
 /**
- * mem_cgroup_check_margin - check if the memory cgroup allows charging
- * @mem: memory cgroup to check
- * @bytes: the number of bytes the caller intends to charge
+ * mem_cgroup_margin - calculate chargeable space of a memory cgroup
+ * @mem: the memory cgroup
  *
- * Returns a boolean value on whether @mem can be charged @bytes or
- * whether this would exceed the limit.
+ * Returns the maximum amount of memory @mem can be charged with, in
+ * bytes.
  */
-static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
+static unsigned long long mem_cgroup_margin(struct mem_cgroup *mem)
 {
-	if (!res_counter_check_margin(&mem->res, bytes))
-		return false;
-	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
-		return false;
-	return true;
+	unsigned long long margin;
+
+	margin = res_counter_margin(&mem->res);
+	if (do_swap_account)
+		margin = min(margin, res_counter_margin(&mem->memsw));
+	return margin;
 }
 
 static unsigned int get_swappiness(struct mem_cgroup *memcg)
@@ -1394,7 +1377,9 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
 	bool shrink = reclaim_options & MEM_CGROUP_RECLAIM_SHRINK;
 	bool check_soft = reclaim_options & MEM_CGROUP_RECLAIM_SOFT;
-	unsigned long excess = mem_cgroup_get_excess(root_mem);
+	unsigned long excess;
+
+	excess = res_counter_soft_limit_excess(&root_mem->res) >> PAGE_SHIFT;
 
 	/* If memsw_is_minimum==1, swap-out is of-no-use. */
 	if (root_mem->memsw_is_minimum)
@@ -1451,9 +1436,9 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 			return ret;
 		total += ret;
 		if (check_soft) {
-			if (res_counter_check_within_soft_limit(&root_mem->res))
+			if (!res_counter_soft_limit_excess(&root_mem->res))
 				return total;
-		} else if (mem_cgroup_check_under_limit(root_mem))
+		} else if (mem_cgroup_margin(root_mem))
 			return 1 + total;
 	}
 	return total;
@@ -1872,7 +1857,7 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 
 	ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
 					      gfp_mask, flags);
-	if (mem_cgroup_check_margin(mem_over_limit, csize))
+	if (mem_cgroup_margin(mem_over_limit) >= csize)
 		return CHARGE_RETRY;
 	/*
 	 * Even though the limit is exceeded at this point, reclaim
-- 
1.7.4

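A standalone model of the new mem_cgroup_margin() semantics with do_swap_account enabled, where the memsw counter caps the chargeable space (illustrative numbers, locking omitted; not the kernel code itself):

#include <stdio.h>

static unsigned long long margin(unsigned long long usage,
                                 unsigned long long limit)
{
    return limit - usage;
}

static unsigned long long min_ull(unsigned long long a, unsigned long long b)
{
    return a < b ? a : b;
}

int main(void)
{
    /* res has 4MB of room, memsw only 1MB: memsw caps the margin */
    unsigned long long res_margin   = margin(124 << 20, 128 << 20);
    unsigned long long memsw_margin = margin(255 << 20, 256 << 20);
    unsigned long long m = min_ull(res_margin, memsw_margin);

    printf("chargeable: %llu bytes\n", m);               /* 1MB */
    printf("2MB huge page fits: %d\n", m >= (2 << 20));  /* 0 */
    return 0;
}
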
* Re: [patch 2/2] memcg: simplify the way memory limits are checked

From: KAMEZAWA Hiroyuki @ 2011-02-03 23:44 UTC
To: Johannes Weiner
Cc: Andrew Morton, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Thu, 3 Feb 2011 13:56:11 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Since transparent huge pages, checking whether memory cgroups are
> below their limits is no longer enough, but the actual amount of
> chargeable space is important.
>
> To not have more than one limit-checking interface, replace
> memory_cgroup_check_under_limit() and memory_cgroup_check_margin()
> with a single memory_cgroup_margin() that returns the chargeable space
> and leaves the comparison to the callsite.
>
> Soft limits are now checked the other way round, by using the already
> existing function that returns the amount by which soft limits are
> exceeded: res_counter_soft_limit_excess().
>
> Also remove all the corresponding functions on the res_counter side
> that are now no longer used.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

* Re: [patch 2/2] memcg: simplify the way memory limits are checked

From: Balbir Singh @ 2011-02-04 4:12 UTC
To: Johannes Weiner
Cc: Andrew Morton, kamezawa.hiroyu, nishimura, minchan.kim, linux-mm, linux-kernel

On Thu, Feb 3, 2011 at 6:26 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Since transparent huge pages, checking whether memory cgroups are
> below their limits is no longer enough, but the actual amount of
> chargeable space is important.
>
> To not have more than one limit-checking interface, replace
> memory_cgroup_check_under_limit() and memory_cgroup_check_margin()
> with a single memory_cgroup_margin() that returns the chargeable space
> and leaves the comparison to the callsite.
>
> Soft limits are now checked the other way round, by using the already
> existing function that returns the amount by which soft limits are
> exceeded: res_counter_soft_limit_excess().
>
> Also remove all the corresponding functions on the res_counter side
> that are now no longer used.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group

From: Minchan Kim @ 2011-01-31 22:42 UTC
To: Johannes Weiner
Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> If reclaim after a failed charging was unsuccessful, the limits are
> checked again, just in case they settled by means of other tasks.
>
> This is all fine as long as every charge is of size PAGE_SIZE, because
> in that case, being below the limit means having at least PAGE_SIZE
> bytes available.
>
> But with transparent huge pages, we may end up in an endless loop
> where charging and reclaim fail, but we keep going because the limits
> are not yet exceeded, although not allowing for a huge page.
>
> Fix this up by explicitly checking for enough room, not just whether
> we are within limits.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

* [patch 3/3] memcg: never OOM when charging huge pages

From: Johannes Weiner @ 2011-01-31 14:03 UTC
To: akpm
Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Huge page coverage should obviously have less priority than the
continued execution of a process.

Never kill a process when charging it a huge page fails.  Instead,
give up after the first failed reclaim attempt and fall back to
regular pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c28072f..9e5de7c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2343,13 +2343,19 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
 				gfp_t gfp_mask, enum charge_type ctype)
 {
 	struct mem_cgroup *mem = NULL;
+	int page_size = PAGE_SIZE;
 	struct page_cgroup *pc;
+	bool oom = true;
 	int ret;
-	int page_size = PAGE_SIZE;
 
 	if (PageTransHuge(page)) {
 		page_size <<= compound_order(page);
 		VM_BUG_ON(!PageTransHuge(page));
+		/*
+		 * Never OOM-kill a process for a huge page.  The
+		 * fault handler will fall back to regular pages.
+		 */
+		oom = false;
 	}
 
 	pc = lookup_page_cgroup(page);
@@ -2358,7 +2364,7 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
 		return 0;
 	prefetchw(pc);
 
-	ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, true, page_size);
+	ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, oom, page_size);
 	if (ret || !mem)
 		return ret;
 
-- 
1.7.3.5

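A standalone model of the fallback this patch relies on: the huge page charge runs with oom == false, and only the regular-page retry may invoke the OOM killer. All names and sizes below are made up for illustration; the real fallback lives in the THP fault path, not here:

#include <stdbool.h>
#include <stdio.h>

static bool charge(unsigned long pages, bool oom)
{
    bool fits = pages <= 1;     /* pretend only 1 page of room is left */

    if (!fits && oom)
        printf("would invoke OOM killer\n");
    return fits;
}

int main(void)
{
    if (charge(512, false))       /* huge page: never OOM */
        printf("mapped huge page\n");
    else if (charge(1, true))     /* fall back: OOM killer armed again */
        printf("mapped regular page\n");
    else
        printf("VM_FAULT_OOM\n");
    return 0;
}
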
* Re: [patch 3/3] memcg: never OOM when charging huge pages

From: Minchan Kim @ 2011-01-31 22:52 UTC
To: Johannes Weiner
Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Huge page coverage should obviously have less priority than the
> continued execution of a process.
>
> Never kill a process when charging it a huge page fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

* Re: [patch 3/3] memcg: never OOM when charging huge pages

From: KAMEZAWA Hiroyuki @ 2011-01-31 23:51 UTC
To: Johannes Weiner
Cc: akpm, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 15:03:55 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Huge page coverage should obviously have less priority than the
> continued execution of a process.
>
> Never kill a process when charging it a huge page fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

* [PATCH 0/7] memcg : more fixes and clean up for 2.6.28-rc

From: KAMEZAWA Hiroyuki @ 2011-01-21 6:34 UTC
To: linux-mm
Cc: linux-kernel, nishimura, hannes, balbir, akpm

This is a set of patches which I'm now testing; it passed a small test,
so I'm posting it. Some are bug fixes, others are cleanups, but I think
these are all for 2.6.38.

Brief description:

[1/7] remove a buggy comment and use a better name for
      mem_cgroup_move_parent(). The fixes for mem_cgroup_move_parent()
      are already in mainline; this is an add-on.

[2/7] a bug fix for the new function mem_cgroup_split_huge_fixup(),
      which was recently merged.

[3/7] prepare for the fixes in [4/7] and [5/7]. This is an enhancement
      of a function which is in use now.

[4/7] fix mem_cgroup_charge() for THP. With this, memory cgroup's
      charge function will handle THP requests in a sane way.

[5/7] fix the khugepaged scan condition for memcg. This is a fix for a
      hang of processes under a small/busy memory cgroup.

[6/7] rename variables to page_size, nr_pages, and bytes rather than
      ambiguous names.

[7/7] some memcg functions require the caller to initialize a variable
      before the call. It's ugly; fix it.

I think patches 1-5 are the urgent ones. Patch "5" needs some good
review, but without "5", a stress test on a small, busy memory cgroup
will not run successfully.

Thanks,
-Kame

* [PATCH 4/7] memcg : fix charge function of THP allocation.

From: KAMEZAWA Hiroyuki @ 2011-01-21 6:44 UTC
To: KAMEZAWA Hiroyuki
Cc: linux-mm, linux-kernel, nishimura, hannes, balbir, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

When THP is used, a hugepage-sized charge can happen.  It's not handled
correctly in mem_cgroup_do_charge().  For example, THP can fall back
to small page allocation when HUGEPAGE allocation seems difficult
or busy, but memory cgroup doesn't understand it and continues to
try HUGEPAGE charging.  And the worst thing is that memory cgroup
believes 'memory reclaim succeeded' if limit - usage > PAGE_SIZE.

Because of this, khugepaged etc. can go into an infinite reclaim loop
if tasks in the memcg are busy.

After this patch:
- Hugepage allocation will fail if the 1st trial of page reclaim fails.
- distinguish THP allocation from batched allocation.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   51 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)

Index: mmotm-0107/mm/memcontrol.c
===================================================================
--- mmotm-0107.orig/mm/memcontrol.c
+++ mmotm-0107/mm/memcontrol.c
@@ -1812,24 +1812,25 @@ enum {
 	CHARGE_OK,		/* success */
 	CHARGE_RETRY,		/* need to retry but retry is not bad */
 	CHARGE_NOMEM,		/* we can't do more. return -ENOMEM */
+	CHARGE_NEED_BREAK,	/* big size allocation failure */
 	CHARGE_WOULDBLOCK,	/* GFP_WAIT wasn't set and no enough res. */
 	CHARGE_OOM_DIE,		/* the current is killed because of OOM */
 };
 
 static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
-				int csize, bool oom_check)
+				int page_size, bool do_reclaim, bool oom_check)
 {
 	struct mem_cgroup *mem_over_limit;
 	struct res_counter *fail_res;
 	unsigned long flags = 0;
 	int ret;
 
-	ret = res_counter_charge(&mem->res, csize, &fail_res);
+	ret = res_counter_charge(&mem->res, page_size, &fail_res);
 
 	if (likely(!ret)) {
 		if (!do_swap_account)
 			return CHARGE_OK;
-		ret = res_counter_charge(&mem->memsw, csize, &fail_res);
+		ret = res_counter_charge(&mem->memsw, page_size, &fail_res);
 		if (likely(!ret))
 			return CHARGE_OK;
 
@@ -1838,14 +1839,14 @@ static int __mem_cgroup_do_charge(struct
 	} else
 		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
 
-	if (csize > PAGE_SIZE) /* change csize and retry */
+	if (!do_reclaim)
 		return CHARGE_RETRY;
 
 	if (!(gfp_mask & __GFP_WAIT))
 		return CHARGE_WOULDBLOCK;
 
 	ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
-					      gfp_mask, flags, csize);
+					      gfp_mask, flags, page_size);
 	/*
 	 * try_to_free_mem_cgroup_pages() might not give us a full
 	 * picture of reclaim. Some pages are reclaimed and might be
@@ -1853,19 +1854,28 @@ static int __mem_cgroup_do_charge(struct
 	 * Check the limit again to see if the reclaim reduced the
 	 * current usage of the cgroup before giving up
 	 */
-	if (ret || mem_cgroup_check_under_limit(mem_over_limit, csize))
+	if (ret || mem_cgroup_check_under_limit(mem_over_limit, page_size))
 		return CHARGE_RETRY;
 
 	/*
+	 * When page_size > PAGE_SIZE, THP calls this function and it's
+	 * ok to tell 'there are not enough pages for hugepage'. THP will
+	 * fallback into PAGE_SIZE allocation. If we do reclaim eagerly,
+	 * page splitting will occur and it seems much worse.
+	 */
+	if (page_size > PAGE_SIZE)
+		return CHARGE_NEED_BREAK;
+
+	/*
 	 * At task move, charge accounts can be doubly counted. So, it's
 	 * better to wait until the end of task_move if something is going on.
 	 */
 	if (mem_cgroup_wait_acct_move(mem_over_limit))
 		return CHARGE_RETRY;
 
-	/* If we don't need to call oom-killer at el, return immediately */
 	if (!oom_check)
 		return CHARGE_NOMEM;
+	/* check OOM */
 	if (!mem_cgroup_handle_oom(mem_over_limit, gfp_mask))
 		return CHARGE_OOM_DIE;
 
@@ -1885,7 +1895,7 @@ static int __mem_cgroup_try_charge(struc
 	int nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
 	struct mem_cgroup *mem = NULL;
 	int ret;
-	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
+	bool use_pcp_cache = (page_size == PAGE_SIZE);
 
 	/*
 	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
@@ -1910,7 +1920,7 @@ again:
 		VM_BUG_ON(css_is_removed(&mem->css));
 		if (mem_cgroup_is_root(mem))
 			goto done;
-		if (page_size == PAGE_SIZE && consume_stock(mem))
+		if (use_pcp_cache && consume_stock(mem))
 			goto done;
 		css_get(&mem->css);
 	} else {
@@ -1933,7 +1943,7 @@ again:
 			rcu_read_unlock();
 			goto done;
 		}
-		if (page_size == PAGE_SIZE && consume_stock(mem)) {
+		if (use_pcp_cache && consume_stock(mem)) {
 			/*
 			 * It seems dagerous to access memcg without css_get().
 			 * But considering how consume_stok works, it's not
@@ -1967,17 +1977,26 @@ again:
 			oom_check = true;
 			nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
 		}
-
-		ret = __mem_cgroup_do_charge(mem, gfp_mask, csize, oom_check);
+		if (use_pcp_cache)
+			ret = __mem_cgroup_do_charge(mem, gfp_mask,
+					CHARGE_SIZE, false, oom_check);
+		else
+			ret = __mem_cgroup_do_charge(mem, gfp_mask,
+					page_size, true, oom_check);
 		switch (ret) {
 		case CHARGE_OK:
 			break;
 		case CHARGE_RETRY: /* not in OOM situation but retry */
-			csize = page_size;
+			if (use_pcp_cache)/* need to reclaim pages */
+				use_pcp_cache = false;
			css_put(&mem->css);
 			mem = NULL;
 			goto again;
+		case CHARGE_NEED_BREAK: /* page_size > PAGE_SIZE */
+			css_put(&mem->css);
+			/* returning failure doesn't mean OOM for hugepages */
+			goto nomem;
 		case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */
 			css_put(&mem->css);
 			goto nomem;
@@ -1994,9 +2013,9 @@ again:
 			goto bypass;
 		}
 	} while (ret != CHARGE_OK);
-
-	if (csize > page_size)
-		refill_stock(mem, csize - page_size);
+	/* This flag is cleared when we fail the CHARGE_SIZE charge. */
+	if (use_pcp_cache)
+		refill_stock(mem, CHARGE_SIZE - page_size);
 	css_put(&mem->css);
 done:
 	*memcg = mem;

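The two-phase flow this patch aims for, an optional batch charge first, then an exact-size charge with reclaim, with THP bailing out to its regular-page fallback, can be modeled in a standalone sketch (simplified sizes, no real reclaim, error cases collapsed; not the patch itself):

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE   4096UL
#define HPAGE_SIZE  (512 * PAGE_SIZE)
#define CHARGE_SIZE (32 * PAGE_SIZE)

enum { CHARGE_OK, CHARGE_RETRY, CHARGE_NEED_BREAK, CHARGE_NOMEM };

static unsigned long limit = 64 * PAGE_SIZE, usage = 60 * PAGE_SIZE;

/* model of __mem_cgroup_do_charge() after the patch */
static int do_charge(unsigned long size, bool do_reclaim)
{
    if (limit - usage >= size) {
        usage += size;
        return CHARGE_OK;
    }
    if (!do_reclaim)            /* batch charge: no reclaim, cheap retry */
        return CHARGE_RETRY;
    /* (reclaim would run here) */
    if (size > PAGE_SIZE)       /* THP: give up, caller falls back */
        return CHARGE_NEED_BREAK;
    return CHARGE_NOMEM;
}

/* model of the __mem_cgroup_try_charge() loop */
static int try_charge(unsigned long page_size)
{
    bool use_pcp_cache = (page_size == PAGE_SIZE);

    for (;;) {
        int ret = do_charge(use_pcp_cache ? CHARGE_SIZE : page_size,
                            !use_pcp_cache);
        if (ret == CHARGE_OK)
            return 0;
        if (ret == CHARGE_RETRY && use_pcp_cache) {
            use_pcp_cache = false;  /* drop batching, charge exact size */
            continue;
        }
        return -1;  /* NEED_BREAK/NOMEM: THP falls back to 4k pages */
    }
}

int main(void)
{
    printf("4k page:   %d\n", try_charge(PAGE_SIZE));   /* 0 */
    printf("huge page: %d\n", try_charge(HPAGE_SIZE));  /* -1 */
    return 0;
}
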
* Re: [PATCH 4/7] memcg : fix charge function of THP allocation.

From: Johannes Weiner @ 2011-01-27 10:34 UTC
To: KAMEZAWA Hiroyuki
Cc: linux-mm, linux-kernel, nishimura, balbir, akpm

On Fri, Jan 21, 2011 at 03:44:30PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> When THP is used, a hugepage-sized charge can happen.  It's not handled
> correctly in mem_cgroup_do_charge().  For example, THP can fall back
> to small page allocation when HUGEPAGE allocation seems difficult
> or busy, but memory cgroup doesn't understand it and continues to
> try HUGEPAGE charging.  And the worst thing is that memory cgroup
> believes 'memory reclaim succeeded' if limit - usage > PAGE_SIZE.
>
> Because of this, khugepaged etc. can go into an infinite reclaim loop
> if tasks in the memcg are busy.
>
> After this patch:
> - Hugepage allocation will fail if the 1st trial of page reclaim fails.
> - distinguish THP allocation from batched allocation.

This does too many things at once.  Can you split this into more
patches where each one has a single objective?

Thanks.

	Hannes

* [patch 3/3] memcg: never OOM when charging huge pages

From: Johannes Weiner @ 2011-01-27 13:47 UTC
To: KAMEZAWA Hiroyuki
Cc: linux-mm, linux-kernel, nishimura, balbir, akpm

Huge page coverage should obviously have less priority than the
continued execution of a process.

Never kill a process when charging it a huge page fails.  Instead,
give up after the first failed reclaim attempt and fall back to
regular pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 17c4e36..2945649 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
 
 	/*
+	 * Do not OOM on huge pages. Fall back to regular pages after
+	 * the first failed reclaim attempt.
+	 */
+	if (page_size > PAGE_SIZE)
+		oom = false;
+
+	/*
 	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
 	 * in system level. So, allow to go ahead dying process in addition to
 	 * MEMDIE process.
-- 
1.7.3.5

* Re: [patch 3/3] memcg: never OOM when charging huge pages

From: KAMEZAWA Hiroyuki @ 2011-01-27 23:44 UTC
To: Johannes Weiner
Cc: linux-mm, linux-kernel, nishimura, balbir, akpm

On Thu, 27 Jan 2011 14:47:03 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Huge page coverage should obviously have less priority than the
> continued execution of a process.
>
> Never kill a process when charging it a huge page fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Ah, I missed this. I'll merge this patch into my series. Thank you for
pointing it out.

But what I have to point out here is that the khugepaged lock-up issue
needs another fix for khugepaged itself. It should skip hopeless memcgs.
I think I can send patches today.

Thanks,
-Kame

> ---
>  mm/memcontrol.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 17c4e36..2945649 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
>  	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
>
>  	/*
> +	 * Do not OOM on huge pages. Fall back to regular pages after
> +	 * the first failed reclaim attempt.
> +	 */
> +	if (page_size > PAGE_SIZE)
> +		oom = false;
> +
> +	/*
>  	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
>  	 * in system level. So, allow to go ahead dying process in addition to
>  	 * MEMDIE process.
> --
> 1.7.3.5

* Re: [patch 3/3] memcg: never OOM when charging huge pages

From: Daisuke Nishimura @ 2011-01-27 23:45 UTC
To: Johannes Weiner
Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, balbir, akpm, Daisuke Nishimura

On Thu, 27 Jan 2011 14:47:03 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Huge page coverage should obviously have less priority than the
> continued execution of a process.
>
> Never kill a process when charging it a huge page fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/memcontrol.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 17c4e36..2945649 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
>  	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
>
>  	/*
> +	 * Do not OOM on huge pages. Fall back to regular pages after
> +	 * the first failed reclaim attempt.
> +	 */
> +	if (page_size > PAGE_SIZE)
> +		oom = false;
> +
> +	/*
>  	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
>  	 * in system level. So, allow to go ahead dying process in addition to
>  	 * MEMDIE process.
> --
> 1.7.3.5

__mem_cgroup_try_charge() has an "oom" switch already, so I prefer making
callers use the switch properly by themselves.

Thanks,
Daisuke Nishimura.

* Re: [patch 3/3] memcg: never OOM when charging huge pages

From: KAMEZAWA Hiroyuki @ 2011-01-27 23:49 UTC
To: Daisuke Nishimura
Cc: Johannes Weiner, linux-mm, linux-kernel, balbir, akpm

On Fri, 28 Jan 2011 08:45:40 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Thu, 27 Jan 2011 14:47:03 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > Huge page coverage should obviously have less priority than the
> > continued execution of a process.
> >
> > Never kill a process when charging it a huge page fails.  Instead,
> > give up after the first failed reclaim attempt and fall back to
> > regular pages.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/memcontrol.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 17c4e36..2945649 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
> >  	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
> >
> >  	/*
> > +	 * Do not OOM on huge pages. Fall back to regular pages after
> > +	 * the first failed reclaim attempt.
> > +	 */
> > +	if (page_size > PAGE_SIZE)
> > +		oom = false;
> > +
> > +	/*
> >  	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
> >  	 * in system level. So, allow to go ahead dying process in addition to
> >  	 * MEMDIE process.
> > --
> > 1.7.3.5
>
> __mem_cgroup_try_charge() has an "oom" switch already, so I prefer making
> callers use the switch properly by themselves.

Okay, will do.

-Kame
