LKML Archive on lore.kernel.org
* (no subject) @ 2011-01-31 14:03 Johannes Weiner 2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw) To: akpm Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel Hi, these three patches are small fixes for urgent issues with memory cgroups and transparent huge pages. They are inspired by 4/7 of KAMEZAWA-san's recent series 'memcg : more fixes and clean up for 2.6.38-rc' to make memory cgroups work with THP. 2/3 fixes a bug that is first uncovered by 1/3. Minchan suggested this order for review purposes, but they should probably be applied the other way round (2, then 1) for better bisectability. If everybody agrees, I would prefer these going into -mmotm now, and have further cleanups and optimizations based on them. Hannes ^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 1/3] memcg: prevent endless loop when charging huge pages 2011-01-31 14:03 Johannes Weiner @ 2011-01-31 14:03 ` Johannes Weiner 2011-01-31 22:27 ` Minchan Kim 2011-01-31 23:48 ` KAMEZAWA Hiroyuki 2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner 2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner 2 siblings, 2 replies; 21+ messages in thread From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw) To: akpm Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel The charging code can encounter a charge size that is bigger than a regular page in two situations: one is a batched charge to fill the per-cpu stocks, the other is a huge page charge. This code is distributed over two functions, however, and only the outer one is aware of huge pages. In case the charging fails, the inner function will tell the outer function to retry if the charge size is bigger than regular pages--assuming batched charging is the only case. And the outer function will retry forever charging a huge page. This patch makes sure the inner function can distinguish between batch charging and a single huge page charge. It will only signal another attempt if batch charging failed, and go into regular reclaim when it is called on behalf of a huge page. 
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d572102..73ea323 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1837,8 +1837,15 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, flags |= MEM_CGROUP_RECLAIM_NOSWAP; } else mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); - - if (csize > PAGE_SIZE) /* change csize and retry */ + /* + * csize can be either a huge page (HPAGE_SIZE), a batch of + * regular pages (CHARGE_SIZE), or a single regular page + * (PAGE_SIZE). + * + * Never reclaim on behalf of optional batching, retry with a + * single page instead. + */ + if (csize == CHARGE_SIZE) return CHARGE_RETRY; if (!(gfp_mask & __GFP_WAIT)) -- 1.7.3.5 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [patch 1/3] memcg: prevent endless loop when charging huge pages 2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner @ 2011-01-31 22:27 ` Minchan Kim 2011-01-31 23:48 ` KAMEZAWA Hiroyuki 1 sibling, 0 replies; 21+ messages in thread From: Minchan Kim @ 2011-01-31 22:27 UTC (permalink / raw) To: Johannes Weiner Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote: > The charging code can encounter a charge size that is bigger than a > regular page in two situations: one is a batched charge to fill the > per-cpu stocks, the other is a huge page charge. > > This code is distributed over two functions, however, and only the > outer one is aware of huge pages. In case the charging fails, the > inner function will tell the outer function to retry if the charge > size is bigger than regular pages--assuming batched charging is the > only case. And the outer function will retry forever charging a huge > page. > > This patch makes sure the inner function can distinguish between batch > charging and a single huge page charge. It will only signal another > attempt if batch charging failed, and go into regular reclaim when it > is called on behalf of a huge page. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/3] memcg: prevent endless loop when charging huge pages 2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner 2011-01-31 22:27 ` Minchan Kim @ 2011-01-31 23:48 ` KAMEZAWA Hiroyuki 1 sibling, 0 replies; 21+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-01-31 23:48 UTC (permalink / raw) To: Johannes Weiner Cc: akpm, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Mon, 31 Jan 2011 15:03:53 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > The charging code can encounter a charge size that is bigger than a > regular page in two situations: one is a batched charge to fill the > per-cpu stocks, the other is a huge page charge. > > This code is distributed over two functions, however, and only the > outer one is aware of huge pages. In case the charging fails, the > inner function will tell the outer function to retry if the charge > size is bigger than regular pages--assuming batched charging is the > only case. And the outer function will retry forever charging a huge > page. > > This patch makes sure the inner function can distinguish between batch > charging and a single huge page charge. It will only signal another > attempt if batch charging failed, and go into regular reclaim when it > is called on behalf of a huge page. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Thank you very much. 
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > mm/memcontrol.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index d572102..73ea323 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1837,8 +1837,15 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, > flags |= MEM_CGROUP_RECLAIM_NOSWAP; > } else > mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); > - > - if (csize > PAGE_SIZE) /* change csize and retry */ > + /* > + * csize can be either a huge page (HPAGE_SIZE), a batch of > + * regular pages (CHARGE_SIZE), or a single regular page > + * (PAGE_SIZE). > + * > + * Never reclaim on behalf of optional batching, retry with a > + * single page instead. > + */ > + if (csize == CHARGE_SIZE) > return CHARGE_RETRY; > > if (!(gfp_mask & __GFP_WAIT)) > -- > 1.7.3.5 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group 2011-01-31 14:03 Johannes Weiner 2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner @ 2011-01-31 14:03 ` Johannes Weiner 2011-01-31 22:41 ` Andrew Morton 2011-01-31 22:42 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Minchan Kim 2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner 2 siblings, 2 replies; 21+ messages in thread From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw) To: akpm Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel If reclaim after a failed charging was unsuccessful, the limits are checked again, just in case they settled by means of other tasks. This is all fine as long as every charge is of size PAGE_SIZE, because in that case, being below the limit means having at least PAGE_SIZE bytes available. But with transparent huge pages, we may end up in an endless loop where charging and reclaim fail, but we keep going because the limits are not yet exceeded, although not allowing for a huge page. Fix this up by explicitly checking for enough room, not just whether we are within limits. 
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- include/linux/res_counter.h | 12 ++++++++++++ mm/memcontrol.c | 27 ++++++++++++++++++++------- 2 files changed, 32 insertions(+), 7 deletions(-) diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h index fcb9884..5cfd78a 100644 --- a/include/linux/res_counter.h +++ b/include/linux/res_counter.h @@ -182,6 +182,18 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt) return ret; } +static inline bool res_counter_check_margin(struct res_counter *cnt, + unsigned long bytes) +{ + bool ret; + unsigned long flags; + + spin_lock_irqsave(&cnt->lock, flags); + ret = cnt->limit - cnt->usage >= bytes; + spin_unlock_irqrestore(&cnt->lock, flags); + return ret; +} + static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt) { bool ret; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 73ea323..c28072f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem) return false; } +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes) +{ + if (!res_counter_check_margin(&mem->res, bytes)) + return false; + if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes)) + return false; + return true; +} + static unsigned int get_swappiness(struct mem_cgroup *memcg) { struct cgroup *cgrp = memcg->css.cgroup; @@ -1852,15 +1861,19 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask, return CHARGE_WOULDBLOCK; ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL, - gfp_mask, flags); + gfp_mask, flags); + if (mem_cgroup_check_margin(mem_over_limit, csize)) + return CHARGE_RETRY; /* - * try_to_free_mem_cgroup_pages() might not give us a full - * picture of reclaim. Some pages are reclaimed and might be - * moved to swap cache or just unmapped from the cgroup. 
- * Check the limit again to see if the reclaim reduced the - * current usage of the cgroup before giving up + * Even though the limit is exceeded at this point, reclaim + * may have been able to free some pages. Retry the charge + * before killing the task. + * + * Only for regular pages, though: huge pages are rather + * unlikely to succeed so close to the limit, and we fall back + * to regular pages anyway in case of failure. */ - if (ret || mem_cgroup_check_under_limit(mem_over_limit)) + if (csize == PAGE_SIZE && ret) return CHARGE_RETRY; /* -- 1.7.3.5 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group 2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner @ 2011-01-31 22:41 ` Andrew Morton 2011-01-31 23:50 ` KAMEZAWA Hiroyuki 2011-02-01 0:04 ` Johannes Weiner 2011-01-31 22:42 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Minchan Kim 1 sibling, 2 replies; 21+ messages in thread From: Andrew Morton @ 2011-01-31 22:41 UTC (permalink / raw) To: Johannes Weiner Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Mon, 31 Jan 2011 15:03:54 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > +static inline bool res_counter_check_margin(struct res_counter *cnt, > + unsigned long bytes) > +{ > + bool ret; > + unsigned long flags; > + > + spin_lock_irqsave(&cnt->lock, flags); > + ret = cnt->limit - cnt->usage >= bytes; > + spin_unlock_irqrestore(&cnt->lock, flags); > + return ret; > +} > + > static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt) > { > bool ret; > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 73ea323..c28072f 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem) > return false; > } > > +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes) > +{ > + if (!res_counter_check_margin(&mem->res, bytes)) > + return false; > + if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes)) > + return false; > + return true; > +} argh. If you ever have a function with the string "check" in its name, it's a good sign that you did something wrong. Check what? Against what? Returning what? mem_cgroup_check_under_limit() isn't toooo bad - the name tells you what's being checked and tells you what to expect the return value to mean. 
But "res_counter_check_margin" and "mem_cgroup_check_margin" are just awful. Something like bool res_counter_may_charge(counter, bytes) would be much clearer. If we really want to stick with the "check" names (perhaps as an ironic reference to res_counter's past mistakes) then please at least document the sorry things? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group 2011-01-31 22:41 ` Andrew Morton @ 2011-01-31 23:50 ` KAMEZAWA Hiroyuki 2011-02-01 0:04 ` Johannes Weiner 1 sibling, 0 replies; 21+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-01-31 23:50 UTC (permalink / raw) To: Andrew Morton Cc: Johannes Weiner, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Mon, 31 Jan 2011 14:41:31 -0800 Andrew Morton <akpm@linux-foundation.org> wrote: > On Mon, 31 Jan 2011 15:03:54 +0100 > Johannes Weiner <hannes@cmpxchg.org> wrote: > > > +static inline bool res_counter_check_margin(struct res_counter *cnt, > > + unsigned long bytes) > > +{ > > + bool ret; > > + unsigned long flags; > > + > > + spin_lock_irqsave(&cnt->lock, flags); > > + ret = cnt->limit - cnt->usage >= bytes; > > + spin_unlock_irqrestore(&cnt->lock, flags); > > + return ret; > > +} > > + > > static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt) > > { > > bool ret; > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 73ea323..c28072f 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem) > > return false; > > } > > > > +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes) > > +{ > > + if (!res_counter_check_margin(&mem->res, bytes)) > > + return false; > > + if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes)) > > + return false; > > + return true; > > +} > > argh. > > If you ever have a function with the string "check" in its name, it's a > good sign that you did something wrong. > > Check what? Against what? Returning what? > > mem_cgroup_check_under_limit() isn't toooo bad - the name tells you > what's being checked and tells you what to expect the return value to > mean. > > But "res_counter_check_margin" and "mem_cgroup_check_margin" are just > awful. 
Something like > > bool res_counter_may_charge(counter, bytes) > > would be much clearer. > > If we really want to stick with the "check" names (perhaps as an ironic > reference to res_counter's past mistakes) then please at least document > the sorry things? > Ah, I ack the concept of patch. Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Johannes, could you change name ? I'm sorry. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group 2011-01-31 22:41 ` Andrew Morton 2011-01-31 23:50 ` KAMEZAWA Hiroyuki @ 2011-02-01 0:04 ` Johannes Weiner 2011-02-01 0:24 ` Andrew Morton 1 sibling, 1 reply; 21+ messages in thread From: Johannes Weiner @ 2011-02-01 0:04 UTC (permalink / raw) To: Andrew Morton Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Mon, Jan 31, 2011 at 02:41:31PM -0800, Andrew Morton wrote: > On Mon, 31 Jan 2011 15:03:54 +0100 > Johannes Weiner <hannes@cmpxchg.org> wrote: > > @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem) > > return false; > > } > > > > +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes) > > +{ > > + if (!res_counter_check_margin(&mem->res, bytes)) > > + return false; > > + if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes)) > > + return false; > > + return true; > > +} > > argh. > > If you ever have a function with the string "check" in its name, it's a > good sign that you did something wrong. > > Check what? Against what? Returning what? > > mem_cgroup_check_under_limit() isn't toooo bad - the name tells you > what's being checked and tells you what to expect the return value to > mean. > > But "res_counter_check_margin" and "mem_cgroup_check_margin" are just > awful. Something like > > bool res_counter_may_charge(counter, bytes) > > would be much clearer. That makes sense for the hard limit. But the oh-so-generic resource counters also have a soft limit, and you don't ask for that when you want to charge. Right now, I do not feel creative enough to come up with a symmetric-sounding counterpart. > If we really want to stick with the "check" names (perhaps as an ironic > reference to res_counter's past mistakes) then please at least document > the sorry things? I cowardly went with this option and have a patch below to fold into this fix. 
Maybe it would be better to use res_counter_margin(cnt) >= wanted throughout the code. Or still better, make memcg work on pages and res_counters on unsigned longs so the locking is no longer needed, together with an API for most obvious maths. I will work something out and submit it separately. --- Subject: [patch fixup] res_counter: document res_counter_check_margin() Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h index 5cfd78a..a5930cb 100644 --- a/include/linux/res_counter.h +++ b/include/linux/res_counter.h @@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt) return ret; } +/** + * res_counter_check_margin - check if the counter allows charging + * @cnt: the resource counter to check + * @bytes: the number of bytes to check the remaining space against + * + * Returns a boolean value on whether the counter can be charged + * @bytes or whether this would exceed the limit. + */ static inline bool res_counter_check_margin(struct res_counter *cnt, unsigned long bytes) { ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group 2011-02-01 0:04 ` Johannes Weiner @ 2011-02-01 0:24 ` Andrew Morton 2011-02-01 0:34 ` Johannes Weiner 2011-02-03 12:53 ` [patch 0/2] memcg: clean up limit checking Johannes Weiner 0 siblings, 2 replies; 21+ messages in thread From: Andrew Morton @ 2011-02-01 0:24 UTC (permalink / raw) To: Johannes Weiner Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Tue, 1 Feb 2011 01:04:55 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > Maybe it would be better to use res_counter_margin(cnt) >= wanted > throughout the code. yup. > --- a/include/linux/res_counter.h > +++ b/include/linux/res_counter.h > @@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt) > return ret; > } > > +/** > + * res_counter_check_margin - check if the counter allows charging > + * @cnt: the resource counter to check > + * @bytes: the number of bytes to check the remaining space against > + * > + * Returns a boolean value on whether the counter can be charged > + * @bytes or whether this would exceed the limit. > + */ > static inline bool res_counter_check_margin(struct res_counter *cnt, > unsigned long bytes) > { mem_cgroup_check_margin() needs some lipstick too. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group 2011-02-01 0:24 ` Andrew Morton @ 2011-02-01 0:34 ` Johannes Weiner 2011-02-03 12:53 ` [patch 0/2] memcg: clean up limit checking Johannes Weiner 1 sibling, 0 replies; 21+ messages in thread From: Johannes Weiner @ 2011-02-01 0:34 UTC (permalink / raw) To: Andrew Morton Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Mon, Jan 31, 2011 at 04:24:48PM -0800, Andrew Morton wrote: > On Tue, 1 Feb 2011 01:04:55 +0100 > Johannes Weiner <hannes@cmpxchg.org> wrote: > > @@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt) > > return ret; > > } > > > > +/** > > + * res_counter_check_margin - check if the counter allows charging > > + * @cnt: the resource counter to check > > + * @bytes: the number of bytes to check the remaining space against > > + * > > + * Returns a boolean value on whether the counter can be charged > > + * @bytes or whether this would exceed the limit. > > + */ > > static inline bool res_counter_check_margin(struct res_counter *cnt, > > unsigned long bytes) > > { > > mem_cgroup_check_margin() needs some lipstick too. *oink* Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9e5de7c..6c07554 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1111,6 +1111,14 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem) return false; } +/** + * mem_cgroup_check_margin - check if the memory cgroup allows charging + * @mem: memory cgroup to check + * @bytes: the number of bytes the caller intends to charge + * + * Returns a boolean value on whether @mem can be charged @bytes or + * whether this would exceed the limit. + */ static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes) { if (!res_counter_check_margin(&mem->res, bytes)) ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [patch 0/2] memcg: clean up limit checking 2011-02-01 0:24 ` Andrew Morton 2011-02-01 0:34 ` Johannes Weiner @ 2011-02-03 12:53 ` Johannes Weiner 2011-02-03 12:54 ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner 2011-02-03 12:56 ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner 1 sibling, 2 replies; 21+ messages in thread From: Johannes Weiner @ 2011-02-03 12:53 UTC (permalink / raw) To: Andrew Morton Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Mon, Jan 31, 2011 at 04:24:48PM -0800, Andrew Morton wrote: > On Tue, 1 Feb 2011 01:04:55 +0100 > Johannes Weiner <hannes@cmpxchg.org> wrote: > > > Maybe it would be better to use res_counter_margin(cnt) >= wanted > > throughout the code. > > yup. Okay, I cleaned it all up a bit for .39. While doing so, I also found that we are reclaiming one page too much when pushing back on behalf of soft limits. So 1/2 fixes the soft limit reclaim off-by-one-page, and 2/2 reduces all the limit checks to two functions. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 1/2] memcg: soft limit reclaim should end at limit not below 2011-02-03 12:53 ` [patch 0/2] memcg: clean up limit checking Johannes Weiner @ 2011-02-03 12:54 ` Johannes Weiner 2011-02-03 23:41 ` KAMEZAWA Hiroyuki 2011-02-04 4:10 ` Balbir Singh 2011-02-03 12:56 ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner 1 sibling, 2 replies; 21+ messages in thread From: Johannes Weiner @ 2011-02-03 12:54 UTC (permalink / raw) To: Andrew Morton Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel Soft limit reclaim continues until the usage is below the current soft limit, but the documented semantics are actually that soft limit reclaim will push usage back until the soft limits are met again. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- include/linux/res_counter.h | 4 ++-- mm/memcontrol.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h index a5930cb..bf1f01b 100644 --- a/include/linux/res_counter.h +++ b/include/linux/res_counter.h @@ -139,7 +139,7 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt) static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt) { - if (cnt->usage < cnt->soft_limit) + if (cnt->usage <= cnt->soft_limit) return true; return false; @@ -202,7 +202,7 @@ static inline bool res_counter_check_margin(struct res_counter *cnt, return ret; } -static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt) +static inline bool res_counter_check_within_soft_limit(struct res_counter *cnt) { bool ret; unsigned long flags; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 350bfd2..23b14188 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1451,7 +1451,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem, return ret; total += ret; if (check_soft) { - if (res_counter_check_under_soft_limit(&root_mem->res)) + if 
(res_counter_check_within_soft_limit(&root_mem->res)) return total; } else if (mem_cgroup_check_under_limit(root_mem)) return 1 + total; -- 1.7.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [patch 1/2] memcg: soft limit reclaim should end at limit not below 2011-02-03 12:54 ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner @ 2011-02-03 23:41 ` KAMEZAWA Hiroyuki 2011-02-04 4:10 ` Balbir Singh 1 sibling, 0 replies; 21+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-02-03 23:41 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, nishimura, balbir, minchan.kim, linux-mm, linux-kernel On Thu, 3 Feb 2011 13:54:54 +0100 Johannes Weiner <hannes@cmpxchg.org> wrote: > Soft limit reclaim continues until the usage is below the current soft > limit, but the documented semantics are actually that soft limit > reclaim will push usage back until the soft limits are met again. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 1/2] memcg: soft limit reclaim should end at limit not below 2011-02-03 12:54 ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner 2011-02-03 23:41 ` KAMEZAWA Hiroyuki @ 2011-02-04 4:10 ` Balbir Singh 1 sibling, 0 replies; 21+ messages in thread From: Balbir Singh @ 2011-02-04 4:10 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, kamezawa.hiroyu, nishimura, minchan.kim, linux-mm, linux-kernel On Thu, Feb 3, 2011 at 6:24 PM, Johannes Weiner <hannes@cmpxchg.org> wrote: > Soft limit reclaim continues until the usage is below the current soft > limit, but the documented semantics are actually that soft limit > reclaim will push usage back until the soft limits are met again. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 2/2] memcg: simplify the way memory limits are checked 2011-02-03 12:53 ` [patch 0/2] memcg: clean up limit checking Johannes Weiner 2011-02-03 12:54 ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner @ 2011-02-03 12:56 ` Johannes Weiner 2011-02-03 23:44 ` KAMEZAWA Hiroyuki 2011-02-04 4:12 ` Balbir Singh 1 sibling, 2 replies; 21+ messages in thread From: Johannes Weiner @ 2011-02-03 12:56 UTC (permalink / raw) To: Andrew Morton Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel Since transparent huge pages, checking whether memory cgroups are below their limits is no longer enough, but the actual amount of chargeable space is important. To not have more than one limit-checking interface, replace memory_cgroup_check_under_limit() and memory_cgroup_check_margin() with a single memory_cgroup_margin() that returns the chargeable space and leaves the comparison to the callsite. Soft limits are now checked the other way round, by using the already existing function that returns the amount by which soft limits are exceeded: res_counter_soft_limit_excess(). Also remove all the corresponding functions on the res_counter side that are now no longer used. 
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/res_counter.h |   72 ++++++++----------------------------------
 mm/memcontrol.c             |   49 ++++++++++-------------------
 2 files changed, 31 insertions(+), 90 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index bf1f01b..c9d625c 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -129,20 +129,22 @@ int __must_check res_counter_charge(struct res_counter *counter,
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
 
-static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
-{
-	if (cnt->usage < cnt->limit)
-		return true;
-
-	return false;
-}
-
-static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
+/**
+ * res_counter_margin - calculate chargeable space of a counter
+ * @cnt: the counter
+ *
+ * Returns the difference between the hard limit and the current usage
+ * of resource counter @cnt.
+ */
+static inline unsigned long long res_counter_margin(struct res_counter *cnt)
 {
-	if (cnt->usage <= cnt->soft_limit)
-		return true;
+	unsigned long long margin;
+	unsigned long flags;
 
-	return false;
+	spin_lock_irqsave(&cnt->lock, flags);
+	margin = cnt->limit - cnt->usage;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return margin;
 }
 
 /**
@@ -167,52 +169,6 @@ res_counter_soft_limit_excess(struct res_counter *cnt)
 	return excess;
 }
 
-/*
- * Helper function to detect if the cgroup is within it's limit or
- * not. It's currently called from cgroup_rss_prepare()
- */
-static inline bool res_counter_check_under_limit(struct res_counter *cnt)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = res_counter_limit_check_locked(cnt);
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
-/**
- * res_counter_check_margin - check if the counter allows charging
- * @cnt: the resource counter to check
- * @bytes: the number of bytes to check the remaining space against
- *
- * Returns a boolean value on whether the counter can be charged
- * @bytes or whether this would exceed the limit.
- */
-static inline bool res_counter_check_margin(struct res_counter *cnt,
-					    unsigned long bytes)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = cnt->limit - cnt->usage >= bytes;
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
-static inline bool res_counter_check_within_soft_limit(struct res_counter *cnt)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = res_counter_soft_limit_check_locked(cnt);
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
 static inline void res_counter_reset_max(struct res_counter *cnt)
 {
 	unsigned long flags;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 23b14188..e1ab9c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -504,11 +504,6 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *mem)
 	}
 }
 
-static inline unsigned long mem_cgroup_get_excess(struct mem_cgroup *mem)
-{
-	return res_counter_soft_limit_excess(&mem->res) >> PAGE_SHIFT;
-}
-
 static struct mem_cgroup_per_zone *
 __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
 {
@@ -1101,33 +1096,21 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 #define mem_cgroup_from_res_counter(counter, member)	\
 	container_of(counter, struct mem_cgroup, member)
 
-static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
-{
-	if (do_swap_account) {
-		if (res_counter_check_under_limit(&mem->res) &&
-		    res_counter_check_under_limit(&mem->memsw))
-			return true;
-	} else
-		if (res_counter_check_under_limit(&mem->res))
-			return true;
-	return false;
-}
-
 /**
- * mem_cgroup_check_margin - check if the memory cgroup allows charging
- * @mem: memory cgroup to check
- * @bytes: the number of bytes the caller intends to charge
+ * mem_cgroup_margin - calculate chargeable space of a memory cgroup
+ * @mem: the memory cgroup
  *
- * Returns a boolean value on whether @mem can be charged @bytes or
- * whether this would exceed the limit.
+ * Returns the maximum amount of memory @mem can be charged with, in
+ * bytes.
  */
-static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
+static unsigned long long mem_cgroup_margin(struct mem_cgroup *mem)
 {
-	if (!res_counter_check_margin(&mem->res, bytes))
-		return false;
-	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
-		return false;
-	return true;
+	unsigned long long margin;
+
+	margin = res_counter_margin(&mem->res);
+	if (do_swap_account)
+		margin = min(margin, res_counter_margin(&mem->memsw));
+	return margin;
 }
 
 static unsigned int get_swappiness(struct mem_cgroup *memcg)
@@ -1394,7 +1377,9 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
 	bool shrink = reclaim_options & MEM_CGROUP_RECLAIM_SHRINK;
 	bool check_soft = reclaim_options & MEM_CGROUP_RECLAIM_SOFT;
-	unsigned long excess = mem_cgroup_get_excess(root_mem);
+	unsigned long excess;
+
+	excess = res_counter_soft_limit_excess(&root_mem->res) >> PAGE_SHIFT;
 
 	/* If memsw_is_minimum==1, swap-out is of-no-use. */
 	if (root_mem->memsw_is_minimum)
@@ -1451,9 +1436,9 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 			return ret;
 		total += ret;
 		if (check_soft) {
-			if (res_counter_check_within_soft_limit(&root_mem->res))
+			if (!res_counter_soft_limit_excess(&root_mem->res))
 				return total;
-		} else if (mem_cgroup_check_under_limit(root_mem))
+		} else if (mem_cgroup_margin(root_mem))
 			return 1 + total;
 	}
 	return total;
@@ -1872,7 +1857,7 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 
 	ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
 					      gfp_mask, flags);
-	if (mem_cgroup_check_margin(mem_over_limit, csize))
+	if (mem_cgroup_margin(mem_over_limit) >= csize)
 		return CHARGE_RETRY;
 	/*
 	 * Even though the limit is exceeded at this point, reclaim
-- 
1.7.4
* Re: [patch 2/2] memcg: simplify the way memory limits are checked
  2011-02-03 12:56 ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
@ 2011-02-03 23:44 ` KAMEZAWA Hiroyuki
  2011-02-04  4:12 ` Balbir Singh
  1 sibling, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-03 23:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Thu, 3 Feb 2011 13:56:11 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Since transparent huge pages, checking whether memory cgroups are
> below their limits is no longer enough, but the actual amount of
> chargeable space is important.
>
> To not have more than one limit-checking interface, replace
> memory_cgroup_check_under_limit() and memory_cgroup_check_margin()
> with a single memory_cgroup_margin() that returns the chargeable space
> and leaves the comparison to the callsite.
>
> Soft limits are now checked the other way round, by using the already
> existing function that returns the amount by which soft limits are
> exceeded: res_counter_soft_limit_excess().
>
> Also remove all the corresponding functions on the res_counter side
> that are now no longer used.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* Re: [patch 2/2] memcg: simplify the way memory limits are checked
  2011-02-03 12:56 ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
  2011-02-03 23:44 ` KAMEZAWA Hiroyuki
@ 2011-02-04  4:12 ` Balbir Singh
  1 sibling, 0 replies; 21+ messages in thread
From: Balbir Singh @ 2011-02-04  4:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, kamezawa.hiroyu, nishimura, minchan.kim, linux-mm, linux-kernel

On Thu, Feb 3, 2011 at 6:26 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Since transparent huge pages, checking whether memory cgroups are
> below their limits is no longer enough, but the actual amount of
> chargeable space is important.
>
> To not have more than one limit-checking interface, replace
> memory_cgroup_check_under_limit() and memory_cgroup_check_margin()
> with a single memory_cgroup_margin() that returns the chargeable space
> and leaves the comparison to the callsite.
>
> Soft limits are now checked the other way round, by using the already
> existing function that returns the amount by which soft limits are
> exceeded: res_counter_soft_limit_excess().
>
> Also remove all the corresponding functions on the res_counter side
> that are now no longer used.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
  2011-01-31 22:41 ` Andrew Morton
@ 2011-01-31 22:42 ` Minchan Kim
  1 sibling, 0 replies; 21+ messages in thread
From: Minchan Kim @ 2011-01-31 22:42 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> If reclaim after a failed charging was unsuccessful, the limits are
> checked again, just in case they settled by means of other tasks.
>
> This is all fine as long as every charge is of size PAGE_SIZE, because
> in that case, being below the limit means having at least PAGE_SIZE
> bytes available.
>
> But with transparent huge pages, we may end up in an endless loop
> where charging and reclaim fail, but we keep going because the limits
> are not yet exceeded, although not allowing for a huge page.
>
> Fix this up by explicitly checking for enough room, not just whether
> we are within limits.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim
* [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-31 14:03 Johannes Weiner
  2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
  2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
@ 2011-01-31 14:03 ` Johannes Weiner
  2011-01-31 22:52 ` Minchan Kim
  2011-01-31 23:51 ` KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 21+ messages in thread
From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw)
  To: akpm
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Huge page coverage should obviously have less priority than the
continued execution of a process.

Never kill a process when charging it a huge page fails. Instead,
give up after the first failed reclaim attempt and fall back to
regular pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c28072f..9e5de7c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2343,13 +2343,19 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
 				gfp_t gfp_mask, enum charge_type ctype)
 {
 	struct mem_cgroup *mem = NULL;
+	int page_size = PAGE_SIZE;
 	struct page_cgroup *pc;
+	bool oom = true;
 	int ret;
-	int page_size = PAGE_SIZE;
 
 	if (PageTransHuge(page)) {
 		page_size <<= compound_order(page);
 		VM_BUG_ON(!PageTransHuge(page));
+		/*
+		 * Never OOM-kill a process for a huge page.  The
+		 * fault handler will fall back to regular pages.
+		 */
+		oom = false;
 	}
 
 	pc = lookup_page_cgroup(page);
@@ -2358,7 +2364,7 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
 		return 0;
 	prefetchw(pc);
 
-	ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, true, page_size);
+	ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, oom, page_size);
 	if (ret || !mem)
 		return ret;
-- 
1.7.3.5
* Re: [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
@ 2011-01-31 22:52 ` Minchan Kim
  2011-01-31 23:51 ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 21+ messages in thread
From: Minchan Kim @ 2011-01-31 22:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Huge page coverage should obviously have less priority than the
> continued execution of a process.
>
> Never kill a process when charging it a huge page fails. Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim
* Re: [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
  2011-01-31 22:52 ` Minchan Kim
@ 2011-01-31 23:51 ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-31 23:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 15:03:55 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Huge page coverage should obviously have less priority than the
> continued execution of a process.
>
> Never kill a process when charging it a huge page fails. Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
end of thread, other threads:[~2011-02-04  4:12 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-31 14:03 Johannes Weiner
2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
2011-01-31 22:27   ` Minchan Kim
2011-01-31 23:48   ` KAMEZAWA Hiroyuki
2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
2011-01-31 22:41   ` Andrew Morton
2011-01-31 23:50     ` KAMEZAWA Hiroyuki
2011-02-01  0:04     ` Johannes Weiner
2011-02-01  0:24       ` Andrew Morton
2011-02-01  0:34         ` Johannes Weiner
2011-02-03 12:53           ` [patch 0/2] memcg: clean up limit checking Johannes Weiner
2011-02-03 12:54             ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner
2011-02-03 23:41               ` KAMEZAWA Hiroyuki
2011-02-04  4:10               ` Balbir Singh
2011-02-03 12:56             ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
2011-02-03 23:44               ` KAMEZAWA Hiroyuki
2011-02-04  4:12               ` Balbir Singh
2011-01-31 22:42   ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Minchan Kim
2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
2011-01-31 22:52   ` Minchan Kim
2011-01-31 23:51   ` KAMEZAWA Hiroyuki