LKML Archive on lore.kernel.org
* (no subject)
@ 2011-01-31 14:03 Johannes Weiner
  2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw)
  To: akpm
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Hi,

these three patches are small fixes for urgent issues with memory
cgroups and transparent huge pages.  They are inspired by 4/7 of
KAMEZAWA-san's recent series 'memcg : more fixes and clean up for
2.6.38-rc' to make memory cgroups work with THP.

2/3 fixes a bug that is first uncovered by 1/3.  Minchan
suggested this order for review purposes, but they should probably be
applied the other way round (2, then 1) for better bisectability.

If everybody agrees, I would prefer these going into -mmotm now, and
have further cleanups and optimizations based on them.

	Hannes


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 1/3] memcg: prevent endless loop when charging huge pages
  2011-01-31 14:03 Johannes Weiner
@ 2011-01-31 14:03 ` Johannes Weiner
  2011-01-31 22:27   ` Minchan Kim
  2011-01-31 23:48   ` KAMEZAWA Hiroyuki
  2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
  2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
  2 siblings, 2 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw)
  To: akpm
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

The charging code can encounter a charge size that is bigger than a
regular page in two situations: one is a batched charge to fill the
per-cpu stocks, the other is a huge page charge.

This code is distributed over two functions, however, and only the
outer one is aware of huge pages.  In case the charging fails, the
inner function will tell the outer function to retry if the charge
size is bigger than a regular page--assuming batched charging is the
only such case.  As a result, the outer function will retry forever
when charging a huge page.

This patch makes sure the inner function can distinguish between batch
charging and a single huge page charge.  It will only signal another
attempt if batch charging failed, and go into regular reclaim when it
is called on behalf of a huge page.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d572102..73ea323 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1837,8 +1837,15 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 		flags |= MEM_CGROUP_RECLAIM_NOSWAP;
 	} else
 		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
-
-	if (csize > PAGE_SIZE) /* change csize and retry */
+	/*
+	 * csize can be either a huge page (HPAGE_SIZE), a batch of
+	 * regular pages (CHARGE_SIZE), or a single regular page
+	 * (PAGE_SIZE).
+	 *
+	 * Never reclaim on behalf of optional batching, retry with a
+	 * single page instead.
+	 */
+	if (csize == CHARGE_SIZE)
 		return CHARGE_RETRY;
 
 	if (!(gfp_mask & __GFP_WAIT))
-- 
1.7.3.5
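The distinction the patch draws can be illustrated with a standalone
userspace sketch (all names and sizes below are simplified stand-ins
chosen for illustration, not the real kernel definitions):

```c
#include <assert.h>

/* Stand-ins for the three charge sizes the patch comment describes;
 * values mirror x86-64 (4k pages, 2M huge pages, a batch of 32
 * regular pages), but are purely illustrative. */
#define SIM_PAGE_SIZE   4096UL
#define SIM_CHARGE_SIZE (32 * SIM_PAGE_SIZE)
#define SIM_HPAGE_SIZE  (512 * SIM_PAGE_SIZE)

enum sim_result { SIM_RETRY_SINGLE, SIM_TRY_RECLAIM };

/* Before the fix, the test was `csize > PAGE_SIZE`, which lumped huge
 * pages together with batched charges and made the caller retry a
 * failed huge page charge forever.  After the fix, only the optional
 * batch size triggers a retry with a single page; huge pages proceed
 * to reclaim. */
static enum sim_result charge_retry_policy(unsigned long csize)
{
	if (csize == SIM_CHARGE_SIZE)
		return SIM_RETRY_SINGLE;
	return SIM_TRY_RECLAIM;
}
```

With the old `> PAGE_SIZE` test, both the batch and the huge page size
would have returned SIM_RETRY_SINGLE, which is the endless loop.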



* [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-01-31 14:03 Johannes Weiner
  2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
@ 2011-01-31 14:03 ` Johannes Weiner
  2011-01-31 22:41   ` Andrew Morton
  2011-01-31 22:42   ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Minchan Kim
  2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
  2 siblings, 2 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw)
  To: akpm
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

If reclaim after a failed charging was unsuccessful, the limits are
checked again, in case other tasks have freed up room in the meantime.

This is all fine as long as every charge is of size PAGE_SIZE, because
in that case, being below the limit means having at least PAGE_SIZE
bytes available.

But with transparent huge pages, we may end up in an endless loop
where charging and reclaim fail, but we keep going because the limits
are not yet exceeded, even though there is not enough room left for a
huge page.

Fix this up by explicitly checking for enough room, not just whether
we are within limits.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/res_counter.h |   12 ++++++++++++
 mm/memcontrol.c             |   27 ++++++++++++++++++++-------
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index fcb9884..5cfd78a 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -182,6 +182,18 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
 	return ret;
 }
 
+static inline bool res_counter_check_margin(struct res_counter *cnt,
+					    unsigned long bytes)
+{
+	bool ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	ret = cnt->limit - cnt->usage >= bytes;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
 static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
 {
 	bool ret;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 73ea323..c28072f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
 	return false;
 }
 
+static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
+{
+	if (!res_counter_check_margin(&mem->res, bytes))
+		return false;
+	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
+		return false;
+	return true;
+}
+
 static unsigned int get_swappiness(struct mem_cgroup *memcg)
 {
 	struct cgroup *cgrp = memcg->css.cgroup;
@@ -1852,15 +1861,19 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 		return CHARGE_WOULDBLOCK;
 
 	ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
-					gfp_mask, flags);
+					      gfp_mask, flags);
+	if (mem_cgroup_check_margin(mem_over_limit, csize))
+		return CHARGE_RETRY;
 	/*
-	 * try_to_free_mem_cgroup_pages() might not give us a full
-	 * picture of reclaim. Some pages are reclaimed and might be
-	 * moved to swap cache or just unmapped from the cgroup.
-	 * Check the limit again to see if the reclaim reduced the
-	 * current usage of the cgroup before giving up
+	 * Even though the limit is exceeded at this point, reclaim
+	 * may have been able to free some pages.  Retry the charge
+	 * before killing the task.
+	 *
+	 * Only for regular pages, though: huge pages are rather
+	 * unlikely to succeed so close to the limit, and we fall back
+	 * to regular pages anyway in case of failure.
 	 */
-	if (ret || mem_cgroup_check_under_limit(mem_over_limit))
+	if (csize == PAGE_SIZE && ret)
 		return CHARGE_RETRY;
 
 	/*
-- 
1.7.3.5
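The difference between "under the limit" and "enough margin" is easy
to demonstrate in a small userspace sketch (the struct and function
names are illustrative, not the kernel's res_counter API, and the
locking is omitted):

```c
#include <assert.h>
#include <stdbool.h>

struct sim_counter {
	unsigned long usage;
	unsigned long limit;
};

/* Old check: are we below the limit at all? */
static bool sim_under_limit(const struct sim_counter *cnt)
{
	return cnt->usage < cnt->limit;
}

/* New check, as in res_counter_check_margin(): is there room for
 * `bytes` more? */
static bool sim_has_margin(const struct sim_counter *cnt,
			   unsigned long bytes)
{
	return cnt->limit - cnt->usage >= bytes;
}
```

A group one page below its limit passes the old check but fails the
new one for a 2MB huge page, which is exactly the condition that made
the charge/reclaim loop spin forever.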



* [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-31 14:03 Johannes Weiner
  2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
  2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
@ 2011-01-31 14:03 ` Johannes Weiner
  2011-01-31 22:52   ` Minchan Kim
  2011-01-31 23:51   ` KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-01-31 14:03 UTC (permalink / raw)
  To: akpm
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Huge page coverage should obviously have lower priority than the
continued execution of a process.

Never kill a process when charging a huge page to it fails.  Instead,
give up after the first failed reclaim attempt and fall back to
regular pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c28072f..9e5de7c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2343,13 +2343,19 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
 				gfp_t gfp_mask, enum charge_type ctype)
 {
 	struct mem_cgroup *mem = NULL;
+	int page_size = PAGE_SIZE;
 	struct page_cgroup *pc;
+	bool oom = true;
 	int ret;
-	int page_size = PAGE_SIZE;
 
 	if (PageTransHuge(page)) {
 		page_size <<= compound_order(page);
 		VM_BUG_ON(!PageTransHuge(page));
+		/*
+		 * Never OOM-kill a process for a huge page.  The
+		 * fault handler will fall back to regular pages.
+		 */
+		oom = false;
 	}
 
 	pc = lookup_page_cgroup(page);
@@ -2358,7 +2364,7 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
 		return 0;
 	prefetchw(pc);
 
-	ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, true, page_size);
+	ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, oom, page_size);
 	if (ret || !mem)
 		return ret;
 
-- 
1.7.3.5
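A sketch of the decision this patch adds to mem_cgroup_charge_common()
(names are simplified; the order value 9 matches HPAGE_PMD_ORDER on
x86-64 but is only illustrative here):

```c
#include <assert.h>
#include <stdbool.h>

struct sim_charge {
	unsigned long page_size;
	bool oom;	/* may the charge OOM-kill on failure? */
};

/* Huge pages charge their full size but must never trigger the OOM
 * killer, because the fault handler can fall back to regular pages;
 * a plain 4k charge keeps the old OOM behaviour. */
static struct sim_charge sim_setup_charge(bool is_huge, unsigned int order)
{
	struct sim_charge c = { .page_size = 4096UL, .oom = true };

	if (is_huge) {
		c.page_size <<= order;
		c.oom = false;
	}
	return c;
}
```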



* Re: [patch 1/3] memcg: prevent endless loop when charging huge pages
  2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
@ 2011-01-31 22:27   ` Minchan Kim
  2011-01-31 23:48   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 25+ messages in thread
From: Minchan Kim @ 2011-01-31 22:27 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> The charging code can encounter a charge size that is bigger than a
> regular page in two situations: one is a batched charge to fill the
> per-cpu stocks, the other is a huge page charge.
>
> This code is distributed over two functions, however, and only the
> outer one is aware of huge pages.  In case the charging fails, the
> inner function will tell the outer function to retry if the charge
> size is bigger than a regular page--assuming batched charging is the
> only such case.  As a result, the outer function will retry forever
> when charging a huge page.
>
> This patch makes sure the inner function can distinguish between batch
> charging and a single huge page charge.  It will only signal another
> attempt if batch charging failed, and go into regular reclaim when it
> is called on behalf of a huge page.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim


* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
@ 2011-01-31 22:41   ` Andrew Morton
  2011-01-31 23:50     ` KAMEZAWA Hiroyuki
  2011-02-01  0:04     ` Johannes Weiner
  2011-01-31 22:42   ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Minchan Kim
  1 sibling, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2011-01-31 22:41 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 15:03:54 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> +static inline bool res_counter_check_margin(struct res_counter *cnt,
> +					    unsigned long bytes)
> +{
> +	bool ret;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	ret = cnt->limit - cnt->usage >= bytes;
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return ret;
> +}
> +
>  static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
>  {
>  	bool ret;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 73ea323..c28072f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
>  	return false;
>  }
>  
> +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
> +{
> +	if (!res_counter_check_margin(&mem->res, bytes))
> +		return false;
> +	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
> +		return false;
> +	return true;
> +}

argh.

If you ever have a function with the string "check" in its name, it's a
good sign that you did something wrong.

Check what?  Against what?  Returning what?

mem_cgroup_check_under_limit() isn't toooo bad - the name tells you
what's being checked and tells you what to expect the return value to
mean.

But "res_counter_check_margin" and "mem_cgroup_check_margin" are just
awful.  Something like

	bool res_counter_may_charge(counter, bytes)

would be much clearer.

If we really want to stick with the "check" names (perhaps as an ironic
reference to res_counter's past mistakes) then please at least document
the sorry things?



* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
  2011-01-31 22:41   ` Andrew Morton
@ 2011-01-31 22:42   ` Minchan Kim
  1 sibling, 0 replies; 25+ messages in thread
From: Minchan Kim @ 2011-01-31 22:42 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> If reclaim after a failed charging was unsuccessful, the limits are
> checked again, in case other tasks have freed up room in the meantime.
>
> This is all fine as long as every charge is of size PAGE_SIZE, because
> in that case, being below the limit means having at least PAGE_SIZE
> bytes available.
>
> But with transparent huge pages, we may end up in an endless loop
> where charging and reclaim fail, but we keep going because the limits
> are not yet exceeded, even though there is not enough room left for a
> huge page.
>
> Fix this up by explicitly checking for enough room, not just whether
> we are within limits.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim


* Re: [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
@ 2011-01-31 22:52   ` Minchan Kim
  2011-01-31 23:51   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 25+ messages in thread
From: Minchan Kim @ 2011-01-31 22:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, kamezawa.hiroyu, nishimura, balbir, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 11:03 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Huge page coverage should obviously have lower priority than the
> continued execution of a process.
>
> Never kill a process when charging a huge page to it fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim


* Re: [patch 1/3] memcg: prevent endless loop when charging huge pages
  2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
  2011-01-31 22:27   ` Minchan Kim
@ 2011-01-31 23:48   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-31 23:48 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 15:03:53 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> The charging code can encounter a charge size that is bigger than a
> regular page in two situations: one is a batched charge to fill the
> per-cpu stocks, the other is a huge page charge.
> 
> This code is distributed over two functions, however, and only the
> outer one is aware of huge pages.  In case the charging fails, the
> inner function will tell the outer function to retry if the charge
> size is bigger than a regular page--assuming batched charging is the
> only such case.  As a result, the outer function will retry forever
> when charging a huge page.
> 
> This patch makes sure the inner function can distinguish between batch
> charging and a single huge page charge.  It will only signal another
> attempt if batch charging failed, and go into regular reclaim when it
> is called on behalf of a huge page.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Thank you very much.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

> ---
>  mm/memcontrol.c |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d572102..73ea323 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1837,8 +1837,15 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
>  		flags |= MEM_CGROUP_RECLAIM_NOSWAP;
>  	} else
>  		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
> -
> -	if (csize > PAGE_SIZE) /* change csize and retry */
> +	/*
> +	 * csize can be either a huge page (HPAGE_SIZE), a batch of
> +	 * regular pages (CHARGE_SIZE), or a single regular page
> +	 * (PAGE_SIZE).
> +	 *
> +	 * Never reclaim on behalf of optional batching, retry with a
> +	 * single page instead.
> +	 */
> +	if (csize == CHARGE_SIZE)
>  		return CHARGE_RETRY;
>  
>  	if (!(gfp_mask & __GFP_WAIT))
> -- 
> 1.7.3.5
> 
> 



* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-01-31 22:41   ` Andrew Morton
@ 2011-01-31 23:50     ` KAMEZAWA Hiroyuki
  2011-02-01  0:04     ` Johannes Weiner
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-31 23:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 14:41:31 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 31 Jan 2011 15:03:54 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > +static inline bool res_counter_check_margin(struct res_counter *cnt,
> > +					    unsigned long bytes)
> > +{
> > +	bool ret;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cnt->lock, flags);
> > +	ret = cnt->limit - cnt->usage >= bytes;
> > +	spin_unlock_irqrestore(&cnt->lock, flags);
> > +	return ret;
> > +}
> > +
> >  static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
> >  {
> >  	bool ret;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 73ea323..c28072f 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
> >  	return false;
> >  }
> >  
> > +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
> > +{
> > +	if (!res_counter_check_margin(&mem->res, bytes))
> > +		return false;
> > +	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
> > +		return false;
> > +	return true;
> > +}
> 
> argh.
> 
> If you ever have a function with the string "check" in its name, it's a
> good sign that you did something wrong.
> 
> Check what?  Against what?  Returning what?
> 
> mem_cgroup_check_under_limit() isn't toooo bad - the name tells you
> what's being checked and tells you what to expect the return value to
> mean.
> 
> But "res_counter_check_margin" and "mem_cgroup_check_margin" are just
> awful.  Something like
> 
> 	bool res_counter_may_charge(counter, bytes)
> 
> would be much clearer.
> 
> If we really want to stick with the "check" names (perhaps as an ironic
> reference to res_counter's past mistakes) then please at least document
> the sorry things?
> 

Ah, I ack the concept of patch.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Johannes, could you change the name?  I'm sorry.

* Re: [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
  2011-01-31 22:52   ` Minchan Kim
@ 2011-01-31 23:51   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-31 23:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, 31 Jan 2011 15:03:55 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Huge page coverage should obviously have lower priority than the
> continued execution of a process.
> 
> Never kill a process when charging a huge page to it fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-01-31 22:41   ` Andrew Morton
  2011-01-31 23:50     ` KAMEZAWA Hiroyuki
@ 2011-02-01  0:04     ` Johannes Weiner
  2011-02-01  0:24       ` Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: Johannes Weiner @ 2011-02-01  0:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 02:41:31PM -0800, Andrew Morton wrote:
> On Mon, 31 Jan 2011 15:03:54 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > @@ -1111,6 +1111,15 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
> >  	return false;
> >  }
> >  
> > +static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
> > +{
> > +	if (!res_counter_check_margin(&mem->res, bytes))
> > +		return false;
> > +	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
> > +		return false;
> > +	return true;
> > +}
> 
> argh.
> 
> If you ever have a function with the string "check" in its name, it's a
> good sign that you did something wrong.
> 
> Check what?  Against what?  Returning what?
> 
> mem_cgroup_check_under_limit() isn't toooo bad - the name tells you
> what's being checked and tells you what to expect the return value to
> mean.
> 
> But "res_counter_check_margin" and "mem_cgroup_check_margin" are just
> awful.  Something like
> 
> 	bool res_counter_may_charge(counter, bytes)
> 
> would be much clearer.

That makes sense for the hard limit.  But the oh-so-generic resource
counters also have a soft limit, and you don't ask for that when you
want to charge.  Right now, I do not feel creative enough to come up
with a symmetric-sounding counterpart.

> If we really want to stick with the "check" names (perhaps as an ironic
> reference to res_counter's past mistakes) then please at least document
> the sorry things?

I cowardly went with this option and have a patch below to fold into
this fix.

Maybe it would be better to use res_counter_margin(cnt) >= wanted
throughout the code.  Or still better, make memcg work on pages and
res_counters on unsigned longs so the locking is no longer needed,
together with an API for most obvious maths.  I will work something
out and submit it separately.
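The margin-based interface floated in the paragraph above could look
roughly like this (a hypothetical sketch: the name sim_margin is made
up, and the real struct and its locking are omitted):

```c
#include <assert.h>

/* Return the chargeable space instead of a boolean "check", so call
 * sites can simply compare against the amount they want to charge:
 * `sim_margin(usage, limit) >= wanted`. */
static unsigned long sim_margin(unsigned long usage, unsigned long limit)
{
	return usage < limit ? limit - usage : 0;
}
```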

---
Subject: [patch fixup] res_counter: document res_counter_check_margin()

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 5cfd78a..a5930cb 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
 	return ret;
 }
 
+/**
+ * res_counter_check_margin - check if the counter allows charging
+ * @cnt: the resource counter to check
+ * @bytes: the number of bytes to check the remaining space against
+ *
+ * Returns a boolean value on whether the counter can be charged
+ * @bytes or whether this would exceed the limit.
+ */
 static inline bool res_counter_check_margin(struct res_counter *cnt,
 					    unsigned long bytes)
 {


* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-02-01  0:04     ` Johannes Weiner
@ 2011-02-01  0:24       ` Andrew Morton
  2011-02-01  0:34         ` Johannes Weiner
  2011-02-03 12:53         ` [patch 0/2] memcg: clean up limit checking Johannes Weiner
  0 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2011-02-01  0:24 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Tue, 1 Feb 2011 01:04:55 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Maybe it would be better to use res_counter_margin(cnt) >= wanted
> throughout the code.

yup.

> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
>  	return ret;
>  }
>  
> +/**
> + * res_counter_check_margin - check if the counter allows charging
> + * @cnt: the resource counter to check
> + * @bytes: the number of bytes to check the remaining space against
> + *
> + * Returns a boolean value on whether the counter can be charged
> + * @bytes or whether this would exceed the limit.
> + */
>  static inline bool res_counter_check_margin(struct res_counter *cnt,
>  					    unsigned long bytes)
>  {

mem_cgroup_check_margin() needs some lipstick too.


* Re: [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group
  2011-02-01  0:24       ` Andrew Morton
@ 2011-02-01  0:34         ` Johannes Weiner
  2011-02-03 12:53         ` [patch 0/2] memcg: clean up limit checking Johannes Weiner
  1 sibling, 0 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-02-01  0:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 04:24:48PM -0800, Andrew Morton wrote:
> On Tue, 1 Feb 2011 01:04:55 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > @@ -182,6 +182,14 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> >  	return ret;
> >  }
> >  
> > +/**
> > + * res_counter_check_margin - check if the counter allows charging
> > + * @cnt: the resource counter to check
> > + * @bytes: the number of bytes to check the remaining space against
> > + *
> > + * Returns a boolean value on whether the counter can be charged
> > + * @bytes or whether this would exceed the limit.
> > + */
> >  static inline bool res_counter_check_margin(struct res_counter *cnt,
> >  					    unsigned long bytes)
> >  {
> 
> mem_cgroup_check_margin() needs some lipstick too.

*oink*

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9e5de7c..6c07554 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1111,6 +1111,14 @@ static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
 	return false;
 }
 
+/**
+ * mem_cgroup_check_margin - check if the memory cgroup allows charging
+ * @mem: memory cgroup to check
+ * @bytes: the number of bytes the caller intends to charge
+ *
+ * Returns a boolean value on whether @mem can be charged @bytes or
+ * whether this would exceed the limit.
+ */
 static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
 {
 	if (!res_counter_check_margin(&mem->res, bytes))


* [patch 0/2] memcg: clean up limit checking
  2011-02-01  0:24       ` Andrew Morton
  2011-02-01  0:34         ` Johannes Weiner
@ 2011-02-03 12:53         ` Johannes Weiner
  2011-02-03 12:54           ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner
  2011-02-03 12:56           ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
  1 sibling, 2 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-02-03 12:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Mon, Jan 31, 2011 at 04:24:48PM -0800, Andrew Morton wrote:
> On Tue, 1 Feb 2011 01:04:55 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Maybe it would be better to use res_counter_margin(cnt) >= wanted
> > throughout the code.
> 
> yup.

Okay, I cleaned it all up a bit for .39.  While doing so, I also found
that we are reclaiming one page too much when pushing back on behalf
of soft limits.

So 1/2 fixes the soft limit reclaim off-by-one-page, and 2/2 reduces
all the limit checks to two functions.


* [patch 1/2] memcg: soft limit reclaim should end at limit not below
  2011-02-03 12:53         ` [patch 0/2] memcg: clean up limit checking Johannes Weiner
@ 2011-02-03 12:54           ` Johannes Weiner
  2011-02-03 23:41             ` KAMEZAWA Hiroyuki
  2011-02-04  4:10             ` Balbir Singh
  2011-02-03 12:56           ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
  1 sibling, 2 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-02-03 12:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Soft limit reclaim continues until the usage is below the current soft
limit, but the documented semantics are actually that soft limit
reclaim will push usage back until the soft limits are met again.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/res_counter.h |    4 ++--
 mm/memcontrol.c             |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index a5930cb..bf1f01b 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -139,7 +139,7 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
 
 static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
 {
-	if (cnt->usage < cnt->soft_limit)
+	if (cnt->usage <= cnt->soft_limit)
 		return true;
 
 	return false;
@@ -202,7 +202,7 @@ static inline bool res_counter_check_margin(struct res_counter *cnt,
 	return ret;
 }
 
-static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
+static inline bool res_counter_check_within_soft_limit(struct res_counter *cnt)
 {
 	bool ret;
 	unsigned long flags;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 350bfd2..23b14188 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1451,7 +1451,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 			return ret;
 		total += ret;
 		if (check_soft) {
-			if (res_counter_check_under_soft_limit(&root_mem->res))
+			if (res_counter_check_within_soft_limit(&root_mem->res))
 				return total;
 		} else if (mem_cgroup_check_under_limit(root_mem))
 			return 1 + total;
-- 
1.7.4
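The one-character change above boils down to this (a standalone
sketch with made-up names, not the kernel helper):

```c
#include <assert.h>
#include <stdbool.h>

/* Stop soft limit reclaim once usage is back AT the soft limit,
 * rather than requiring it to drop strictly below it, which
 * reclaimed one page too many. */
static bool sim_within_soft_limit(unsigned long usage,
				  unsigned long soft_limit)
{
	return usage <= soft_limit;	/* was: usage < soft_limit */
}
```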



* [patch 2/2] memcg: simplify the way memory limits are checked
  2011-02-03 12:53         ` [patch 0/2] memcg: clean up limit checking Johannes Weiner
  2011-02-03 12:54           ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner
@ 2011-02-03 12:56           ` Johannes Weiner
  2011-02-03 23:44             ` KAMEZAWA Hiroyuki
  2011-02-04  4:12             ` Balbir Singh
  1 sibling, 2 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-02-03 12:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kamezawa.hiroyu, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

Since the introduction of transparent huge pages, it is no longer
enough to check whether a memory cgroup is below its limits; what
matters is the actual amount of chargeable space.

To avoid having more than one limit-checking interface, replace
mem_cgroup_check_under_limit() and mem_cgroup_check_margin() with a
single mem_cgroup_margin() that returns the chargeable space and
leaves the comparison to the call site.

Soft limits are now checked the other way round, by using the already
existing function that returns the amount by which soft limits are
exceeded: res_counter_soft_limit_excess().

Also remove all the corresponding functions on the res_counter side
that are now no longer used.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/res_counter.h |   72 ++++++++----------------------------------
 mm/memcontrol.c             |   49 ++++++++++-------------------
 2 files changed, 31 insertions(+), 90 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index bf1f01b..c9d625c 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -129,20 +129,22 @@ int __must_check res_counter_charge(struct res_counter *counter,
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
 
-static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
-{
-	if (cnt->usage < cnt->limit)
-		return true;
-
-	return false;
-}
-
-static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
+/**
+ * res_counter_margin - calculate chargeable space of a counter
+ * @cnt: the counter
+ *
+ * Returns the difference between the hard limit and the current usage
+ * of resource counter @cnt.
+ */
+static inline unsigned long long res_counter_margin(struct res_counter *cnt)
 {
-	if (cnt->usage <= cnt->soft_limit)
-		return true;
+	unsigned long long margin;
+	unsigned long flags;
 
-	return false;
+	spin_lock_irqsave(&cnt->lock, flags);
+	margin = cnt->limit - cnt->usage;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return margin;
 }
 
 /**
@@ -167,52 +169,6 @@ res_counter_soft_limit_excess(struct res_counter *cnt)
 	return excess;
 }
 
-/*
- * Helper function to detect if the cgroup is within it's limit or
- * not. It's currently called from cgroup_rss_prepare()
- */
-static inline bool res_counter_check_under_limit(struct res_counter *cnt)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = res_counter_limit_check_locked(cnt);
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
-/**
- * res_counter_check_margin - check if the counter allows charging
- * @cnt: the resource counter to check
- * @bytes: the number of bytes to check the remaining space against
- *
- * Returns a boolean value on whether the counter can be charged
- * @bytes or whether this would exceed the limit.
- */
-static inline bool res_counter_check_margin(struct res_counter *cnt,
-					    unsigned long bytes)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = cnt->limit - cnt->usage >= bytes;
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
-static inline bool res_counter_check_within_soft_limit(struct res_counter *cnt)
-{
-	bool ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cnt->lock, flags);
-	ret = res_counter_soft_limit_check_locked(cnt);
-	spin_unlock_irqrestore(&cnt->lock, flags);
-	return ret;
-}
-
 static inline void res_counter_reset_max(struct res_counter *cnt)
 {
 	unsigned long flags;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 23b14188..e1ab9c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -504,11 +504,6 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *mem)
 	}
 }
 
-static inline unsigned long mem_cgroup_get_excess(struct mem_cgroup *mem)
-{
-	return res_counter_soft_limit_excess(&mem->res) >> PAGE_SHIFT;
-}
-
 static struct mem_cgroup_per_zone *
 __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
 {
@@ -1101,33 +1096,21 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 #define mem_cgroup_from_res_counter(counter, member)	\
 	container_of(counter, struct mem_cgroup, member)
 
-static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
-{
-	if (do_swap_account) {
-		if (res_counter_check_under_limit(&mem->res) &&
-			res_counter_check_under_limit(&mem->memsw))
-			return true;
-	} else
-		if (res_counter_check_under_limit(&mem->res))
-			return true;
-	return false;
-}
-
 /**
- * mem_cgroup_check_margin - check if the memory cgroup allows charging
- * @mem: memory cgroup to check
- * @bytes: the number of bytes the caller intends to charge
+ * mem_cgroup_margin - calculate chargeable space of a memory cgroup
+ * @mem: the memory cgroup
  *
- * Returns a boolean value on whether @mem can be charged @bytes or
- * whether this would exceed the limit.
+ * Returns the maximum amount of memory @mem can be charged with, in
+ * bytes.
  */
-static bool mem_cgroup_check_margin(struct mem_cgroup *mem, unsigned long bytes)
+static unsigned long long mem_cgroup_margin(struct mem_cgroup *mem)
 {
-	if (!res_counter_check_margin(&mem->res, bytes))
-		return false;
-	if (do_swap_account && !res_counter_check_margin(&mem->memsw, bytes))
-		return false;
-	return true;
+	unsigned long long margin;
+
+	margin = res_counter_margin(&mem->res);
+	if (do_swap_account)
+		margin = min(margin, res_counter_margin(&mem->memsw));
+	return margin;
 }
 
 static unsigned int get_swappiness(struct mem_cgroup *memcg)
@@ -1394,7 +1377,9 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
 	bool shrink = reclaim_options & MEM_CGROUP_RECLAIM_SHRINK;
 	bool check_soft = reclaim_options & MEM_CGROUP_RECLAIM_SOFT;
-	unsigned long excess = mem_cgroup_get_excess(root_mem);
+	unsigned long excess;
+
+	excess = res_counter_soft_limit_excess(&root_mem->res) >> PAGE_SHIFT;
 
 	/* If memsw_is_minimum==1, swap-out is of-no-use. */
 	if (root_mem->memsw_is_minimum)
@@ -1451,9 +1436,9 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 			return ret;
 		total += ret;
 		if (check_soft) {
-			if (res_counter_check_within_soft_limit(&root_mem->res))
+			if (!res_counter_soft_limit_excess(&root_mem->res))
 				return total;
-		} else if (mem_cgroup_check_under_limit(root_mem))
+		} else if (mem_cgroup_margin(root_mem))
 			return 1 + total;
 	}
 	return total;
@@ -1872,7 +1857,7 @@ static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 
 	ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
 					      gfp_mask, flags);
-	if (mem_cgroup_check_margin(mem_over_limit, csize))
+	if (mem_cgroup_margin(mem_over_limit) >= csize)
 		return CHARGE_RETRY;
 	/*
 	 * Even though the limit is exceeded at this point, reclaim
-- 
1.7.4



* Re: [patch 1/2] memcg: soft limit reclaim should end at limit not below
  2011-02-03 12:54           ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner
@ 2011-02-03 23:41             ` KAMEZAWA Hiroyuki
  2011-02-04  4:10             ` Balbir Singh
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-03 23:41 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Thu, 3 Feb 2011 13:54:54 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Soft limit reclaim continues until the usage is below the current soft
> limit, but the documented semantics are actually that soft limit
> reclaim will push usage back until the soft limits are met again.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



* Re: [patch 2/2] memcg: simplify the way memory limits are checked
  2011-02-03 12:56           ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
@ 2011-02-03 23:44             ` KAMEZAWA Hiroyuki
  2011-02-04  4:12             ` Balbir Singh
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-03 23:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, nishimura, balbir, minchan.kim, linux-mm, linux-kernel

On Thu, 3 Feb 2011 13:56:11 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Since transparent huge pages, checking whether memory cgroups are
> below their limits is no longer enough, but the actual amount of
> chargeable space is important.
> 
> To not have more than one limit-checking interface, replace
> memory_cgroup_check_under_limit() and memory_cgroup_check_margin()
> with a single memory_cgroup_margin() that returns the chargeable space
> and leaves the comparison to the callsite.
> 
> Soft limits are now checked the other way round, by using the already
> existing function that returns the amount by which soft limits are
> exceeded: res_counter_soft_limit_excess().
> 
> Also remove all the corresponding functions on the res_counter side
> that are now no longer used.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



* Re: [patch 1/2] memcg: soft limit reclaim should end at limit not below
  2011-02-03 12:54           ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner
  2011-02-03 23:41             ` KAMEZAWA Hiroyuki
@ 2011-02-04  4:10             ` Balbir Singh
  1 sibling, 0 replies; 25+ messages in thread
From: Balbir Singh @ 2011-02-04  4:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, kamezawa.hiroyu, nishimura, minchan.kim, linux-mm,
	linux-kernel

On Thu, Feb 3, 2011 at 6:24 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Soft limit reclaim continues until the usage is below the current soft
> limit, but the documented semantics are actually that soft limit
> reclaim will push usage back until the soft limits are met again.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>


* Re: [patch 2/2] memcg: simplify the way memory limits are checked
  2011-02-03 12:56           ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
  2011-02-03 23:44             ` KAMEZAWA Hiroyuki
@ 2011-02-04  4:12             ` Balbir Singh
  1 sibling, 0 replies; 25+ messages in thread
From: Balbir Singh @ 2011-02-04  4:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, kamezawa.hiroyu, nishimura, minchan.kim, linux-mm,
	linux-kernel

On Thu, Feb 3, 2011 at 6:26 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Since transparent huge pages, checking whether memory cgroups are
> below their limits is no longer enough, but the actual amount of
> chargeable space is important.
>
> To not have more than one limit-checking interface, replace
> memory_cgroup_check_under_limit() and memory_cgroup_check_margin()
> with a single memory_cgroup_margin() that returns the chargeable space
> and leaves the comparison to the callsite.
>
> Soft limits are now checked the other way round, by using the already
> existing function that returns the amount by which soft limits are
> exceeded: res_counter_soft_limit_excess().
>
> Also remove all the corresponding functions on the res_counter side
> that are now no longer used.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>


* Re: [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-27 23:45       ` Daisuke Nishimura
@ 2011-01-27 23:49         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-27 23:49 UTC (permalink / raw)
  To: Daisuke Nishimura; +Cc: Johannes Weiner, linux-mm, linux-kernel, balbir, akpm

On Fri, 28 Jan 2011 08:45:40 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Thu, 27 Jan 2011 14:47:03 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Huge page coverage should obviously have less priority than the
> > continued execution of a process.
> > 
> > Never kill a process when charging it a huge page fails.  Instead,
> > give up after the first failed reclaim attempt and fall back to
> > regular pages.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/memcontrol.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 17c4e36..2945649 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
> >  	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
> >  
> >  	/*
> > +	 * Do not OOM on huge pages.  Fall back to regular pages after
> > +	 * the first failed reclaim attempt.
> > +	 */
> > +	if (page_size > PAGE_SIZE)
> > +		oom = false;
> > +
> > +	/*
> >  	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
> >  	 * in system level. So, allow to go ahead dying process in addition to
> >  	 * MEMDIE process.
> > -- 
> > 1.7.3.5
> > 
> __mem_cgroup_try_charge() has "oom" switch already, so I prefer making callers
> use the switch properly by themselves.
> 

Okay, will do.

-Kame



* Re: [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-27 13:47     ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
  2011-01-27 23:44       ` KAMEZAWA Hiroyuki
@ 2011-01-27 23:45       ` Daisuke Nishimura
  2011-01-27 23:49         ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 25+ messages in thread
From: Daisuke Nishimura @ 2011-01-27 23:45 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, balbir, akpm,
	Daisuke Nishimura

On Thu, 27 Jan 2011 14:47:03 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Huge page coverage should obviously have less priority than the
> continued execution of a process.
> 
> Never kill a process when charging it a huge page fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/memcontrol.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 17c4e36..2945649 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
>  	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
>  
>  	/*
> +	 * Do not OOM on huge pages.  Fall back to regular pages after
> +	 * the first failed reclaim attempt.
> +	 */
> +	if (page_size > PAGE_SIZE)
> +		oom = false;
> +
> +	/*
>  	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
>  	 * in system level. So, allow to go ahead dying process in addition to
>  	 * MEMDIE process.
> -- 
> 1.7.3.5
> 
__mem_cgroup_try_charge() already has an "oom" switch, so I would
prefer that callers use the switch properly themselves.

Thanks,
Daisuke Nishimura.


* Re: [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-27 13:47     ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
@ 2011-01-27 23:44       ` KAMEZAWA Hiroyuki
  2011-01-27 23:45       ` Daisuke Nishimura
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-27 23:44 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-mm, linux-kernel, nishimura, balbir, akpm

On Thu, 27 Jan 2011 14:47:03 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> Huge page coverage should obviously have less priority than the
> continued execution of a process.
> 
> Never kill a process when charging it a huge page fails.  Instead,
> give up after the first failed reclaim attempt and fall back to
> regular pages.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Ah, I missed this. I'll merge this patch into my series.
Thank you for pointing this out.

But what I have to point out here is that the khugepaged lock-up issue
needs another fix in khugepaged itself: it should skip hopeless memcgs.
I think I can send patches today.

Thanks,
-Kame


> ---
>  mm/memcontrol.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 17c4e36..2945649 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
>  	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
>  
>  	/*
> +	 * Do not OOM on huge pages.  Fall back to regular pages after
> +	 * the first failed reclaim attempt.
> +	 */
> +	if (page_size > PAGE_SIZE)
> +		oom = false;
> +
> +	/*
>  	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
>  	 * in system level. So, allow to go ahead dying process in addition to
>  	 * MEMDIE process.
> -- 
> 1.7.3.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* [patch 3/3] memcg: never OOM when charging huge pages
  2011-01-27 10:34   ` Johannes Weiner
@ 2011-01-27 13:47     ` Johannes Weiner
  2011-01-27 23:44       ` KAMEZAWA Hiroyuki
  2011-01-27 23:45       ` Daisuke Nishimura
  0 siblings, 2 replies; 25+ messages in thread
From: Johannes Weiner @ 2011-01-27 13:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, balbir, akpm

Huge page coverage should obviously have less priority than the
continued execution of a process.

Never kill a process when charging it a huge page fails.  Instead,
give up after the first failed reclaim attempt and fall back to
regular pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 17c4e36..2945649 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1890,6 +1890,13 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
 
 	/*
+	 * Do not OOM on huge pages.  Fall back to regular pages after
+	 * the first failed reclaim attempt.
+	 */
+	if (page_size > PAGE_SIZE)
+		oom = false;
+
+	/*
 	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
 	 * in system level. So, allow to go ahead dying process in addition to
 	 * MEMDIE process.
-- 
1.7.3.5



end of thread, other threads:[~2011-02-04  4:12 UTC | newest]

Thread overview: 25+ messages
2011-01-31 14:03 Johannes Weiner
2011-01-31 14:03 ` [patch 1/3] memcg: prevent endless loop when charging huge pages Johannes Weiner
2011-01-31 22:27   ` Minchan Kim
2011-01-31 23:48   ` KAMEZAWA Hiroyuki
2011-01-31 14:03 ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Johannes Weiner
2011-01-31 22:41   ` Andrew Morton
2011-01-31 23:50     ` KAMEZAWA Hiroyuki
2011-02-01  0:04     ` Johannes Weiner
2011-02-01  0:24       ` Andrew Morton
2011-02-01  0:34         ` Johannes Weiner
2011-02-03 12:53         ` [patch 0/2] memcg: clean up limit checking Johannes Weiner
2011-02-03 12:54           ` [patch 1/2] memcg: soft limit reclaim should end at limit not below Johannes Weiner
2011-02-03 23:41             ` KAMEZAWA Hiroyuki
2011-02-04  4:10             ` Balbir Singh
2011-02-03 12:56           ` [patch 2/2] memcg: simplify the way memory limits are checked Johannes Weiner
2011-02-03 23:44             ` KAMEZAWA Hiroyuki
2011-02-04  4:12             ` Balbir Singh
2011-01-31 22:42   ` [patch 2/3] memcg: prevent endless loop when charging huge pages to near-limit group Minchan Kim
2011-01-31 14:03 ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
2011-01-31 22:52   ` Minchan Kim
2011-01-31 23:51   ` KAMEZAWA Hiroyuki
  -- strict thread matches above, loose matches on Subject: below --
2011-01-21  6:34 [PATCH 0/7] memcg : more fixes and clean up for 2.6.28-rc KAMEZAWA Hiroyuki
2011-01-21  6:44 ` [PATCH 4/7] memcg : fix charge function of THP allocation KAMEZAWA Hiroyuki
2011-01-27 10:34   ` Johannes Weiner
2011-01-27 13:47     ` [patch 3/3] memcg: never OOM when charging huge pages Johannes Weiner
2011-01-27 23:44       ` KAMEZAWA Hiroyuki
2011-01-27 23:45       ` Daisuke Nishimura
2011-01-27 23:49         ` KAMEZAWA Hiroyuki
