From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"menage@google.com" <menage@google.com>
Subject: Re: [RFC][PATCH 5/6] memcg: mem+swap controller
Date: Fri, 7 Nov 2008 18:19:32 +0900
Message-ID: <20081107181932.94e6f307.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20081107180248.39251a80.nishimura@mxp.nes.nec.co.jp>

On Fri, 7 Nov 2008 18:02:48 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Wed, 5 Nov 2008 17:23:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Mem+Swap controller core.
> > 
> > This patch implements a per-cgroup limit on memory+swap usage.
> > Although SwapCache exists, double counting of swap-cache and
> > swap-entry is avoided.
> > 
> > The Mem+Swap controller works as follows.
> >   - memory usage is limited by memory.limit_in_bytes.
> >   - memory + swap usage is limited by memory.memsw_limit_in_bytes.
> > 
> > 
> > This has the following benefits.
> >   - A user can limit total resource usage of mem+swap.
> > 
> >     Without this, because the memory resource controller doesn't account
> >     for swap usage, a process can exhaust all the swap (e.g. by a memory leak).
> >     We can avoid this case.
> > 
> >     Also, swap is a shared resource, but it cannot be reclaimed (brought back
> >     to memory) until it's used. This characteristic can cause trouble when
> >     memory is divided into parts by cpuset or memcg.
> >     Assume group A and group B.
> >     After some applications run, the system can end up like this:
> >     
> >     Group A -- very large free memory space but occupies 99% of swap.
> >     Group B -- under memory shortage but cannot use swap...it's nearly full.
> > 
> >     The ability to set an appropriate swap limit for each group is required.
> >       
> > Someone may wonder, "why limit mem+swap rather than just swap?"
> > 
> >   - The global LRU (kswapd) can swap out arbitrary pages. Swap-out moves
> >     the accounting from memory to swap, so the usage of mem+swap does
> >     not change.
> > 
> >     In other words, when we want to limit the usage of swap without affecting
> >     global LRU, mem+swap limit is better than just limiting swap.
> > 
> > 
> > Accounting target information is stored in swap_cgroup, which is a
> > per-swap-entry record.
> > 
> > Charging is done as follows.
> >   map
> >     - charge  page and memsw.
> > 
> >   unmap
> >     - uncharge page/memsw if not SwapCache.
> > 
> >   swap-out (__delete_from_swap_cache)
> >     - uncharge page
> >     - record mem_cgroup information to swap_cgroup.
> > 
> >   swap-in (do_swap_page)
> >     - charged as page and memsw.
> >       record in swap_cgroup is cleared.
> >       memsw accounting is decremented.
> > 
> >   swap-free (swap_free())
> >     - if swap entry is freed, memsw is uncharged by PAGE_SIZE.
> > 
> > 
> > After this, the usual memory resource controller handles SwapCache.
> > (This feature was missing (ignored) in the current memcg but must be handled.)
> > 
> SwapCache has been handled in [2/6] already :)
> 
Yes. I'll rewrite this.
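
To make the accounting flow from the patch description concrete, here is a
minimal user-space sketch of how the two counters move on map, unmap,
swap-out, swap-in and swap-free. It is a model only, not the patch code:
struct group_model, struct swap_record, MODEL_PAGE_SIZE and the model_*()
helpers are hypothetical names, and limits, locking and reference counting
are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Hypothetical stand-in for the two counters of one cgroup. */
struct group_model {
	unsigned long mem;	/* bytes charged to memory */
	unsigned long memsw;	/* bytes charged to memory+swap */
};

/* Hypothetical stand-in for one per-swap-entry record. */
struct swap_record {
	struct group_model *owner;	/* NULL when no group is recorded */
};

/* map: charge both page and memsw. */
static void model_map(struct group_model *g)
{
	g->mem += MODEL_PAGE_SIZE;
	g->memsw += MODEL_PAGE_SIZE;
}

/* unmap: uncharge both, unless the page is still SwapCache. */
static void model_unmap(struct group_model *g, bool in_swap_cache)
{
	if (in_swap_cache)
		return;
	assert(g->mem >= MODEL_PAGE_SIZE && g->memsw >= MODEL_PAGE_SIZE);
	g->mem -= MODEL_PAGE_SIZE;
	g->memsw -= MODEL_PAGE_SIZE;
}

/* swap-out: uncharge the page only and remember the owner. */
static void model_swap_out(struct group_model *g, struct swap_record *rec)
{
	assert(g->mem >= MODEL_PAGE_SIZE);
	g->mem -= MODEL_PAGE_SIZE;
	rec->owner = g;
}

/* swap-in: charge page and memsw, then drop the recorded memsw charge. */
static void model_swap_in(struct group_model *g, struct swap_record *rec)
{
	g->mem += MODEL_PAGE_SIZE;
	g->memsw += MODEL_PAGE_SIZE;
	if (rec->owner) {
		assert(rec->owner->memsw >= MODEL_PAGE_SIZE);
		rec->owner->memsw -= MODEL_PAGE_SIZE;
		rec->owner = NULL;
	}
}

/* swap-free: if a group is still recorded, uncharge memsw by one page. */
static void model_swap_free(struct swap_record *rec)
{
	if (rec->owner) {
		assert(rec->owner->memsw >= MODEL_PAGE_SIZE);
		rec->owner->memsw -= MODEL_PAGE_SIZE;
		rec->owner = NULL;
	}
}
```

In this model, a map followed by a swap-out leaves mem at 0 and memsw at one
page, and a later swap-free brings memsw back to 0. A swap-in in between
leaves memsw unchanged overall, which matches the "no change in usage of
mem+swap" argument in the description.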


> (snip)
> > @@ -514,12 +534,25 @@ static int __mem_cgroup_try_charge(struc
> >  		css_get(&mem->css);
> >  	}
> >  
> > +	while (1) {
> > +		int ret;
> > +		bool noswap = false;
> >  
> > -	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
> > +		ret = res_counter_charge(&mem->res, PAGE_SIZE);
> > +		if (likely(!ret)) {
> > +			if (!do_swap_account)
> > +				break;
> > +			ret = res_counter_charge(&mem->memsw, PAGE_SIZE);
> > +			if (likely(!ret))
> > +				break;
> > +			/* mem+swap counter fails */
> > +			res_counter_uncharge(&mem->res, PAGE_SIZE);
> > +			noswap = true;
> > +		}
> >  		if (!(gfp_mask & __GFP_WAIT))
> >  			goto nomem;
> >  
> > -		if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
> > +		if (try_to_free_mem_cgroup_pages(mem, gfp_mask, noswap))
> >  			continue;
> >  
> >  		/*
> I have two comments about try_charge.
> 
> 1. It would be better, if possible, to avoid charging memsw at swapin (and uncharging
>    it again at mem_cgroup_cache_charge_swapin/mem_cgroup_commit_charge_swapin).
>    How about adding a new argument "charge_memsw"? (It already has many args...)

Hmm, that may be possible and good. I'll consider this again.

> 2. Should we use swap when exceeding mem.limit but mem.limit == memsw.limit ?
> 
I'd like to put that special case on the "TODO" list. Hmm...
Maybe setting noswap=true in that case is enough, but we have to be careful.
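
As a rough illustration of the two points above, here is a user-space sketch
of one pass of the charge loop: charge memory first, then mem+swap, roll the
memory charge back if mem+swap fails, and ask for reclaim without swap in
that case. The mem.limit == memsw.limit branch is only the unverified TODO
idea from this mail, not something the patch does. struct counter,
counter_charge(), counter_uncharge(), enum charge_result and charge_once()
are hypothetical names; the real code uses res_counter and
try_to_free_mem_cgroup_pages().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified counter: no locking, no overflow handling. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

/* Returns 0 on success, -1 if the charge would exceed the limit. */
static int counter_charge(struct counter *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

static void counter_uncharge(struct counter *c, unsigned long val)
{
	assert(c->usage >= val);
	c->usage -= val;
}

enum charge_result {
	CHARGE_OK,		/* both counters charged */
	CHARGE_RECLAIM,		/* caller should reclaim; swap may be used */
	CHARGE_RECLAIM_NOSWAP	/* caller should reclaim without swapping */
};

/* One pass of the loop body; the caller decides whether to retry. */
static enum charge_result charge_once(struct counter *mem,
				      struct counter *memsw,
				      bool do_swap_account,
				      unsigned long val)
{
	if (counter_charge(mem, val)) {
		/*
		 * TODO idea only: when both limits are equal, swapping out
		 * cannot lower mem+swap usage, so swap would not help.
		 */
		if (do_swap_account && mem->limit == memsw->limit)
			return CHARGE_RECLAIM_NOSWAP;
		return CHARGE_RECLAIM;
	}
	if (!do_swap_account)
		return CHARGE_OK;
	if (!counter_charge(memsw, val))
		return CHARGE_OK;
	/* mem+swap counter failed: undo the memory charge. */
	counter_uncharge(mem, val);
	return CHARGE_RECLAIM_NOSWAP;
}
```

The rollback matters because a failed pass must leave both counters exactly
as they were before the attempt; otherwise a retry after reclaim would
charge memory twice.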


> (snip)
> >  void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
> > @@ -838,6 +947,7 @@ void mem_cgroup_cancel_charge_swapin(str
> >  	if (!mem)
> >  		return;
> >  	res_counter_uncharge(&mem->res, PAGE_SIZE);
> > +	res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> >  	css_put(&mem->css);
> >  }
> >  
> "if (do_swap_account)" is needed before uncharging memsw.
> 
Good catch!
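
For the record, a small user-space sketch of why the check is needed: when
swap accounting is off, the charge path never touches memsw, so an
unconditional uncharge on the cancel path would drive that counter below
zero. Everything here is a hypothetical model (struct res_counter_model,
struct mem_cgroup_model, do_swap_account_model, the *_model() helpers and
MODEL_PAGE_SIZE); the plain refcnt field merely stands in for css_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

struct res_counter_model {
	unsigned long usage;
};

struct mem_cgroup_model {
	struct res_counter_model res;	/* memory */
	struct res_counter_model memsw;	/* memory+swap */
	int refcnt;			/* stand-in for the css reference */
};

/* Stand-in for the global do_swap_account switch. */
static bool do_swap_account_model;

static void uncharge_model(struct res_counter_model *c, unsigned long val)
{
	/* An underflow here is exactly the bug the review points out. */
	assert(c->usage >= val);
	c->usage -= val;
}

/* Cancel path with the suggested guard around the memsw uncharge. */
static void cancel_charge_swapin_model(struct mem_cgroup_model *mem)
{
	if (!mem)
		return;
	uncharge_model(&mem->res, MODEL_PAGE_SIZE);
	if (do_swap_account_model)
		uncharge_model(&mem->memsw, MODEL_PAGE_SIZE);
	mem->refcnt--;
}
```

With do_swap_account_model set to false and memsw.usage at 0, the guarded
version leaves memsw alone; without the guard the assertion in
uncharge_model() would fire.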

> (snip)
> >  static struct cftype mem_cgroup_files[] = {
> >  	{
> >  		.name = "usage_in_bytes",
> > -		.private = RES_USAGE,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> >  	{
> >  		.name = "max_usage_in_bytes",
> > -		.private = RES_MAX_USAGE,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE),
> >  		.trigger = mem_cgroup_reset,
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> >  	{
> >  		.name = "limit_in_bytes",
> > -		.private = RES_LIMIT,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_LIMIT),
> >  		.write_string = mem_cgroup_write,
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> >  	{
> >  		.name = "failcnt",
> > -		.private = RES_FAILCNT,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
> >  		.trigger = mem_cgroup_reset,
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> > @@ -1317,6 +1541,31 @@ static struct cftype mem_cgroup_files[] 
> >  		.name = "stat",
> >  		.read_map = mem_control_stat_show,
> >  	},
> > +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > +	{
> > +		.name = "memsw.usage_in_bytes",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +	{
> > +		.name = "memsw.max_usage_in_bytes",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
> > +		.trigger = mem_cgroup_reset,
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +	{
> > +		.name = "memsw.limit_in_bytes",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
> > +		.write_string = mem_cgroup_write,
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +	{
> > +		.name = "memsw.failcnt",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
> > +		.trigger = mem_cgroup_reset,
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +#endif
> >  };
> >  
> IMHO, it would be better to define those "memsw.*" files as memsw_cgroup_files[],
> and change mem_cgroup_populate() like:
> 
> static int mem_cgroup_populate(struct cgroup_subsys *ss,
> 				struct cgroup *cont)
> {
> 	int ret;
> 
> 	ret = cgroup_add_files(cont, ss, mem_cgroup_files,
> 					ARRAY_SIZE(mem_cgroup_files));
> 	if (!ret && do_swap_account)
> 		ret = cgroup_add_files(cont, ss, memsw_cgroup_files,
> 					ARRAY_SIZE(memsw_cgroup_files));
> 
> 	return ret;
> }
> 
> so that those files appear only when swap accounting is enabled.
> 

Nice idea. I'll try that. 
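
Separately from the file split, the hunk above tags every control file with
MEMFILE_PRIVATE(type, attribute) so that one read/write handler can serve
both the memory and the mem+swap counters. The macro's definition is not
part of the quoted hunk, so the following is only one plausible encoding,
assuming the counter type goes in the upper 16 bits and the attribute in the
lower 16 bits. All MODEL_* names, struct model_group and model_read_usage()
are hypothetical.

```c
#include <assert.h>

/* Hypothetical counterparts of _MEM / _MEMSWAP. */
enum model_counter_type {
	MODEL_MEM = 0,
	MODEL_MEMSWAP = 1
};

/* Hypothetical counterparts of the RES_* attributes used in the hunk. */
enum model_attr {
	MODEL_RES_USAGE,
	MODEL_RES_MAX_USAGE,
	MODEL_RES_LIMIT,
	MODEL_RES_FAILCNT
};

/* Assumed packing: type in bits 16..31, attribute in bits 0..15. */
#define MODEL_MEMFILE_PRIVATE(type, attr)	(((type) << 16) | (attr))
#define MODEL_MEMFILE_TYPE(val)			(((val) >> 16) & 0xffff)
#define MODEL_MEMFILE_ATTR(val)			((val) & 0xffff)

struct model_group {
	unsigned long mem_usage;
	unsigned long memsw_usage;
};

/* One handler picks the counter from the packed value. */
static unsigned long model_read_usage(const struct model_group *g,
				      int private_val)
{
	switch (MODEL_MEMFILE_TYPE(private_val)) {
	case MODEL_MEM:
		return g->mem_usage;
	case MODEL_MEMSWAP:
		return g->memsw_usage;
	default:
		assert(!"unknown counter type");
		return 0;
	}
}
```

Whatever the real encoding is, the point is the same: the packed value must
decode back to both parts unambiguously, so "usage_in_bytes" and
"memsw.usage_in_bytes" can share mem_cgroup_read.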

Thanks,
-Kame

