LKML Archive on lore.kernel.org
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"menage@google.com" <menage@google.com>,
	nishimura@mxp.nes.nec.co.jp
Subject: Re: [RFC][PATCH 5/6] memcg: mem+swap controller
Date: Fri, 7 Nov 2008 18:02:48 +0900
Message-ID: <20081107180248.39251a80.nishimura@mxp.nes.nec.co.jp>
In-Reply-To: <20081105172316.354c00fb.kamezawa.hiroyu@jp.fujitsu.com>

On Wed, 5 Nov 2008 17:23:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Mem+Swap controller core.
> 
> This patch implements a per-cgroup limit on memory+swap usage.
> Although SwapCache exists, double counting of swap-cache and
> swap-entry is avoided.
> 
> The Mem+Swap controller works as follows.
>   - memory usage is limited by memory.limit_in_bytes.
>   - memory + swap usage is limited by memory.memsw.limit_in_bytes.
> 
> 
> This has the following benefits.
>   - A user can limit total resource usage of mem+swap.
> 
>     Without this, because the memory resource controller does not account for
>     swap usage, a process can exhaust all the swap (through a memory leak).
>     We can avoid this case.
> 
>     Swap is also a shared resource, but it cannot be reclaimed (brought back
>     to memory) until it is used. This characteristic can cause trouble when
>     memory is divided into parts by cpuset or memcg.
>     Assume group A and group B.
>     After some application executes, the system can be..
>     
>     Group A -- very large free memory space but occupy 99% of swap.
>     Group B -- under memory shortage but cannot use swap...it's nearly full.
> 
>     Ability to set appropriate swap limit for each group is required.
>       
> Someone may wonder "why not swap but mem+swap?"
> 
>   - The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
>     to move account from memory to swap...there is no change in usage of
>     mem+swap.
> 
>     In other words, when we want to limit the usage of swap without affecting
>     global LRU, mem+swap limit is better than just limiting swap.
> 
> 
> Accounting target information is stored in swap_cgroup which is
> per swap entry record.
> 
> Charging is done as follows.
>   map
>     - charge  page and memsw.
> 
>   unmap
>     - uncharge page/memsw if not SwapCache.
> 
>   swap-out (__delete_from_swap_cache)
>     - uncharge page
>     - record mem_cgroup information to swap_cgroup.
> 
>   swap-in (do_swap_page)
>     - charged as page and memsw.
>       record in swap_cgroup is cleared.
>       memsw accounting is decremented.
> 
>   swap-free (swap_free())
>     - if swap entry is freed, memsw is uncharged by PAGE_SIZE.
> 
> 
> After this, the usual memory resource controller handles SwapCache.
> (This feature was missing (ignored) in the current memcg but must be handled.)
> 
SwapCache has been handled in [2/6] already :)
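
For reference, the charge flow quoted above can be modeled in user space
roughly as below. This is only a sketch under assumptions: toy_memcg,
TOY_PAGE_SIZE and the toy_*() helpers are hypothetical names, not the
kernel's res_counter API, and the swap_cgroup record is left out. It only
shows which counter moves at each step.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Toy stand-in for one cgroup's two counters (not the kernel res_counter). */
struct toy_memcg {
	unsigned long res;	/* bytes charged as memory */
	unsigned long memsw;	/* bytes charged as memory + swap */
};

/* memsw covers memory plus swap, so it can never be below res. */
void toy_check(const struct toy_memcg *cg)
{
	assert(cg->memsw >= cg->res);
}

/* map: a new page is charged to both counters. */
void toy_map(struct toy_memcg *cg)
{
	cg->res += TOY_PAGE_SIZE;
	cg->memsw += TOY_PAGE_SIZE;
	toy_check(cg);
}

/* unmap: both counters are uncharged unless the page stays as SwapCache. */
void toy_unmap(struct toy_memcg *cg, bool is_swapcache)
{
	if (is_swapcache)
		return;
	cg->res -= TOY_PAGE_SIZE;
	cg->memsw -= TOY_PAGE_SIZE;
	toy_check(cg);
}

/* swap-out: the account moves from memory to swap, memsw is unchanged. */
void toy_swap_out(struct toy_memcg *cg)
{
	cg->res -= TOY_PAGE_SIZE;
	toy_check(cg);
}

/* swap-in: charge page and memsw, then drop the charge the swap entry held. */
void toy_swap_in(struct toy_memcg *cg)
{
	cg->res += TOY_PAGE_SIZE;
	cg->memsw += TOY_PAGE_SIZE;
	cg->memsw -= TOY_PAGE_SIZE;
	toy_check(cg);
}

/* swap-free: the swap entry is released without being swapped in. */
void toy_swap_free(struct toy_memcg *cg)
{
	cg->memsw -= TOY_PAGE_SIZE;
	toy_check(cg);
}
```

In this model a swap-out followed by a swap-in leaves memsw where it was,
which is the "no change in usage of mem+swap" property described above.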

(snip)
> @@ -514,12 +534,25 @@ static int __mem_cgroup_try_charge(struc
>  		css_get(&mem->css);
>  	}
>  
> +	while (1) {
> +		int ret;
> +		bool noswap = false;
>  
> -	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
> +		ret = res_counter_charge(&mem->res, PAGE_SIZE);
> +		if (likely(!ret)) {
> +			if (!do_swap_account)
> +				break;
> +			ret = res_counter_charge(&mem->memsw, PAGE_SIZE);
> +			if (likely(!ret))
> +				break;
> +			/* mem+swap counter fails */
> +			res_counter_uncharge(&mem->res, PAGE_SIZE);
> +			noswap = true;
> +		}
>  		if (!(gfp_mask & __GFP_WAIT))
>  			goto nomem;
>  
> -		if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
> +		if (try_to_free_mem_cgroup_pages(mem, gfp_mask, noswap))
>  			continue;
>  
>  		/*
I have two comments about try_charge.

1. It would be better, if possible, to avoid charging memsw at swapin (only to uncharge
   it again in mem_cgroup_cache_charge_swapin/mem_cgroup_commit_charge_swapin).
   How about adding a new argument "charge_memsw"? (It already has many args, though...)
2. Should we use swap when mem.limit is exceeded but mem.limit == memsw.limit?
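
To illustrate comment 1, here is a rough user-space sketch, not kernel
code. toy_counter, toy_try_charge_once() and the charge_memsw parameter are
hypothetical names, and the retry-after-reclaim part of the real loop is
omitted; the function makes a single charge attempt and reports through
*noswap whether the mem+swap limit was the one that failed.

```c
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Toy counter with a usage and a limit (not the kernel res_counter). */
struct toy_counter {
	unsigned long usage;
	unsigned long limit;
};

struct toy_memcg {
	struct toy_counter res;		/* memory */
	struct toy_counter memsw;	/* memory + swap */
};

/* Returns 0 on success, -1 if the charge would exceed the limit. */
int toy_charge(struct toy_counter *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

void toy_uncharge(struct toy_counter *c, unsigned long val)
{
	c->usage -= val;
}

/*
 * One charge attempt. With charge_memsw == false (the swapin case in this
 * sketch) the memsw counter is left alone, so nothing has to be uncharged
 * again later. *noswap is set only when the memsw charge failed, because
 * swapping out cannot lower mem+swap usage.
 */
int toy_try_charge_once(struct toy_memcg *mem, bool charge_memsw, bool *noswap)
{
	*noswap = false;

	if (toy_charge(&mem->res, TOY_PAGE_SIZE))
		return -1;	/* memory limit hit */
	if (!charge_memsw)
		return 0;
	if (!toy_charge(&mem->memsw, TOY_PAGE_SIZE))
		return 0;

	/* mem+swap counter failed: roll back the memory charge */
	toy_uncharge(&mem->res, TOY_PAGE_SIZE);
	*noswap = true;
	return -1;
}
```

A caller on the swapin path would pass charge_memsw = false; every other
caller would pass true and hand noswap to the reclaim step.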

(snip)
>  void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
> @@ -838,6 +947,7 @@ void mem_cgroup_cancel_charge_swapin(str
>  	if (!mem)
>  		return;
>  	res_counter_uncharge(&mem->res, PAGE_SIZE);
> +	res_counter_uncharge(&mem->memsw, PAGE_SIZE);
>  	css_put(&mem->css);
>  }
>  
"if (do_swap_account)" is needed before uncharging memsw.

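A small user-space sketch of the guarded version, under assumptions:
toy_counter, toy_do_swap_account and toy_cancel_charge_swapin() are
hypothetical stand-ins for the kernel names, and the underflow bookkeeping
exists only in this toy so the effect of the missing check is visible.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Toy counter that records attempts to uncharge more than was charged. */
struct toy_counter {
	unsigned long usage;
	unsigned long underflows;
};

struct toy_memcg {
	struct toy_counter res;		/* memory */
	struct toy_counter memsw;	/* memory + swap */
};

/* Stand-in for do_swap_account; false means memsw was never charged. */
bool toy_do_swap_account;

void toy_uncharge(struct toy_counter *c, unsigned long val)
{
	if (val > c->usage) {
		c->underflows++;	/* uncharging something never charged */
		val = c->usage;
	}
	c->usage -= val;
}

/* Cancel a swapin charge; memsw is touched only when it is accounted. */
void toy_cancel_charge_swapin(struct toy_memcg *mem)
{
	if (mem == NULL)
		return;
	toy_uncharge(&mem->res, TOY_PAGE_SIZE);
	if (toy_do_swap_account)
		toy_uncharge(&mem->memsw, TOY_PAGE_SIZE);
}
```

With swap accounting off, the memsw counter stays untouched and no
underflow is recorded in this model.
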
(snip)
>  static struct cftype mem_cgroup_files[] = {
>  	{
>  		.name = "usage_in_bytes",
> -		.private = RES_USAGE,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "max_usage_in_bytes",
> -		.private = RES_MAX_USAGE,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE),
>  		.trigger = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "limit_in_bytes",
> -		.private = RES_LIMIT,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_LIMIT),
>  		.write_string = mem_cgroup_write,
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "failcnt",
> -		.private = RES_FAILCNT,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
>  		.trigger = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read,
>  	},
> @@ -1317,6 +1541,31 @@ static struct cftype mem_cgroup_files[] 
>  		.name = "stat",
>  		.read_map = mem_control_stat_show,
>  	},
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> +	{
> +		.name = "memsw.usage_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.max_usage_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
> +		.trigger = mem_cgroup_reset,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.limit_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
> +		.write_string = mem_cgroup_write,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.failcnt",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
> +		.trigger = mem_cgroup_reset,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +#endif
>  };
>  
IMHO, it would be better to define those "memsw.*" files as memsw_cgroup_files[],
and change mem_cgroup_populate() like:

static int mem_cgroup_populate(struct cgroup_subsys *ss,
				struct cgroup *cont)
{
	int ret;

	ret = cgroup_add_files(cont, ss, mem_cgroup_files,
					ARRAY_SIZE(mem_cgroup_files));
	if (!ret && do_swap_account)
		ret = cgroup_add_files(cont, ss, memsw_cgroup_files,
					ARRAY_SIZE(memsw_cgroup_files));

	return ret;
}

so that those files appear only when swap accounting is enabled.


Thanks,
Daisuke Nishimura.
