LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC][0/3] Virtual address space control for cgroups
@ 2008-03-16 17:29 Balbir Singh
  2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-16 17:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki

This is an early patchset for virtual address space control for cgroups.
The patches are against 2.6.25-rc5-mm1 and have been tested on top of
User Mode Linux.

The first patch adds the user interface. The second patch adds accounting
and control. The third patch updates documentation.

Review suggestions would be appreciated along the lines of

1. What is missing? Are all virtual address space expansion points covered?
2. Do we need to account and control address space at insert_special_mapping?
3. Address space accounting may contain duplications. Do we need to avoid it?

Comments?

series

memory-controller-virtual-address-space-control-user-interface
memory-controller-virtual-address-space-accounting-and-control
memory-controller-virtual-address-control-documentation



-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC][1/3] Add user interface for virtual address space control
  2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh
@ 2008-03-16 17:29 ` Balbir Singh
  2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-16 17:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki



Add as_usage_in_bytes and as_limit_in_bytes interfaces. These provide
control over the total address space that the processes combined together
in the cgroup can grow upto. This functionality is analogous to
the RLIMIT_AS function of the getrlimit(2) and setrlimit(2) calls.
A as_res resource counter is added to the mem_cgroup structure. The
as_res counter handles all the accounting associated with the virtual
address space accounting and control of cgroups.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 mm/memcontrol.c |   31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-control-user-interface mm/memcontrol.c
--- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-control-user-interface	2008-03-16 22:57:38.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/memcontrol.c	2008-03-16 22:57:38.000000000 +0530
@@ -128,6 +128,10 @@ struct mem_cgroup {
 	 */
 	struct res_counter res;
 	/*
+	 * Address space limits
+	 */
+	struct res_counter as_res;
+	/*
 	 * Per cgroup active and inactive list, similar to the
 	 * per zone LRU lists.
 	 */
@@ -870,6 +874,21 @@ static ssize_t mem_cgroup_write(struct c
 				mem_cgroup_write_strategy);
 }
 
+static u64 mem_cgroup_as_read(struct cgroup *cont, struct cftype *cft)
+{
+	return res_counter_read_u64(&mem_cgroup_from_cont(cont)->as_res,
+				    cft->private);
+}
+
+static ssize_t mem_cgroup_as_write(struct cgroup *cont, struct cftype *cft,
+				struct file *file, const char __user *userbuf,
+				size_t nbytes, loff_t *ppos)
+{
+	return res_counter_write(&mem_cgroup_from_cont(cont)->as_res,
+				cft->private, userbuf, nbytes, ppos,
+				mem_cgroup_write_strategy);
+}
+
 static ssize_t mem_force_empty_write(struct cgroup *cont,
 				struct cftype *cft, struct file *file,
 				const char __user *userbuf,
@@ -931,6 +950,17 @@ static struct cftype mem_cgroup_files[] 
 		.read_u64 = mem_cgroup_read,
 	},
 	{
+		.name = "as_usage_in_bytes",
+		.private = RES_USAGE,
+		.read_u64 = mem_cgroup_as_read,
+	},
+	{
+		.name = "as_limit_in_bytes",
+		.private = RES_LIMIT,
+		.write = mem_cgroup_as_write,
+		.read_u64 = mem_cgroup_as_read,
+	},
+	{
 		.name = "failcnt",
 		.private = RES_FAILCNT,
 		.read_u64 = mem_cgroup_read,
@@ -999,6 +1029,7 @@ mem_cgroup_create(struct cgroup_subsys *
 		return ERR_PTR(-ENOMEM);
 
 	res_counter_init(&mem->res);
+	res_counter_init(&mem->as_res);
 
 	memset(&mem->info, 0, sizeof(mem->info));
 
diff -puN include/linux/memcontrol.h~memory-controller-virtual-address-space-control-user-interface include/linux/memcontrol.h
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC][2/3] Account and control virtual address space allocations
  2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh
  2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh
@ 2008-03-16 17:30 ` Balbir Singh
  2008-03-17  2:02   ` Paul Menage
                     ` (3 more replies)
  2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh
  2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage
  3 siblings, 4 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-16 17:30 UTC (permalink / raw)
  To: linux-mm
  Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki



This patch implements accounting and control of virtual address space.
Accounting is done when the virtual address space of any task/mm_struct
belonging to the cgroup is incremented or decremented. This patch
fails the expansion if the cgroup goes over its limit. A new function
mem_cgroup_update_as() is added to deal with the accounting of the virtual
address space usage of cgroups.

TODOs

1. IA64 has code in perfmon.c pfm_smpl_buffer_alloc(), which increments
   the total_vm of the mm_struct. This code has not yet been brought into
   virtual address space control
2. Only when CONFIG_MMU is enabled, is the virtual address space control
   enabled. Should we do this for nommu cases as well? My suspicion is
   that we don't have to.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 arch/x86/kernel/ptrace.c   |   10 +++++++++-
 include/linux/memcontrol.h |    7 +++++++
 init/Kconfig               |    4 +++-
 kernel/fork.c              |    9 +++++++--
 mm/memcontrol.c            |   37 +++++++++++++++++++++++++++++++++++++
 mm/memory.c                |    5 +++++
 mm/mmap.c                  |   22 ++++++++++++++++++++--
 mm/mremap.c                |   21 ++++++++++++++++++---
 8 files changed, 106 insertions(+), 9 deletions(-)

diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control mm/memcontrol.c
--- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/memcontrol.c	2008-03-16 22:57:40.000000000 +0530
@@ -525,6 +525,32 @@ unsigned long mem_cgroup_isolate_pages(u
 }
 
 /*
+ * Check if the current cgroup exceeds its address space limit.
+ * Returns 0 on success and 1 on failure.
+ */
+int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
+{
+	int ret = 0;
+	struct mem_cgroup *mem;
+	if (mem_cgroup_subsys.disabled)
+		return ret;
+
+	rcu_read_lock();
+	mem = rcu_dereference(mm->mem_cgroup);
+	css_get(&mem->css);
+	rcu_read_unlock();
+
+	if (nr_pages > 0) {
+		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
+			ret = 1;
+	} else
+		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
+
+	css_put(&mem->css);
+	return ret;
+}
+
+/*
  * Charge the memory controller for page usage.
  * Return
  * 0 if the charge was successful
@@ -1103,6 +1129,17 @@ static void mem_cgroup_move_task(struct 
 		goto out;
 
 	css_get(&mem->css);
+	/*
+	 * For address space accounting, the charges are migrated.
+	 * We need to migrate it since all the future uncharge/charge will
+	 * now happen to the new cgroup. For consistency, we need to migrate
+	 * all charges, otherwise we could end up dropping charges from
+	 * the new cgroup (even though they were incurred in the current
+	 * group).
+	 */
+	if (res_counter_charge(&mem->as_res, mm->total_vm))
+		goto out;
+	res_counter_uncharge(&old_mem->as_res, mm->total_vm);
 	rcu_assign_pointer(mm->mem_cgroup, mem);
 	css_put(&old_mem->css);
 
diff -puN include/linux/memcontrol.h~memory-controller-virtual-address-space-accounting-and-control include/linux/memcontrol.h
--- linux-2.6.25-rc5/include/linux/memcontrol.h~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/memcontrol.h	2008-03-16 22:57:40.000000000 +0530
@@ -54,6 +54,7 @@ int task_in_mem_cgroup(struct task_struc
 extern int mem_cgroup_prepare_migration(struct page *page);
 extern void mem_cgroup_end_migration(struct page *page);
 extern void mem_cgroup_page_migration(struct page *page, struct page *newpage);
+extern int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages);
 
 /*
  * For memory reclaim.
@@ -172,6 +173,12 @@ static inline long mem_cgroup_calc_recla
 {
 	return 0;
 }
+
+static inline int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
+{
+	return 0;
+}
+
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN mm/mmap.c~memory-controller-virtual-address-space-accounting-and-control mm/mmap.c
--- linux-2.6.25-rc5/mm/mmap.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/mmap.c	2008-03-16 22:57:40.000000000 +0530
@@ -1117,6 +1117,9 @@ munmap_back:
 		}
 	}
 
+	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
+		return -ENOMEM;
+
 	/*
 	 * Can we just expand an old private anonymous mapping?
 	 * The VM_SHARED test is necessary because shmem_zero_setup
@@ -1226,8 +1229,11 @@ unmap_and_free_vma:
 free_vma:
 	kmem_cache_free(vm_area_cachep, vma);
 unacct_error:
-	if (charged)
+	if (charged) {
+		mem_cgroup_update_as(mm, -charged);
 		vm_unacct_memory(charged);
+	}
+unacct_as_error:
 	return error;
 }
 
@@ -1555,6 +1561,9 @@ static int acct_stack_growth(struct vm_a
 	if (security_vm_enough_memory(grow))
 		return -ENOMEM;
 
+	if (mem_cgroup_update_as(mm, grow))
+		return -ENOMEM;
+
 	/* Ok, everything looks good - let it rip */
 	mm->total_vm += grow;
 	if (vma->vm_flags & VM_LOCKED)
@@ -2003,9 +2012,14 @@ unsigned long do_brk(unsigned long addr,
 	if (mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;
 
-	if (security_vm_enough_memory(len >> PAGE_SHIFT))
+	if (mem_cgroup_update_as(mm, (len >> PAGE_SHIFT)))
 		return -ENOMEM;
 
+	if (security_vm_enough_memory(len >> PAGE_SHIFT)) {
+		mem_cgroup_update_as(mm, -(len >> PAGE_SHIFT));
+		return -ENOMEM;
+	}
+
 	/* Can we just expand an old private anonymous mapping? */
 	if (vma_merge(mm, prev, addr, addr + len, flags,
 					NULL, NULL, pgoff, NULL))
@@ -2236,6 +2250,9 @@ int install_special_mapping(struct mm_st
 	if (unlikely(vma == NULL))
 		return -ENOMEM;
 
+	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
+		return -ENOMEM;
+
 	vma->vm_mm = mm;
 	vma->vm_start = addr;
 	vma->vm_end = addr + len;
@@ -2248,6 +2265,7 @@ int install_special_mapping(struct mm_st
 
 	if (unlikely(insert_vm_struct(mm, vma))) {
 		kmem_cache_free(vm_area_cachep, vma);
+		mem_cgroup_update_as(mm, -(len >> PAGE_SHIFT));
 		return -ENOMEM;
 	}
 
diff -puN arch/x86/kernel/ptrace.c~memory-controller-virtual-address-space-accounting-and-control arch/x86/kernel/ptrace.c
--- linux-2.6.25-rc5/arch/x86/kernel/ptrace.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/arch/x86/kernel/ptrace.c	2008-03-16 22:57:40.000000000 +0530
@@ -20,6 +20,7 @@
 #include <linux/audit.h>
 #include <linux/seccomp.h>
 #include <linux/signal.h>
+#include <linux/memcontrol.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas
 	current->mm->total_vm  -= old_size;
 	current->mm->locked_vm -= old_size;
 
+	mem_cgroup_update_as(current->mm, -old_size);
+
 	if (size == 0)
 		goto out;
 
@@ -816,10 +819,15 @@ static int ptrace_bts_realloc(struct tas
 			goto out;
 	}
 
+	if (mem_cgroup_update_as(current->mm, size))
+		goto out;
+
 	ret = ds_allocate((void **)&child->thread.ds_area_msr,
 			  size << PAGE_SHIFT);
-	if (ret < 0)
+	if (ret < 0) {
+		mem_cgroup_update_as(current->mm, -size);
 		goto out;
+	}
 
 	current->mm->total_vm  += size;
 	current->mm->locked_vm += size;
diff -puN kernel/fork.c~memory-controller-virtual-address-space-accounting-and-control kernel/fork.c
--- linux-2.6.25-rc5/kernel/fork.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/kernel/fork.c	2008-03-16 22:57:40.000000000 +0530
@@ -53,6 +53,7 @@
 #include <linux/tty.h>
 #include <linux/proc_fs.h>
 #include <linux/blkdev.h>
+#include <linux/memcontrol.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -237,6 +238,7 @@ static int dup_mmap(struct mm_struct *mm
 
 	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
 		struct file *file;
+		unsigned int len = vma_pages(mpnt);
 
 		if (mpnt->vm_flags & VM_DONTCOPY) {
 			long pages = vma_pages(mpnt);
@@ -247,11 +249,12 @@ static int dup_mmap(struct mm_struct *mm
 		}
 		charge = 0;
 		if (mpnt->vm_flags & VM_ACCOUNT) {
-			unsigned int len = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
 			if (security_vm_enough_memory(len))
 				goto fail_nomem;
 			charge = len;
 		}
+		if (mem_cgroup_update_as(mm, len))
+			goto fail_nomem_as;
 		tmp = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 		if (!tmp)
 			goto fail_nomem;
@@ -311,8 +314,10 @@ out:
 fail_nomem_policy:
 	kmem_cache_free(vm_area_cachep, tmp);
 fail_nomem:
-	retval = -ENOMEM;
+	mem_cgroup_update_as(mm, -charge);
 	vm_unacct_memory(charge);
+fail_nomem_as:
+	retval = -ENOMEM;
 	goto out;
 }
 
diff -puN mm/mremap.c~memory-controller-virtual-address-space-accounting-and-control mm/mremap.c
--- linux-2.6.25-rc5/mm/mremap.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/mremap.c	2008-03-16 22:57:40.000000000 +0530
@@ -174,10 +174,15 @@ static unsigned long move_vma(struct vm_
 	if (mm->map_count >= sysctl_max_map_count - 3)
 		return -ENOMEM;
 
+	if (mem_cgroup_update_as(mm, new_len >> PAGE_SHIFT))
+		return -ENOMEM;
+
 	new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
 	new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff);
-	if (!new_vma)
+	if (!new_vma) {
+		mem_cgroup_update_as(mm, -(new_len >> PAGE_SHIFT));
 		return -ENOMEM;
+	}
 
 	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len);
 	if (moved_len < old_len) {
@@ -187,6 +192,7 @@ static unsigned long move_vma(struct vm_
 		 * and then proceed to unmap new area instead of old.
 		 */
 		move_page_tables(new_vma, new_addr, vma, old_addr, moved_len);
+		mem_cgroup_update_as(mm, -(new_len >> PAGE_SHIFT));
 		vma = new_vma;
 		old_len = new_len;
 		old_addr = new_addr;
@@ -347,10 +353,17 @@ unsigned long do_mremap(unsigned long ad
 		goto out;
 	}
 
+	if (mem_cgroup_update_as(mm, (new_len - old_len) >> PAGE_SHIFT)) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	if (vma->vm_flags & VM_ACCOUNT) {
 		charged = (new_len - old_len) >> PAGE_SHIFT;
-		if (security_vm_enough_memory(charged))
+		if (security_vm_enough_memory(charged)) {
+			mem_cgroup_update_as(mm, -charged);
 			goto out_nc;
+		}
 	}
 
 	/* old_len exactly to the end of the area..
@@ -406,8 +419,10 @@ unsigned long do_mremap(unsigned long ad
 		ret = move_vma(vma, addr, old_len, new_len, new_addr);
 	}
 out:
-	if (ret & ~PAGE_MASK)
+	if (ret & ~PAGE_MASK) {
 		vm_unacct_memory(charged);
+		mem_cgroup_update_as(mm, -charged);
+	}
 out_nc:
 	return ret;
 }
diff -puN init/Kconfig~memory-controller-virtual-address-space-accounting-and-control init/Kconfig
--- linux-2.6.25-rc5/init/Kconfig~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/init/Kconfig	2008-03-16 22:57:40.000000000 +0530
@@ -369,7 +369,9 @@ config CGROUP_MEM_RES_CTLR
 	depends on CGROUPS && RESOURCE_COUNTERS
 	help
 	  Provides a memory resource controller that manages both page cache and
-	  RSS memory.
+	  RSS memory. It also provide accounting and control of address
+	  space allocations (along the lines of RLIMIT_AS) for cgroups
+	  when CONFIG_MMU is enabled.
 
 	  Note that setting this option increases fixed memory overhead
 	  associated with each page of memory in the system by 4/8 bytes
diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c
diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c
--- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/memory.c	2008-03-16 22:57:40.000000000 +0530
@@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath
 
 		if (vma->vm_flags & VM_ACCOUNT)
 			*nr_accounted += (end - start) >> PAGE_SHIFT;
+		/*
+		 * Unaccount used virtual memory for cgroups
+		 */
+		mem_cgroup_update_as(vma->vm_mm,
+					((long)(start - end)) >> PAGE_SHIFT);
 
 		while (start != end) {
 			if (!tlb_start_valid) {
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC][3/3] Update documentation for virtual address space control
  2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh
  2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh
  2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh
@ 2008-03-16 17:30 ` Balbir Singh
  2008-03-16 18:32   ` Randy Dunlap
  2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage
  3 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2008-03-16 17:30 UTC (permalink / raw)
  To: linux-mm
  Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki



This patch adds documentation for virtual address space control.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 Documentation/controllers/memory.txt |   26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff -puN Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation Documentation/controllers/memory.txt
--- linux-2.6.25-rc5/Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation	2008-03-16 22:57:44.000000000 +0530
+++ linux-2.6.25-rc5-balbir/Documentation/controllers/memory.txt	2008-03-16 22:57:44.000000000 +0530
@@ -237,7 +237,31 @@ cgroup might have some charge associated
 tasks have migrated away from it. Such charges are automatically dropped at
 rmdir() if there are no tasks.
 
-5. TODO
+5. Virtual address space accounting
+
+A new resource counter controls the address space expansion of the tasks in
+the cgroup. Address space control is provided along the same lines as
+RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2).
+The interface for controlling address space is provided through
+"as_limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t the user
+interface. Please see section 3 for more details on how to use the user
+interface to get and set values.
+
+The "as_usage_in_bytes" file provides information about the total address
+space usage of the cgroup in bytes.
+
+5.1 Advantages of providing this feature
+
+1. Control over virtual address space allows for a cgroup to fail gracefully
+   i.e, via a malloc or mmap failure as compared to OOM kill when no
+   pages can be reclaimed
+2. It provides better control over how many pages can be swapped out when
+   the cgroup goes over it's limit. A badly setup cgroup can cause excessive
+   swapping. Providing control over the address space allocations ensures
+   that the system administrator has control over the total swapping that
+   can take place.
+
+6. TODO
 
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][3/3] Update documentation for virtual address space control
  2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh
@ 2008-03-16 18:32   ` Randy Dunlap
  2008-03-17  1:33     ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Randy Dunlap @ 2008-03-16 18:32 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

On Sun, 16 Mar 2008 23:00:17 +0530 Balbir Singh wrote:

> This patch adds documentation for virtual address space control.
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  Documentation/controllers/memory.txt |   26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff -puN Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation Documentation/controllers/memory.txt
> --- linux-2.6.25-rc5/Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation	2008-03-16 22:57:44.000000000 +0530
> +++ linux-2.6.25-rc5-balbir/Documentation/controllers/memory.txt	2008-03-16 22:57:44.000000000 +0530
> @@ -237,7 +237,31 @@ cgroup might have some charge associated
>  tasks have migrated away from it. Such charges are automatically dropped at
>  rmdir() if there are no tasks.
>  
> -5. TODO
> +5. Virtual address space accounting
> +
> +A new resource counter controls the address space expansion of the tasks in
> +the cgroup. Address space control is provided along the same lines as
> +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2).
> +The interface for controlling address space is provided through
> +"as_limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t the user

                                                                w.r.t.
  or even spelled out.

> +interface. Please see section 3 for more details on how to use the user
> +interface to get and set values.
> +
> +The "as_usage_in_bytes" file provides information about the total address
> +space usage of the cgroup in bytes.
> +
> +5.1 Advantages of providing this feature
> +
> +1. Control over virtual address space allows for a cgroup to fail gracefully
> +   i.e, via a malloc or mmap failure as compared to OOM kill when no

      i.e.,

> +   pages can be reclaimed

end with period.

> +2. It provides better control over how many pages can be swapped out when
> +   the cgroup goes over it's limit. A badly setup cgroup can cause excessive

                           its (not "it is")

> +   swapping. Providing control over the address space allocations ensures
> +   that the system administrator has control over the total swapping that
> +   can take place.
> +
> +6. TODO
>  
>  1. Add support for accounting huge pages (as a separate controller)
>  2. Make per-cgroup scanner reclaim not-shared pages first
> _

---
~Randy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh
                   ` (2 preceding siblings ...)
  2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh
@ 2008-03-16 23:26 ` Paul Menage
  2008-03-17  1:47   ` Li Zefan
  2008-03-17  1:50   ` Balbir Singh
  3 siblings, 2 replies; 30+ messages in thread
From: Paul Menage @ 2008-03-16 23:26 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

On Mon, Mar 17, 2008 at 1:29 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> This is an early patchset for virtual address space control for cgroups.
>  The patches are against 2.6.25-rc5-mm1 and have been tested on top of
>  User Mode Linux.

What's the performance hit of doing these accounting checks on every
mmap/munmap? If it's not totally lost in the noise, couldn't it be
made a separate control group, so that it could be just enabled (and
the performance hit taken) for users that actually want it?

Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][3/3] Update documentation for virtual address space control
  2008-03-16 18:32   ` Randy Dunlap
@ 2008-03-17  1:33     ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-17  1:33 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

Randy Dunlap wrote:
> On Sun, 16 Mar 2008 23:00:17 +0530 Balbir Singh wrote:
> 
>> This patch adds documentation for virtual address space control.
>>
>> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>> ---
>>
>>  Documentation/controllers/memory.txt |   26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff -puN Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation Documentation/controllers/memory.txt
>> --- linux-2.6.25-rc5/Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation	2008-03-16 22:57:44.000000000 +0530
>> +++ linux-2.6.25-rc5-balbir/Documentation/controllers/memory.txt	2008-03-16 22:57:44.000000000 +0530
>> @@ -237,7 +237,31 @@ cgroup might have some charge associated
>>  tasks have migrated away from it. Such charges are automatically dropped at
>>  rmdir() if there are no tasks.
>>  
>> -5. TODO
>> +5. Virtual address space accounting
>> +
>> +A new resource counter controls the address space expansion of the tasks in
>> +the cgroup. Address space control is provided along the same lines as
>> +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2).
>> +The interface for controlling address space is provided through
>> +"as_limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t the user
> 
>                                                                 w.r.t.
>   or even spelled out.
> 

Will spell out.

>> +interface. Please see section 3 for more details on how to use the user
>> +interface to get and set values.
>> +
>> +The "as_usage_in_bytes" file provides information about the total address
>> +space usage of the cgroup in bytes.
>> +
>> +5.1 Advantages of providing this feature
>> +
>> +1. Control over virtual address space allows for a cgroup to fail gracefully
>> +   i.e, via a malloc or mmap failure as compared to OOM kill when no
> 
>       i.e.,
> 
>> +   pages can be reclaimed
> 
> end with period.

Will fix

> 
>> +2. It provides better control over how many pages can be swapped out when
>> +   the cgroup goes over it's limit. A badly setup cgroup can cause excessive
> 
>                            its (not "it is")
> 

Will fix :)

>> +   swapping. Providing control over the address space allocations ensures
>> +   that the system administrator has control over the total swapping that
>> +   can take place.
>> +
>> +6. TODO
>>  
>>  1. Add support for accounting huge pages (as a separate controller)
>>  2. Make per-cgroup scanner reclaim not-shared pages first
>> _
> 
> ---
> ~Randy
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage
@ 2008-03-17  1:47   ` Li Zefan
  2008-03-17  1:57     ` Paul Menage
  2008-03-17  1:50   ` Balbir Singh
  1 sibling, 1 reply; 30+ messages in thread
From: Li Zefan @ 2008-03-17  1:47 UTC (permalink / raw)
  To: Paul Menage
  Cc: Balbir Singh, linux-mm, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Mon, Mar 17, 2008 at 1:29 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> This is an early patchset for virtual address space control for cgroups.
>>  The patches are against 2.6.25-rc5-mm1 and have been tested on top of
>>  User Mode Linux.
> 
> What's the performance hit of doing these accounting checks on every
> mmap/munmap? If it's not totally lost in the noise, couldn't it be
> made a separate control group, so that it could be just enabled (and
> the performance hit taken) for users that actually want it?
> 

It will be code duplication to make it a new subsystem, and it will be useful
to control both of them, am I right? :)

So could we just add a CONFIG to this patch series, like:
	CONFIG_CGROUP_MEM_RES_AS_CTLR
?

> Paul
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage
  2008-03-17  1:47   ` Li Zefan
@ 2008-03-17  1:50   ` Balbir Singh
  2008-03-17  1:55     ` Paul Menage
  1 sibling, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2008-03-17  1:50 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Mon, Mar 17, 2008 at 1:29 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> This is an early patchset for virtual address space control for cgroups.
>>  The patches are against 2.6.25-rc5-mm1 and have been tested on top of
>>  User Mode Linux.
> 
> What's the performance hit of doing these accounting checks on every
> mmap/munmap? If it's not totally lost in the noise, couldn't it be
> made a separate control group, so that it could be just enabled (and
> the performance hit taken) for users that actually want it?
> 

I am yet to measure the performance overhead of the accounting checks. I'll try
and get started on that today. I did not consider making it a separate system,
because I suspect that anybody wanting memory control would also want address
space control (for the advantages listed in the documentation). I am not against
the idea of making it a separate subsystem, but first let me get back with the
numbers.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-17  1:50   ` Balbir Singh
@ 2008-03-17  1:55     ` Paul Menage
  2008-03-17  3:12       ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Menage @ 2008-03-17  1:55 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

On Mon, Mar 17, 2008 at 9:50 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>  I am yet to measure the performance overhead of the accounting checks. I'll try
>  and get started on that today. I did not consider making it a separate system,
>  because I suspect that anybody wanting memory control would also want address
>  space control (for the advantages listed in the documentation).

I'm a counter-example to your suspicion :-)

Trying to control virtual address space is a complete nightmare in the
presence of anything that uses large sparsely-populated mappings
(mmaps of large files, or large sparse heaps such as the JVM uses.)

If we want to control the effect of swapping, the right way to do it
is to control disk I/O, and ensure that the swapping is accounted to
that. Or simply just not give apps much swap space.

Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-17  1:47   ` Li Zefan
@ 2008-03-17  1:57     ` Paul Menage
  2008-03-17  5:08       ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Menage @ 2008-03-17  1:57 UTC (permalink / raw)
  To: Li Zefan
  Cc: Balbir Singh, linux-mm, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

On Mon, Mar 17, 2008 at 9:47 AM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>
>  It will be code duplication to make it a new subsystem,

Would it? Other than the basic cgroup boilerplate, the only real
duplication that I could see would be that there'd need to be an
additional per-mm pointer back to the cgroup. (Which could be avoided
if we added a single per-mm pointer back to the "owning" task, which
would generally be the mm's thread group leader, so that you could go
quickly from an mm to a set of cgroup subsystems).

And the advantage would that you'd be able to more easily pick/choose
which bits of control you use (and pay for).

Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh
@ 2008-03-17  2:02   ` Paul Menage
  2008-03-17  2:57     ` Balbir Singh
  2008-03-17 11:36   ` Pavel Emelyanov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Paul Menage @ 2008-03-17  2:02 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

On Mon, Mar 17, 2008 at 1:30 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>   /*
>  + * Check if the current cgroup exceeds its address space limit.
>  + * Returns 0 on success and 1 on failure.
>  + */
>  +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>  +{
>  +       int ret = 0;
>  +       struct mem_cgroup *mem;
>  +       if (mem_cgroup_subsys.disabled)
>  +               return ret;
>  +
>  +       rcu_read_lock();
>  +       mem = rcu_dereference(mm->mem_cgroup);
>  +       css_get(&mem->css);
>  +       rcu_read_unlock();
>  +

How about if this function avoided charging the root cgroup? You'd
save 4 atomic operations on a global data structure on every
mmap/munmap when the virtual address limit cgroup wasn't in use, which
could be significant on a large system. And I don't see situations
where you really need to limit the address space of the root cgroup.

Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17  2:02   ` Paul Menage
@ 2008-03-17  2:57     ` Balbir Singh
  2008-03-17  3:03       ` Paul Menage
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2008-03-17  2:57 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Mon, Mar 17, 2008 at 1:30 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>   /*
>>  + * Check if the current cgroup exceeds its address space limit.
>>  + * Returns 0 on success and 1 on failure.
>>  + */
>>  +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>  +{
>>  +       int ret = 0;
>>  +       struct mem_cgroup *mem;
>>  +       if (mem_cgroup_subsys.disabled)
>>  +               return ret;
>>  +
>>  +       rcu_read_lock();
>>  +       mem = rcu_dereference(mm->mem_cgroup);
>>  +       css_get(&mem->css);
>>  +       rcu_read_unlock();
>>  +
> 
> How about if this function avoided charging the root cgroup? You'd
> save 4 atomic operations on a global data structure on every
> mmap/munmap when the virtual address limit cgroup wasn't in use, which
> could be significant on a large system. And I don't see situations
> where you really need to limit the address space of the root cgroup.

4 atomic operations is very tempting, but we want to account for root usage due
to the following reasons:

1. We want to be able to support hierarchial accounting and control
2. We want to track usage of the root cgroup and report it back to the user
3. We don't want to treat the root cgroup as a special case.



-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17  2:57     ` Balbir Singh
@ 2008-03-17  3:03       ` Paul Menage
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Menage @ 2008-03-17  3:03 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

On Mon, Mar 17, 2008 at 10:57 AM, Balbir Singh
<balbir@linux.vnet.ibm.com> wrote:
>
>  1. We want to be able to support hierarchial accounting and control

>  2. We want to track usage of the root cgroup and report it back to the user

What use cases do you have for that?

>  3. We don't want to treat the root cgroup as a special case.

Why? It is a special case, in that in a lot of machines there's only
going to be the root cgroup, and the subsystem won't be mounted. So in
those cases, paying any overhead is a cost without a benefit.

Alternatively, how about you skip tracking virtual address space
changes if the virtual address cgroup isn't mounted on any hierarchy?
When you mount it, you can do a pass across all mms and set the root
cgroup usage to their total.

Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-17  1:55     ` Paul Menage
@ 2008-03-17  3:12       ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-17  3:12 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Mon, Mar 17, 2008 at 9:50 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>  I am yet to measure the performance overhead of the accounting checks. I'll try
>>  and get started on that today. I did not consider making it a separate system,
>>  because I suspect that anybody wanting memory control would also want address
>>  space control (for the advantages listed in the documentation).
> 
> I'm a counter-example to your suspicion :-)
> 
> Trying to control virtual address space is a complete nightmare in the
> presence of anything that uses large sparsely-populated mappings
> (mmaps of large files, or large sparse heaps such as the JVM uses.)
> 

Not really. Virtual limits are more gentle than an OOM kill that can occur if
the cgroup runs out of memory. Please also see
http://linux-vserver.org/Memory_Limits

> If we want to control the effect of swapping, the right way to do it
> is to control disk I/O, and ensure that the swapping is accounted to
> that. Or simply just not give apps much swap space.

Yes, a disk I/O and swap I/O controller are being developed (not by us, but
others in the community). How does one restrict swap space for a particular
application? I can think of RLIMIT_AS for a process and something similar to
what I've posted for cgroups. Not enabling swap is an option, but not very
practical IMHO.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-17  1:57     ` Paul Menage
@ 2008-03-17  5:08       ` Balbir Singh
  2008-03-17  5:22         ` Paul Menage
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2008-03-17  5:08 UTC (permalink / raw)
  To: Paul Menage
  Cc: Li Zefan, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Mon, Mar 17, 2008 at 9:47 AM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>>  It will be code duplication to make it a new subsystem,
> 
> Would it? Other than the basic cgroup boilerplate, the only real
> duplication that I could see would be that there'd need to be an
> additional per-mm pointer back to the cgroup. (Which could be avoided
> if we added a single per-mm pointer back to the "owning" task, which
> would generally be the mm's thread group leader, so that you could go
> quickly from an mm to a set of cgroup subsystems).
> 

I understand the per-mm pointer overhead back to the cgroup. I don't understand
the part about adding a per-mm pointer back to the "owning" task. We already
have task->mm. BTW, the reason by we directly add the mm_struct to mem_cgroup
mapping is that there are contexts from where only the mm_struct is known (when
we charge/uncharge). Assuming that current->mm's mem_cgorup is the one we want
to charge/uncharge is incorrect.

> And the advantage would that you'd be able to more easily pick/choose
> which bits of control you use (and pay for).

I am not sure I understand your proposal fully. But, if it can help provide the
flexibility you are referring to, I am all ears.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-17  5:08       ` Balbir Singh
@ 2008-03-17  5:22         ` Paul Menage
  2008-03-17 15:15           ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Menage @ 2008-03-17  5:22 UTC (permalink / raw)
  To: balbir
  Cc: Li Zefan, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

On Mon, Mar 17, 2008 at 1:08 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>  I understand the per-mm pointer overhead back to the cgroup. I don't understand
>  the part about adding a per-mm pointer back to the "owning" task. We already
>  have task->mm.

Yes, but we don't have mm->owner, which is what I was proposing -
mm->owner would be a pointer typically to the mm's thread group
leader. It would remove the need to have to have pointers for the
various different cgroup subsystems that need to act on an mm rather
than a task_struct, since then you could use
mm->owner->cgroups[subsys_id].

But this is kind of orthogonal to whether virtual address space limits
should be a separate cgroup subsystem.

Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh
  2008-03-17  2:02   ` Paul Menage
@ 2008-03-17 11:36   ` Pavel Emelyanov
  2008-03-17 12:29     ` Balbir Singh
  2008-03-17 16:53   ` Dave Hansen
  2008-03-17 23:35   ` YAMAMOTO Takashi
  3 siblings, 1 reply; 30+ messages in thread
From: Pavel Emelyanov @ 2008-03-17 11:36 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

[snip]

> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
> +{
> +	int ret = 0;
> +	struct mem_cgroup *mem;
> +	if (mem_cgroup_subsys.disabled)
> +		return ret;
> +
> +	rcu_read_lock();
> +	mem = rcu_dereference(mm->mem_cgroup);
> +	css_get(&mem->css);
> +	rcu_read_unlock();
> +
> +	if (nr_pages > 0) {
> +		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
> +			ret = 1;
> +	} else
> +		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));

No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.

[snip]

> @@ -1117,6 +1117,9 @@ munmap_back:
>  		}
>  	}
>  
> +	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
> +		return -ENOMEM;
> +

Why not use existintg cap_vm_enough_memory and co?

[snip]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17 11:36   ` Pavel Emelyanov
@ 2008-03-17 12:29     ` Balbir Singh
  2008-03-17 12:40       ` Pavel Emelyanov
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2008-03-17 12:29 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

Pavel Emelyanov wrote:
> [snip]
> 
>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>> +{
>> +	int ret = 0;
>> +	struct mem_cgroup *mem;
>> +	if (mem_cgroup_subsys.disabled)
>> +		return ret;
>> +
>> +	rcu_read_lock();
>> +	mem = rcu_dereference(mm->mem_cgroup);
>> +	css_get(&mem->css);
>> +	rcu_read_unlock();
>> +
>> +	if (nr_pages > 0) {
>> +		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>> +			ret = 1;
>> +	} else
>> +		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
> 
> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
> 
> [snip]
> 

Yes, sure :)

>> @@ -1117,6 +1117,9 @@ munmap_back:
>>  		}
>>  	}
>>  
>> +	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>> +		return -ENOMEM;
>> +
> 
> Why not use existintg cap_vm_enough_memory and co?
> 

I thought about it and almost used may_expand_vm(), but there is a slight catch
there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
called after total_vm has been calculated. In our case we need to keep the
cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17 12:29     ` Balbir Singh
@ 2008-03-17 12:40       ` Pavel Emelyanov
  2008-03-17 12:51         ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Pavel Emelyanov @ 2008-03-17 12:40 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> [snip]
>>
>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>> +{
>>> +	int ret = 0;
>>> +	struct mem_cgroup *mem;
>>> +	if (mem_cgroup_subsys.disabled)
>>> +		return ret;
>>> +
>>> +	rcu_read_lock();
>>> +	mem = rcu_dereference(mm->mem_cgroup);
>>> +	css_get(&mem->css);
>>> +	rcu_read_unlock();
>>> +
>>> +	if (nr_pages > 0) {
>>> +		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>> +			ret = 1;
>>> +	} else
>>> +		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>
>> [snip]
>>
> 
> Yes, sure :)

Thanks :)

>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>>  		}
>>>  	}
>>>  
>>> +	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>> +		return -ENOMEM;
>>> +
>> Why not use existintg cap_vm_enough_memory and co?
>>
> 
> I thought about it and almost used may_expand_vm(), but there is a slight catch
> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
> called after total_vm has been calculated. In our case we need to keep the
> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.

So? What prevents us from using these hooks? :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17 12:40       ` Pavel Emelyanov
@ 2008-03-17 12:51         ` Balbir Singh
  2008-03-17 13:01           ` Pavel Emelyanov
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2008-03-17 12:51 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> [snip]
>>>
>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>>> +{
>>>> +	int ret = 0;
>>>> +	struct mem_cgroup *mem;
>>>> +	if (mem_cgroup_subsys.disabled)
>>>> +		return ret;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	mem = rcu_dereference(mm->mem_cgroup);
>>>> +	css_get(&mem->css);
>>>> +	rcu_read_unlock();
>>>> +
>>>> +	if (nr_pages > 0) {
>>>> +		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>>> +			ret = 1;
>>>> +	} else
>>>> +		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>>
>>> [snip]
>>>
>> Yes, sure :)
> 
> Thanks :)
> 
>>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>>>  		}
>>>>  	}
>>>>  
>>>> +	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>>> +		return -ENOMEM;
>>>> +
>>> Why not use existintg cap_vm_enough_memory and co?
>>>
>> I thought about it and almost used may_expand_vm(), but there is a slight catch
>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
>> called after total_vm has been calculated. In our case we need to keep the
>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.
> 
> So? What prevents us from using these hooks? :)

1. We need to account total_vm usage of the task anyway. So why have two places,
   one for accounting and second for control?
2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT
   set.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17 12:51         ` Balbir Singh
@ 2008-03-17 13:01           ` Pavel Emelyanov
  2008-03-17 14:39             ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Pavel Emelyanov @ 2008-03-17 13:01 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> Balbir Singh wrote:
>>> Pavel Emelyanov wrote:
>>>> [snip]
>>>>
>>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>>>> +{
>>>>> +	int ret = 0;
>>>>> +	struct mem_cgroup *mem;
>>>>> +	if (mem_cgroup_subsys.disabled)
>>>>> +		return ret;
>>>>> +
>>>>> +	rcu_read_lock();
>>>>> +	mem = rcu_dereference(mm->mem_cgroup);
>>>>> +	css_get(&mem->css);
>>>>> +	rcu_read_unlock();
>>>>> +
>>>>> +	if (nr_pages > 0) {
>>>>> +		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>>>> +			ret = 1;
>>>>> +	} else
>>>>> +		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>>>
>>>> [snip]
>>>>
>>> Yes, sure :)
>> Thanks :)
>>
>>>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>>>>  		}
>>>>>  	}
>>>>>  
>>>>> +	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>>>> +		return -ENOMEM;
>>>>> +
>>>> Why not use existintg cap_vm_enough_memory and co?
>>>>
>>> I thought about it and almost used may_expand_vm(), but there is a slight catch
>>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
>>> called after total_vm has been calculated. In our case we need to keep the
>>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.
>> So? What prevents us from using these hooks? :)
> 
> 1. We need to account total_vm usage of the task anyway. So why have two places,
>    one for accounting and second for control?

We still have two of them even placing hooks in each place manually.

Besides, putting the mem_cgroup_(un)charge_as() in these vm hooks will
1. save the number of places to patch
2. help keeping memcgroup consistent in case someone adds more places
   that expand tasks vm (arches, drivers) - in case we have our hooks
   celled from inside vm ones, we won't have to patch more.

> 2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT
>    set.

This is a good point against. But, wrt my previous comment, can we handle 
this somehow?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17 13:01           ` Pavel Emelyanov
@ 2008-03-17 14:39             ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-17 14:39 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> Balbir Singh wrote:
>>>> Pavel Emelyanov wrote:
>>>>> [snip]
>>>>>
>>>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>>>>> +{
>>>>>> +	int ret = 0;
>>>>>> +	struct mem_cgroup *mem;
>>>>>> +	if (mem_cgroup_subsys.disabled)
>>>>>> +		return ret;
>>>>>> +
>>>>>> +	rcu_read_lock();
>>>>>> +	mem = rcu_dereference(mm->mem_cgroup);
>>>>>> +	css_get(&mem->css);
>>>>>> +	rcu_read_unlock();
>>>>>> +
>>>>>> +	if (nr_pages > 0) {
>>>>>> +		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>>>>> +			ret = 1;
>>>>>> +	} else
>>>>>> +		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>>>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>>>>
>>>>> [snip]
>>>>>
>>>> Yes, sure :)
>>> Thanks :)
>>>
>>>>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>>>>>  		}
>>>>>>  	}
>>>>>>  
>>>>>> +	if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>>>>> +		return -ENOMEM;
>>>>>> +
>>>>> Why not use existintg cap_vm_enough_memory and co?
>>>>>
>>>> I thought about it and almost used may_expand_vm(), but there is a slight catch
>>>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
>>>> called after total_vm has been calculated. In our case we need to keep the
>>>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.
>>> So? What prevents us from using these hooks? :)
>> 1. We need to account total_vm usage of the task anyway. So why have two places,
>>    one for accounting and second for control?
> 
> We still have two of them even placing hooks in each place manually.
> 
> Besides, putting the mem_cgroup_(un)charge_as() in these vm hooks will
> 1. save the number of places to patch
> 2. help keeping memcgroup consistent in case someone adds more places
>    that expand tasks vm (arches, drivers) - in case we have our hooks
>    celled from inside vm ones, we won't have to patch more.
> 

I am not sure I understand your proposal. Without manually placing these hooks
how do we track

1. When the vm size has increased/decreased
2. In case due to some reason, the call following these hooks fail, how do we
   undo it, without placing hooks?


>> 2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT
>>    set.
> 
> This is a good point against. But, wrt my previous comment, can we handle 
> this somehow?

Not sure I understand

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][0/3] Virtual address space control for cgroups
  2008-03-17  5:22         ` Paul Menage
@ 2008-03-17 15:15           ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-17 15:15 UTC (permalink / raw)
  To: Paul Menage
  Cc: Li Zefan, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	linux-kernel, taka, David Rientjes, Pavel Emelianov,
	Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Mon, Mar 17, 2008 at 1:08 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>  I understand the per-mm pointer overhead back to the cgroup. I don't understand
>>  the part about adding a per-mm pointer back to the "owning" task. We already
>>  have task->mm.
> 
> Yes, but we don't have mm->owner, which is what I was proposing -
> mm->owner would be a pointer typically to the mm's thread group
> leader. It would remove the need to have to have pointers for the
> various different cgroup subsystems that need to act on an mm rather
> than a task_struct, since then you could use
> mm->owner->cgroups[subsys_id].
> 

Aaahh.. Yes.. mm->owner might be a good idea. The only thing we'll need to
handle is when mm->owner dies (I think the thread group is still kept around).
The other disadvantage is the double dereferencing, which should not be all that
bad.

> But this is kind of orthogonal to whether virtual address space limits
> should be a separate cgroup subsystem.
> 

Yes, sure.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh
  2008-03-17  2:02   ` Paul Menage
  2008-03-17 11:36   ` Pavel Emelyanov
@ 2008-03-17 16:53   ` Dave Hansen
  2008-03-18  1:14     ` Balbir Singh
  2008-03-17 23:35   ` YAMAMOTO Takashi
  3 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-03-17 16:53 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

On Sun, 2008-03-16 at 23:00 +0530, Balbir Singh wrote:
> @@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas
>         current->mm->total_vm  -= old_size;
>         current->mm->locked_vm -= old_size;
>  
> +       mem_cgroup_update_as(current->mm, -old_size);
> +
>         if (size == 0)
>                 goto out;

I think splattering these things all over is probably a bad idea.

If you're going to do this, I think you need a couple of phases.  

1. update the vm_(un)acct_memory() functions to take an mm
2. start using them (or some other abstracted functions in place)
3. update the new functions for cgroups

It's a bit non-obvious why you do the mem_cgroup_update_as() calls in
the places that you do from context.

Having some other vm-abstracted functions will also keep you from
splattering mem_cgroup_update_as() across the tree.  That's a pretty bad
name. :)  ...update_mapped() or ...update_vm() might be a wee bit
better. 

-- Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh
                     ` (2 preceding siblings ...)
  2008-03-17 16:53   ` Dave Hansen
@ 2008-03-17 23:35   ` YAMAMOTO Takashi
  2008-03-18  1:10     ` Balbir Singh
  3 siblings, 1 reply; 30+ messages in thread
From: YAMAMOTO Takashi @ 2008-03-17 23:35 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, hugh, skumar, menage, lizf, linux-kernel, taka,
	rientjes, xemul, akpm, kamezawa.hiroyu

> diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c
> diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c
> --- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
> +++ linux-2.6.25-rc5-balbir/mm/memory.c	2008-03-16 22:57:40.000000000 +0530
> @@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath
>  
>  		if (vma->vm_flags & VM_ACCOUNT)
>  			*nr_accounted += (end - start) >> PAGE_SHIFT;
> +		/*
> +		 * Unaccount used virtual memory for cgroups
> +		 */
> +		mem_cgroup_update_as(vma->vm_mm,
> +					((long)(start - end)) >> PAGE_SHIFT);
>  
>  		while (start != end) {
>  			if (!tlb_start_valid) {

i think you can sum and uncharge it with a single call.

YAMAMOTO Takashi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17 23:35   ` YAMAMOTO Takashi
@ 2008-03-18  1:10     ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-18  1:10 UTC (permalink / raw)
  To: YAMAMOTO Takashi
  Cc: linux-mm, hugh, skumar, menage, lizf, linux-kernel, taka,
	rientjes, xemul, akpm, kamezawa.hiroyu

YAMAMOTO Takashi wrote:
>> diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c
>> diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c
>> --- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-16 22:57:40.000000000 +0530
>> +++ linux-2.6.25-rc5-balbir/mm/memory.c	2008-03-16 22:57:40.000000000 +0530
>> @@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath
>>  
>>  		if (vma->vm_flags & VM_ACCOUNT)
>>  			*nr_accounted += (end - start) >> PAGE_SHIFT;
>> +		/*
>> +		 * Unaccount used virtual memory for cgroups
>> +		 */
>> +		mem_cgroup_update_as(vma->vm_mm,
>> +					((long)(start - end)) >> PAGE_SHIFT);
>>  
>>  		while (start != end) {
>>  			if (!tlb_start_valid) {
> 
> i think you can sum and uncharge it with a single call.
> 

Like nr_accounted? I'll have to duplicate nr_accounted since that depends
conditionally on VM_ACCOUNT.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-17 16:53   ` Dave Hansen
@ 2008-03-18  1:14     ` Balbir Singh
  2008-03-18 17:11       ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Balbir Singh @ 2008-03-18  1:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

Dave Hansen wrote:
> On Sun, 2008-03-16 at 23:00 +0530, Balbir Singh wrote:
>> @@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas
>>         current->mm->total_vm  -= old_size;
>>         current->mm->locked_vm -= old_size;
>>  
>> +       mem_cgroup_update_as(current->mm, -old_size);
>> +
>>         if (size == 0)
>>                 goto out;
> 
> I think splattering these things all over is probably a bad idea.
> 

I agree and I tried to avoid the splattering

> If you're going to do this, I think you need a couple of phases.  
> 
> 1. update the vm_(un)acct_memory() functions to take an mm

There are other problems

1. vm_(un)acct_memory is conditionally dependent on VM_ACCOUNT. Look at
shmem_(un)acct_size for example
2. These routines are not called from all contexts that we care about (look at
insert_special_mapping())

> 2. start using them (or some other abstracted functions in place)
> 3. update the new functions for cgroups
> 
> It's a bit non-obvious why you do the mem_cgroup_update_as() calls in
> the places that you do from context.
> 
> Having some other vm-abstracted functions will also keep you from
> splattering mem_cgroup_update_as() across the tree.  That's a pretty bad
> name. :)  ...update_mapped() or ...update_vm() might be a wee bit
> better. 
> 

I am going to split mem_cgroup_update_as() to two routines with a better name. I
agree with you in principle about splattering, but please see my comments above

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-18  1:14     ` Balbir Singh
@ 2008-03-18 17:11       ` Dave Hansen
  2008-03-18 17:58         ` Balbir Singh
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2008-03-18 17:11 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

On Tue, 2008-03-18 at 06:44 +0530, Balbir Singh wrote: 
> > If you're going to do this, I think you need a couple of phases.  
> > 
> > 1. update the vm_(un)acct_memory() functions to take an mm
> 
> There are other problems
> 
> 1. vm_(un)acct_memory is conditionally dependent on VM_ACCOUNT. Look at
> shmem_(un)acct_size for example

Yeah, but if VM_ACCOUNT isn't set, do you really want the controller
accounting for them?  It's there for a reason. :)

The shmem_acct_size() helpers look good.  I wonder if we should be using
that kind of things more generically.

> 2. These routines are not called from all contexts that we care about (look at
> insert_special_mapping())

Could you explain why "we" care about it and why it isn't accounted for
now?

-- Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][2/3] Account and control virtual address space allocations
  2008-03-18 17:11       ` Dave Hansen
@ 2008-03-18 17:58         ` Balbir Singh
  0 siblings, 0 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-18 17:58 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	Paul Menage, lizf, linux-kernel, taka, David Rientjes,
	Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2008-03-19 19:33 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh
2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh
2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh
2008-03-17  2:02   ` Paul Menage
2008-03-17  2:57     ` Balbir Singh
2008-03-17  3:03       ` Paul Menage
2008-03-17 11:36   ` Pavel Emelyanov
2008-03-17 12:29     ` Balbir Singh
2008-03-17 12:40       ` Pavel Emelyanov
2008-03-17 12:51         ` Balbir Singh
2008-03-17 13:01           ` Pavel Emelyanov
2008-03-17 14:39             ` Balbir Singh
2008-03-17 16:53   ` Dave Hansen
2008-03-18  1:14     ` Balbir Singh
2008-03-18 17:11       ` Dave Hansen
2008-03-18 17:58         ` Balbir Singh
2008-03-17 23:35   ` YAMAMOTO Takashi
2008-03-18  1:10     ` Balbir Singh
2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh
2008-03-16 18:32   ` Randy Dunlap
2008-03-17  1:33     ` Balbir Singh
2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage
2008-03-17  1:47   ` Li Zefan
2008-03-17  1:57     ` Paul Menage
2008-03-17  5:08       ` Balbir Singh
2008-03-17  5:22         ` Paul Menage
2008-03-17 15:15           ` Balbir Singh
2008-03-17  1:50   ` Balbir Singh
2008-03-17  1:55     ` Paul Menage
2008-03-17  3:12       ` Balbir Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).