LKML Archive on lore.kernel.org
* [RFC][-mm] Add an owner to the mm_struct (v4)
@ 2008-04-01 12:43 Balbir Singh
  2008-04-01 16:00 ` Pekka Enberg
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Balbir Singh @ 2008-04-01 12:43 UTC (permalink / raw)
  To: Paul Menage, Pavel Emelianov
  Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel,
	taka, linux-mm, David Rientjes, Balbir Singh, Andrew Morton,
	KAMEZAWA Hiroyuki



Changelog v3
------------

1. Add mm->owner change callbacks using cgroups

This patch removes the mem_cgroup member from mm_struct and instead adds
an owner. This approach was suggested by Paul Menage. The advantage of
this approach is that, once mm->owner is known, the corresponding cgroup
can be determined using the subsystem id. It also allows several control
groups that are virtually grouped by mm_struct to exist independently of
the memory controller, i.e., without adding a mem_cgroup-style pointer to
mm_struct for each controller.
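
As a rough sketch of the lookup this enables (not part of the patch;
mem_cgroup_from_mm() below is a hypothetical helper name), the memory
controller can resolve its state from an mm via the owner task, mirroring
the mm_match_cgroup()/mem_cgroup_from_task() changes further down:

/*
 * Sketch only: resolve the memory cgroup of an mm through its owner.
 * mem_cgroup_from_task() is made non-static by this patch; any other
 * subsystem would do the same using its own subsystem id.
 */
static struct mem_cgroup *mem_cgroup_from_mm(struct mm_struct *mm)
{
	struct mem_cgroup *mem;

	rcu_read_lock();
	mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
	css_get(&mem->css);	/* caller releases with css_put() */
	rcu_read_unlock();
	return mem;
}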

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

NOTE: This patch was developed on top of 2.6.25-rc5-mm1 and applies on top
of the memory-controller-move-to-own-slab patch (which is already present
in Andrew's patchset).

This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_owner_changed callback is called with the task_lock() of
the new owner task held, just prior to changing mm->owner.
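
For a subsystem that wants this notification, the hook is just another
member of struct cgroup_subsys. A minimal sketch (the "foo" subsystem and
all of its names below are hypothetical):

/* Sketch: a cgroup subsystem opting in to mm->owner change callbacks. */
static void foo_mm_owner_changed(struct cgroup_subsys *ss,
				 struct task_struct *old,
				 struct task_struct *new)
{
	/*
	 * Invoked just before mm->owner switches from old to new, with
	 * task_lock() of the new owner held, so this must not sleep.
	 */
}

struct cgroup_subsys foo_cgroup_subsys = {
	.name			= "foo",
	.subsys_id		= foo_subsys_id,
	.mm_owner_changed	= foo_mm_owner_changed,
};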

I am indebted to Paul Menage for the several reviews of this patchset
and helping me make it lighter and simpler.

This patch was tested on a powerpc box.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 fs/exec.c                  |    1 
 include/linux/cgroup.h     |   15 ++++++++++
 include/linux/init_task.h  |    2 -
 include/linux/memcontrol.h |   17 ++---------
 include/linux/mm_types.h   |    5 ++-
 include/linux/sched.h      |   14 +++++++++
 init/Kconfig               |   15 ++++++++++
 kernel/cgroup.c            |   24 ++++++++++++++++
 kernel/exit.c              |   66 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/fork.c              |   11 +++++--
 mm/memcontrol.c            |   21 ++------------
 11 files changed, 154 insertions(+), 37 deletions(-)

diff -puN include/linux/mm_types.h~memory-controller-add-mm-owner include/linux/mm_types.h
--- linux-2.6.25-rc5/include/linux/mm_types.h~memory-controller-add-mm-owner	2008-03-28 09:30:47.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/mm_types.h	2008-03-31 14:53:04.000000000 +0530
@@ -227,8 +227,9 @@ struct mm_struct {
 	/* aio bits */
 	rwlock_t		ioctx_list_lock;
 	struct kioctx		*ioctx_list;
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-	struct mem_cgroup *mem_cgroup;
+#ifdef CONFIG_MM_OWNER
+	struct task_struct *owner;	/* The thread group leader that */
+					/* owns the mm_struct.		*/
 #endif
 
 #ifdef CONFIG_PROC_FS
diff -puN kernel/fork.c~memory-controller-add-mm-owner kernel/fork.c
--- linux-2.6.25-rc5/kernel/fork.c~memory-controller-add-mm-owner	2008-03-28 09:30:47.000000000 +0530
+++ linux-2.6.25-rc5-balbir/kernel/fork.c	2008-04-01 10:26:42.000000000 +0530
@@ -358,14 +358,13 @@ static struct mm_struct * mm_init(struct
 	mm->ioctx_list = NULL;
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	mm->cached_hole_size = ~0UL;
-	mm_init_cgroup(mm, p);
+	mm_init_owner(mm, p);
 
 	if (likely(!mm_alloc_pgd(mm))) {
 		mm->def_flags = 0;
 		return mm;
 	}
 
-	mm_free_cgroup(mm);
 	free_mm(mm);
 	return NULL;
 }
@@ -394,7 +393,6 @@ void __mmdrop(struct mm_struct *mm)
 {
 	BUG_ON(mm == &init_mm);
 	mm_free_pgd(mm);
-	mm_free_cgroup(mm);
 	destroy_context(mm);
 	free_mm(mm);
 }
@@ -995,6 +993,13 @@ static void rt_mutex_init_task(struct ta
 #endif
 }
 
+#ifdef CONFIG_MM_OWNER
+void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
+{
+	mm->owner = p;
+}
+#endif /* CONFIG_MM_OWNER */
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
diff -puN include/linux/memcontrol.h~memory-controller-add-mm-owner include/linux/memcontrol.h
--- linux-2.6.25-rc5/include/linux/memcontrol.h~memory-controller-add-mm-owner	2008-03-28 09:30:47.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/memcontrol.h	2008-04-01 10:27:02.000000000 +0530
@@ -27,9 +27,6 @@ struct mm_struct;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
-extern void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p);
-extern void mm_free_cgroup(struct mm_struct *mm);
-
 #define page_reset_bad_cgroup(page)	((page)->page_cgroup = 0)
 
 extern struct page_cgroup *page_get_page_cgroup(struct page *page);
@@ -48,8 +45,10 @@ extern unsigned long mem_cgroup_isolate_
 extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
 int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
 
+extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
+
 #define mm_match_cgroup(mm, cgroup)	\
-	((cgroup) == rcu_dereference((mm)->mem_cgroup))
+	((cgroup) == mem_cgroup_from_task((mm)->owner))
 
 extern int mem_cgroup_prepare_migration(struct page *page);
 extern void mem_cgroup_end_migration(struct page *page);
@@ -73,15 +72,6 @@ extern long mem_cgroup_calc_reclaim_inac
 				struct zone *zone, int priority);
 
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
-static inline void mm_init_cgroup(struct mm_struct *mm,
-					struct task_struct *p)
-{
-}
-
-static inline void mm_free_cgroup(struct mm_struct *mm)
-{
-}
-
 static inline void page_reset_bad_cgroup(struct page *page)
 {
 }
@@ -172,6 +162,7 @@ static inline long mem_cgroup_calc_recla
 {
 	return 0;
 }
+
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN mm/memcontrol.c~memory-controller-add-mm-owner mm/memcontrol.c
--- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-add-mm-owner	2008-03-28 09:30:47.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/memcontrol.c	2008-03-28 18:55:33.000000000 +0530
@@ -238,26 +238,12 @@ static struct mem_cgroup *mem_cgroup_fro
 				css);
 }
 
-static struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
+struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
 {
 	return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
 				struct mem_cgroup, css);
 }
 
-void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p)
-{
-	struct mem_cgroup *mem;
-
-	mem = mem_cgroup_from_task(p);
-	css_get(&mem->css);
-	mm->mem_cgroup = mem;
-}
-
-void mm_free_cgroup(struct mm_struct *mm)
-{
-	css_put(&mm->mem_cgroup->css);
-}
-
 static inline int page_cgroup_locked(struct page *page)
 {
 	return bit_spin_is_locked(PAGE_CGROUP_LOCK_BIT, &page->page_cgroup);
@@ -478,6 +464,7 @@ unsigned long mem_cgroup_isolate_pages(u
 	int zid = zone_idx(z);
 	struct mem_cgroup_per_zone *mz;
 
+	BUG_ON(!mem_cont);
 	mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
 	if (active)
 		src = &mz->active_list;
@@ -576,7 +563,7 @@ retry:
 		mm = &init_mm;
 
 	rcu_read_lock();
-	mem = rcu_dereference(mm->mem_cgroup);
+	mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
 	/*
 	 * For every charge from the cgroup, increment reference count
 	 */
@@ -990,7 +977,6 @@ mem_cgroup_create(struct cgroup_subsys *
 
 	if (unlikely((cont->parent) == NULL)) {
 		mem = &init_mem_cgroup;
-		init_mm.mem_cgroup = mem;
 		page_cgroup_cache = KMEM_CACHE(page_cgroup, SLAB_PANIC);
 	} else
 		mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
@@ -1072,7 +1058,6 @@ static void mem_cgroup_move_task(struct 
 		goto out;
 
 	css_get(&mem->css);
-	rcu_assign_pointer(mm->mem_cgroup, mem);
 	css_put(&old_mem->css);
 
 out:
diff -puN include/linux/sched.h~memory-controller-add-mm-owner include/linux/sched.h
--- linux-2.6.25-rc5/include/linux/sched.h~memory-controller-add-mm-owner	2008-03-28 09:30:47.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/sched.h	2008-04-01 12:58:24.000000000 +0530
@@ -2130,6 +2130,20 @@ static inline void migration_init(void)
 
 #define TASK_STATE_TO_CHAR_STR "RSDTtZX"
 
+#ifdef CONFIG_MM_OWNER
+extern void mm_update_next_owner(struct mm_struct *mm);
+extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
+#else
+static inline void
+mm_update_next_owner(struct mm_struct *mm)
+{
+}
+
+static inline void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
+{
+}
+#endif /* CONFIG_MM_OWNER */
+
 #endif /* __KERNEL__ */
 
 #endif
diff -puN kernel/exit.c~memory-controller-add-mm-owner kernel/exit.c
--- linux-2.6.25-rc5/kernel/exit.c~memory-controller-add-mm-owner	2008-03-28 09:30:47.000000000 +0530
+++ linux-2.6.25-rc5-balbir/kernel/exit.c	2008-04-01 18:05:02.000000000 +0530
@@ -579,6 +579,71 @@ void exit_fs(struct task_struct *tsk)
 
 EXPORT_SYMBOL_GPL(exit_fs);
 
+#ifdef CONFIG_MM_OWNER
+/*
+ * Task p is exiting and it owns this mm, so let's find a new owner for it
+ */
+static inline int
+mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
+{
+	int ret;
+
+	ret = (mm && (atomic_read(&mm->mm_users) > 1) && (mm->owner == p) &&
+		!delay_group_leader(p));
+	return ret;
+}
+
+void mm_update_next_owner(struct mm_struct *mm)
+{
+	struct task_struct *c, *g, *p = current;
+
+retry:
+	if (!mm_need_new_owner(mm, p))
+		return;
+
+	rcu_read_lock();
+	/*
+	 * Search in the children
+	 */
+	list_for_each_entry(c, &p->children, sibling) {
+		if (c->mm == mm)
+			goto assign_new_owner;
+	}
+
+	/*
+	 * Search in the siblings
+	 */
+	list_for_each_entry(c, &p->parent->children, sibling) {
+		if (c->mm == mm)
+			goto assign_new_owner;
+	}
+
+	/*
+	 * Search through everything else. We should not get
+	 * here often
+	 */
+	do_each_thread(g, c) {
+		if (c->mm == mm)
+			goto assign_new_owner;
+	} while_each_thread(g, c);
+
+	rcu_read_unlock();
+	return;
+
+assign_new_owner:
+	rcu_read_unlock();
+	BUG_ON(c == p);
+	task_lock(c);
+	if (c->mm != mm) {
+		task_unlock(c);
+		goto retry;
+	}
+	cgroup_mm_owner_callbacks(mm->owner, c);
+	mm->owner = c;
+	task_unlock(c);
+}
+#endif /* CONFIG_MM_OWNER */
+
 /*
  * Turn us into a lazy TLB process if we
  * aren't already..
@@ -618,6 +683,7 @@ static void exit_mm(struct task_struct *
 	/* We don't want this task to be frozen prematurely */
 	clear_freeze_flag(tsk);
 	task_unlock(tsk);
+	mm_update_next_owner(mm);
 	mmput(mm);
 }
 
diff -puN init/Kconfig~memory-controller-add-mm-owner init/Kconfig
--- linux-2.6.25-rc5/init/Kconfig~memory-controller-add-mm-owner	2008-03-28 09:30:47.000000000 +0530
+++ linux-2.6.25-rc5-balbir/init/Kconfig	2008-04-01 08:58:57.000000000 +0530
@@ -364,9 +364,21 @@ config RESOURCE_COUNTERS
           infrastructure that works with cgroups
 	depends on CGROUPS
 
+config MM_OWNER
+	bool "Enable ownership of mm structure"
+	help
+	  This option enables an mm_struct to have an owner. The advantage
+	  of this approach is that it allows several independent memory-based
+	  cgroup controllers to co-exist without too much space overhead.
+
+	  This feature adds fork/exit overhead. So enable this only if
+	  you need resource controllers.
+
 config CGROUP_MEM_RES_CTLR
 	bool "Memory Resource Controller for Control Groups"
 	depends on CGROUPS && RESOURCE_COUNTERS
+	select MM_OWNER
 	help
 	  Provides a memory resource controller that manages both page cache and
 	  RSS memory.
@@ -379,6 +391,9 @@ config CGROUP_MEM_RES_CTLR
 	  Only enable when you're ok with these trade offs and really
 	  sure you need the memory resource controller.
 
+	  This config option also selects the MM_OWNER config option, which
+	  in turn adds some fork/exit overhead.
+
 config SYSFS_DEPRECATED
 	bool
 
diff -puN include/linux/init_task.h~memory-controller-add-mm-owner include/linux/init_task.h
--- linux-2.6.25-rc5/include/linux/init_task.h~memory-controller-add-mm-owner	2008-03-28 18:14:36.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/init_task.h	2008-03-31 23:29:33.000000000 +0530
@@ -57,6 +57,7 @@
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(name.page_table_lock),	\
 	.mmlist		= LIST_HEAD_INIT(name.mmlist),		\
 	.cpu_vm_mask	= CPU_MASK_ALL,				\
+	.owner		= &init_task,				\
 }
 
 #define INIT_SIGNALS(sig) {						\
@@ -199,7 +200,6 @@ extern struct group_info init_groups;
 	INIT_LOCKDEP							\
 }
 
-
 #define INIT_CPU_TIMERS(cpu_timers)					\
 {									\
 	LIST_HEAD_INIT(cpu_timers[0]),					\
diff -puN fs/exec.c~memory-controller-add-mm-owner fs/exec.c
--- linux-2.6.25-rc5/fs/exec.c~memory-controller-add-mm-owner	2008-03-28 20:38:20.000000000 +0530
+++ linux-2.6.25-rc5-balbir/fs/exec.c	2008-03-29 11:34:02.000000000 +0530
@@ -735,6 +735,7 @@ static int exec_mmap(struct mm_struct *m
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
 	task_unlock(tsk);
+	mm_update_next_owner(mm);
 	arch_pick_mmap_layout(mm);
 	if (old_mm) {
 		up_read(&old_mm->mmap_sem);
diff -puN kernel/cgroup.c~memory-controller-add-mm-owner kernel/cgroup.c
--- linux-2.6.25-rc5/kernel/cgroup.c~memory-controller-add-mm-owner	2008-04-01 13:08:41.000000000 +0530
+++ linux-2.6.25-rc5-balbir/kernel/cgroup.c	2008-04-01 15:51:05.000000000 +0530
@@ -118,6 +118,7 @@ static int root_count;
  * be called.
  */
 static int need_forkexit_callback;
+static int need_mm_owner_callback;
 
 /* convenient tests for these bits */
 inline int cgroup_is_removed(const struct cgroup *cgrp)
@@ -2481,6 +2482,7 @@ static void __init cgroup_init_subsys(st
 	}
 
 	need_forkexit_callback |= ss->fork || ss->exit;
+	need_mm_owner_callback |= !!ss->mm_owner_changed;
 
 	ss->active = 1;
 }
@@ -2717,6 +2719,28 @@ void cgroup_fork_callbacks(struct task_s
 	}
 }
 
+#ifdef CONFIG_MM_OWNER
+/**
+ * cgroup_mm_owner_callbacks - run callbacks when the mm->owner changes
+ * @old: the outgoing owner
+ * @new: the incoming owner
+ *
+ * Called on every change to mm->owner. mm_init_owner() does not invoke
+ * this routine, since it assigns the initial owner and never changes it.
+ */
+void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
+{
+	if (need_mm_owner_callback) {
+		int i;
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys *ss = subsys[i];
+			if (ss->mm_owner_changed)
+				ss->mm_owner_changed(ss, old, new);
+		}
+	}
+}
+#endif /* CONFIG_MM_OWNER */
+
 /**
  * cgroup_post_fork - called on a new task after adding it to the task list
  * @child: the task in question
diff -puN include/linux/cgroup.h~memory-controller-add-mm-owner include/linux/cgroup.h
--- linux-2.6.25-rc5/include/linux/cgroup.h~memory-controller-add-mm-owner	2008-04-01 13:08:45.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/cgroup.h	2008-04-01 15:40:17.000000000 +0530
@@ -292,6 +292,12 @@ struct cgroup_subsys {
 			struct cgroup *cgrp);
 	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
+	/*
+	 * This routine is called with the task_lock of mm->owner held
+	 */
+	void (*mm_owner_changed)(struct cgroup_subsys *ss,
+					struct task_struct *old,
+					struct task_struct *new);
 	int subsys_id;
 	int active;
 	int disabled;
@@ -377,4 +383,13 @@ static inline int cgroupstats_build(stru
 
 #endif /* !CONFIG_CGROUPS */
 
+#ifdef CONFIG_MM_OWNER
+extern void
+cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new);
+#else /* !CONFIG_MM_OWNER */
+static inline void
+cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
+{
+}
+#endif /* CONFIG_MM_OWNER */
 #endif /* _LINUX_CGROUP_H */
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-01 12:43 [RFC][-mm] Add an owner to the mm_struct (v4) Balbir Singh
@ 2008-04-01 16:00 ` Pekka Enberg
  2008-04-01 16:15   ` Balbir Singh
  2008-04-02  0:31 ` KAMEZAWA Hiroyuki
  2008-04-02 18:53 ` Balbir Singh
  2 siblings, 1 reply; 13+ messages in thread
From: Pekka Enberg @ 2008-04-01 16:00 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Paul Menage, Pavel Emelianov, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, lizf, linux-kernel, taka, linux-mm,
	David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki

Hi,

On Tue, Apr 1, 2008 at 3:43 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>  @@ -227,8 +227,9 @@ struct mm_struct {
>         /* aio bits */
>         rwlock_t                ioctx_list_lock;
>         struct kioctx           *ioctx_list;
>  -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
>  -       struct mem_cgroup *mem_cgroup;
>  +#ifdef CONFIG_MM_OWNER
>  +       struct task_struct *owner;      /* The thread group leader that */
>  +                                       /* owns the mm_struct.          */
>   #endif

Yes, please. This is useful for the revokeat() patches as well. I
currently need a big ugly loop to scan each task so I can break COW of
private pages.


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-01 16:00 ` Pekka Enberg
@ 2008-04-01 16:15   ` Balbir Singh
  0 siblings, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2008-04-01 16:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Paul Menage, Pavel Emelianov, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, lizf, linux-kernel, taka, linux-mm,
	David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki

Pekka Enberg wrote:
> Hi,
> 
> On Tue, Apr 1, 2008 at 3:43 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>  @@ -227,8 +227,9 @@ struct mm_struct {
>>         /* aio bits */
>>         rwlock_t                ioctx_list_lock;
>>         struct kioctx           *ioctx_list;
>>  -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
>>  -       struct mem_cgroup *mem_cgroup;
>>  +#ifdef CONFIG_MM_OWNER
>>  +       struct task_struct *owner;      /* The thread group leader that */
>>  +                                       /* owns the mm_struct.          */
>>   #endif
> 
> Yes, please. This is useful for the revokeat() patches as well. I
> currently need a big ugly loop to scan each task so I can break COW of
> private pages.

Hi, Pekka,

It's good to know that this will be useful. I think I understand your use case of
having to walk the entire tasklist to break COW of private pages; having the
owner information right in the mm_struct is definitely useful.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-01 12:43 [RFC][-mm] Add an owner to the mm_struct (v4) Balbir Singh
  2008-04-01 16:00 ` Pekka Enberg
@ 2008-04-02  0:31 ` KAMEZAWA Hiroyuki
  2008-04-02  3:25   ` Balbir Singh
  2008-04-02 18:53 ` Balbir Singh
  2 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-02  0:31 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Paul Menage, Pavel Emelianov, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, lizf, linux-kernel, taka, linux-mm,
	David Rientjes, Andrew Morton


On Tue, 01 Apr 2008 18:13:12 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> +	/*
> +	 * Search in the children
> +	 */
> +	list_for_each_entry(c, &p->children, sibling) {
> +		if (c->mm == mm)
> +			goto assign_new_owner;
> +	}
> +
This finds the new owner when "current" is multi-threaded and
"current" called pthread_create(), right?

> +	/*
> +	 * Search in the siblings
> +	 */
> +	list_for_each_entry(c, &p->parent->children, sibling) {
> +		if (c->mm == mm)
> +			goto assign_new_owner;
> +	}
> +
This finds the new owner when "current" is multi-threaded and
"current" is just a child (meaning it didn't call pthread_create())?


> +	/*
> +	 * Search through everything else. We should not get
> +	 * here often
> +	 */
> +	do_each_thread(g, c) {
> +		if (c->mm == mm)
> +			goto assign_new_owner;
> +	} while_each_thread(g, c);

Doing the above in a synchronized manner seems too heavy.
When does this happen? Or can this be done in a lazy "on-demand" manner?

+assign_new_owner:
+	rcu_read_unlock();
+	BUG_ON(c == p);
+	task_lock(c);
+	if (c->mm != mm) {
+		task_unlock(c);
+		goto retry;
+	}
+	cgroup_mm_owner_callbacks(mm->owner, c);
+	mm->owner = c;
+	task_unlock(c);
+}
Why rcu_read_unlock() before changing the owner? Is it safe?

Thanks,
-Kame



* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-02  0:31 ` KAMEZAWA Hiroyuki
@ 2008-04-02  3:25   ` Balbir Singh
  2008-04-02  4:53     ` KAMEZAWA Hiroyuki
  2008-04-02 19:27     ` Paul Menage
  0 siblings, 2 replies; 13+ messages in thread
From: Balbir Singh @ 2008-04-02  3:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Paul Menage, Pavel Emelianov, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, lizf, linux-kernel, taka, linux-mm,
	David Rientjes, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> On Tue, 01 Apr 2008 18:13:12 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> +	/*
>> +	 * Search in the children
>> +	 */
>> +	list_for_each_entry(c, &p->children, sibling) {
>> +		if (c->mm == mm)
>> +			goto assign_new_owner;
>> +	}
>> +
> This finds the new owner when "current" is multi-threaded and
> "current" called pthread_create(), right?
> 

No, it won't find the new owner there if CLONE_THREAD was passed while creating
the threads. mm_need_new_owner() checks for !delay_group_leader(): if the exiting
task is a delayed group leader, we don't need a new owner, since the leader stays
around until all of its threads exit.
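
(For reference, delay_group_leader() in kernels of this vintage is roughly
the check below, i.e. a group leader with live threads keeps the ownership
until the whole thread group has exited:)

#define delay_group_leader(p) \
	(thread_group_leader(p) && !thread_group_empty(p))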

>> +	/*
>> +	 * Search in the siblings
>> +	 */
>> +	list_for_each_entry(c, &p->parent->children, sibling) {
>> +		if (c->mm == mm)
>> +			goto assign_new_owner;
>> +	}
>> +
> This finds the new owner when "current" is multi-threaded and
> "current" is just a child (meaning it didn't call pthread_create())?
> 

Ditto

> 
>> +	/*
>> +	 * Search through everything else. We should not get
>> +	 * here often
>> +	 */
>> +	do_each_thread(g, c) {
>> +		if (c->mm == mm)
>> +			goto assign_new_owner;
>> +	} while_each_thread(g, c);
> 
> Doing the above in a synchronized manner seems too heavy.
> When does this happen? Or can this be done in a lazy "on-demand" manner?
> 

Do you mean under task_lock()?

> +assign_new_owner:
> +	rcu_read_unlock();
> +	BUG_ON(c == p);
> +	task_lock(c);
> +	if (c->mm != mm) {
> +		task_unlock(c);
> +		goto retry;
> +	}
> +	cgroup_mm_owner_callbacks(mm->owner, c);
> +	mm->owner = c;
> +	task_unlock(c);
> +}
> Why rcu_read_unlock() before changing the owner? Is it safe?
> 

It should be safe, since we take task_lock(), but to be doubly sure, we can drop
the rcu read lock after taking the task_lock().

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-02  3:25   ` Balbir Singh
@ 2008-04-02  4:53     ` KAMEZAWA Hiroyuki
  2008-04-02  6:40       ` Balbir Singh
  2008-04-02 19:27     ` Paul Menage
  1 sibling, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-02  4:53 UTC (permalink / raw)
  To: balbir
  Cc: Paul Menage, Pavel Emelianov, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, lizf, linux-kernel, taka, linux-mm,
	David Rientjes, Andrew Morton

On Wed, 02 Apr 2008 08:55:34 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> > 
> >> +	/*
> >> +	 * Search through everything else. We should not get
> >> +	 * here often
> >> +	 */
> >> +	do_each_thread(g, c) {
> >> +		if (c->mm == mm)
> >> +			goto assign_new_owner;
> >> +	} while_each_thread(g, c);
> > 
> > Doing the above in a synchronized manner seems too heavy.
> > When does this happen? Or can this be done in a lazy "on-demand" manner?
> > 
> 
> Do you mean under task_lock()?
> 
No, I mean the scanning itself.
How rarely does this scan happen on a server which has 10000+ threads?

Thanks,
-Kame



* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-02  4:53     ` KAMEZAWA Hiroyuki
@ 2008-04-02  6:40       ` Balbir Singh
  0 siblings, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2008-04-02  6:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Paul Menage, Pavel Emelianov, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, lizf, linux-kernel, taka, linux-mm,
	David Rientjes, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> On Wed, 02 Apr 2008 08:55:34 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>>>> +	/*
>>>> +	 * Search through everything else. We should not get
>>>> +	 * here often
>>>> +	 */
>>>> +	do_each_thread(g, c) {
>>>> +		if (c->mm == mm)
>>>> +			goto assign_new_owner;
>>>> +	} while_each_thread(g, c);
>>> Doing the above in a synchronized manner seems too heavy.
>>> When does this happen? Or can this be done in a lazy "on-demand" manner?
>>>
>> Do you mean under task_lock()?
>>
> No, I mean the scanning itself.
> How rarely does this scan happen on a server which has 10000+ threads?
> 

This routine will be called every time a thread exits, but it will bail out
quickly after the mm_need_new_owner() check.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-01 12:43 [RFC][-mm] Add an owner to the mm_struct (v4) Balbir Singh
  2008-04-01 16:00 ` Pekka Enberg
  2008-04-02  0:31 ` KAMEZAWA Hiroyuki
@ 2008-04-02 18:53 ` Balbir Singh
  2008-04-02 19:53   ` Paul Menage
  2 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2008-04-02 18:53 UTC (permalink / raw)
  To: Paul Menage, Pavel Emelianov
  Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel,
	taka, linux-mm, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki

Balbir Singh wrote:
> Changelog v3
> ------------
> 
> 1. Add mm->owner change callbacks using cgroups
> 
> This patch removes the mem_cgroup member from mm_struct and instead adds
> an owner. This approach was suggested by Paul Menage. The advantage of
> this approach is that, once mm->owner is known, the corresponding cgroup
> can be determined using the subsystem id. It also allows several control
> groups that are virtually grouped by mm_struct to exist independently of
> the memory controller, i.e., without adding a mem_cgroup-style pointer to
> mm_struct for each controller.
> 
> A new config option CONFIG_MM_OWNER is added and the memory resource
> controller selects this config option.
> 
> NOTE: This patch was developed on top of 2.6.25-rc5-mm1 and applies on top
> of the memory-controller-move-to-own-slab patch (which is already present
> in Andrew's patchset).
> 
> This patch also adds cgroup callbacks to notify subsystems when mm->owner
> changes. The mm_owner_changed callback is called with the task_lock() of
> the new owner task held, just prior to changing mm->owner.
> 
> I am indebted to Paul Menage for the several reviews of this patchset
> and helping me make it lighter and simpler.
> 
> This patch was tested on a powerpc box.
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Hi,

So far I've heard no objections or seen any review suggestions. Paul if you are
OK with this patch, I'll ask Andrew to include it in -mm.

People waiting on this patch

1. Pekka Enberg for revoke* syscalls
2. Serge Hallyn for swap namespaces
3. Myself to implement the rlimit controller for cgroups

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-02  3:25   ` Balbir Singh
  2008-04-02  4:53     ` KAMEZAWA Hiroyuki
@ 2008-04-02 19:27     ` Paul Menage
  1 sibling, 0 replies; 13+ messages in thread
From: Paul Menage @ 2008-04-02 19:27 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Pavel Emelianov, Hugh Dickins, Sudhir Kumar,
	YAMAMOTO Takashi, lizf, linux-kernel, taka, linux-mm,
	David Rientjes, Andrew Morton

On Tue, Apr 1, 2008 at 8:25 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>  > +assign_new_owner:
>  > +     rcu_read_unlock();
>  > +     BUG_ON(c == p);
>  > +     task_lock(c);
>  > +     if (c->mm != mm) {
>  > +             task_unlock(c);
>  > +             goto retry;
>  > +     }
>  > +     cgroup_mm_owner_callbacks(mm->owner, c);
>  > +     mm->owner = c;
>  > +     task_unlock(c);
>  > +}
>  > Why rcu_read_unlock() before changing the owner? Is it safe?
>  >
>
>  It should be safe, since we take task_lock(), but to be doubly sure, we can drop
>  the rcu read lock after taking the task_lock().
>

I agree with Kamezawa - the task can technically disappear as soon as
we leave the RCU critical section. (In practice, it'll only happen
with CONFIG_PREEMPT).

Paul


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-02 18:53 ` Balbir Singh
@ 2008-04-02 19:53   ` Paul Menage
  2008-04-03  4:05     ` Balbir Singh
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Menage @ 2008-04-02 19:53 UTC (permalink / raw)
  To: balbir
  Cc: Pavel Emelianov, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	lizf, linux-kernel, taka, linux-mm, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

On Wed, Apr 2, 2008 at 11:53 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>  So far I've heard no objections or seen any review suggestions. Paul if you are
>  OK with this patch, I'll ask Andrew to include it in -mm.

My only thoughts were:

- I think I'd still prefer CONFIG_MM_OWNER to be auto-selected rather
than manually configured, but it's not a huge deal either way.

- in theory I think we should goto retry if we get to the end of
mm_update_next_owner() without finding any other owner. Otherwise we
could miss another user if we race with one process forking a new
child and then exiting?

- I was looking through the exit code trying to convince myself that
current is still on the tasklist until after it makes this call. If it
isn't then we could have trouble finding the new owner. But I can't
figure out for sure exactly at what point we come off the tasklist.

- I think we only need the cgroup callback in the event that
current->cgroups != new_owner->cgroups. (Hmm, have we already been
moved back to the root cgroup by this point? If so, then we'll have no
way to know which cgroup to unaccount from).

Paul

>
>  People waiting on this patch
>
>  1. Pekka Enberg for revoke* syscalls
>  2. Serge Hallyn for swap namespaces
>  3. Myself to implement the rlimit controller for cgroups
>
>
>
>  --
>         Warm Regards,
>         Balbir Singh
>         Linux Technology Center
>         IBM, ISTL
>


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-02 19:53   ` Paul Menage
@ 2008-04-03  4:05     ` Balbir Singh
  2008-04-03  4:10       ` Paul Menage
  0 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2008-04-03  4:05 UTC (permalink / raw)
  To: Paul Menage
  Cc: Pavel Emelianov, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	lizf, linux-kernel, taka, linux-mm, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Wed, Apr 2, 2008 at 11:53 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>  So far I've heard no objections or seen any review suggestions. Paul if you are
>>  OK with this patch, I'll ask Andrew to include it in -mm.
> 
> My only thoughts were:
> 
> - I think I'd still prefer CONFIG_MM_OWNER to be auto-selected rather
> than manually configured, but it's not a huge deal either way.
> 

It is auto-selected now by CONFIG_CGROUP_MEM_RES_CTLR in the latest patchset

> - in theory I think we should goto retry if we get to the end of
> mm_update_next_owner() without finding any other owner. Otherwise we
> could miss another user if we race with one process forking a new
> child and then exiting?
> 

When the current task is exiting and we've verified that we are mm->owner, we
cannot miss the new process: fork() adds the new child to the tasklist before
the forking task can exit.

> - I was looking through the exit code trying to convince myself that
> current is still on the tasklist until after it makes this call. If it
> isn't then we could have trouble finding the new owner. But I can't
> figure out for sure exactly at what point we come off the tasklist.
> 

We come off the task list in __unhash_process(), which is in turn called by
release_task() through __exit_signal().

> - I think we only need the cgroup callback in the event that
> current->cgroups != new_owner->cgroups. (Hmm, have we already been
> moved back to the root cgroup by this point? If so, then we'll have no
> way to know which cgroup to unaccount from).
> 

I checked to see that cgroup_exit() is called after mm_update_next_owner(); we
call mm_update_next_owner() from exit_mm(). I did not check for current->cgroups !=
new_owner->cgroups, since I did not want to limit the callbacks. An interested
subsystem can make that check itself and turn the callback into a no-op.

I am going to change the rcu_read_lock() usage so that the lock is released
only after we take the task_lock(), and repost the patch.
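
In other words, the tail of mm_update_next_owner() would become something
like the sketch below (the intended v5 ordering, not the code posted above):

assign_new_owner:
	BUG_ON(c == p);
	/*
	 * Take task_lock() before leaving the RCU read-side section, so
	 * that c cannot be released between finding it and locking it.
	 */
	task_lock(c);
	rcu_read_unlock();
	if (c->mm != mm) {
		task_unlock(c);
		goto retry;
	}
	cgroup_mm_owner_callbacks(mm->owner, c);
	mm->owner = c;
	task_unlock(c);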


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-03  4:05     ` Balbir Singh
@ 2008-04-03  4:10       ` Paul Menage
  2008-04-03  4:32         ` Balbir Singh
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Menage @ 2008-04-03  4:10 UTC (permalink / raw)
  To: balbir
  Cc: Pavel Emelianov, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	lizf, linux-kernel, taka, linux-mm, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

On Wed, Apr 2, 2008 at 9:05 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>  I checked to see that cgroup_exit() is called after mm_update_next_owner(); we
>  call mm_update_next_owner() from exit_mm(). I did not check for current->cgroups !=
>  new_owner->cgroups, since I did not want to limit the callbacks.

No cgroup subsystem should be concerned about mm ownership changes
between tasks in the same cgroup. So I think that's a valid and useful
optimization.
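
A sketch of what that optimization could look like inside
cgroup_mm_owner_callbacks(); the per-subsystem task_cgroup() comparison is
just one possible shape, and the final form is left to v5:

void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
{
	int i;

	if (!need_mm_owner_callback)
		return;

	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
		struct cgroup_subsys *ss = subsys[i];
		struct cgroup *oldcgrp, *newcgrp;

		if (!ss->mm_owner_changed)
			continue;
		/* Skip subsystems where old and new owners share a cgroup */
		oldcgrp = task_cgroup(old, i);
		newcgrp = task_cgroup(new, i);
		if (oldcgrp != newcgrp)
			ss->mm_owner_changed(ss, old, new);
	}
}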

Paul


* Re: [RFC][-mm] Add an owner to the mm_struct (v4)
  2008-04-03  4:10       ` Paul Menage
@ 2008-04-03  4:32         ` Balbir Singh
  0 siblings, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2008-04-03  4:32 UTC (permalink / raw)
  To: Paul Menage
  Cc: Pavel Emelianov, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
	lizf, linux-kernel, taka, linux-mm, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Wed, Apr 2, 2008 at 9:05 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>  I checked to see that cgroup_exit() is called after mm_update_next_owner(); we
>>  call mm_update_next_owner() from exit_mm(). I did not check for current->cgroups !=
>>  new_owner->cgroups, since I did not want to limit the callbacks.
> 
> No cgroup subsystem should be concerned about mm ownership changes
> between tasks in the same cgroup. So I think that's a valid and useful
> optimization.
> 

Hmm, probably. I'll do that check. Let me post v5 with these changes.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

