LKML Archive on lore.kernel.org
* [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
@ 2008-11-12  4:30 KAMEZAWA Hiroyuki
  2008-11-12  4:53 ` Balbir Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-12  4:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, balbir, menage, nishimura, lizf, akpm

Balbir, Paul, Li, how about this?
=
As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.

It has the following lock sequence:

	cgroup_mutex (cgroup_rmdir)
	    -> pre_destroy
		-> mem_cgroup_pre_destroy
			-> force_empty
			   -> lru_add_drain_all
			      -> schedule_on_each_cpu
			         -> get_online_cpus -> cpu_hotplug.lock

But cpuset has the following:
	cpu_hotplug.lock (call notifier)
		-> cgroup_mutex (within notifier)

So this lock ordering must be fixed.
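
For anyone who wants to see the inversion in isolation, below is a minimal
userspace analogue (illustrative only, not kernel code): mutex_a stands in
for cgroup_mutex, mutex_b for cpu_hotplug.lock, and the two threads mirror
the rmdir path and the hotplug-notifier path.

/* Minimal userspace analogue of the ABBA inversion described above.
 * mutex_a plays the role of cgroup_mutex, mutex_b the role of
 * cpu_hotplug.lock. Illustrative only; not kernel code. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER; /* "cgroup_mutex" */
static pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER; /* "cpu_hotplug.lock" */

/* Mirrors cgroup_rmdir -> pre_destroy -> ... -> get_online_cpus */
static void *rmdir_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mutex_a);
	pthread_mutex_lock(&mutex_b);	/* blocks forever if the other thread holds b */
	pthread_mutex_unlock(&mutex_b);
	pthread_mutex_unlock(&mutex_a);
	return NULL;
}

/* Mirrors cpu hotplug -> cpuset notifier -> cgroup_mutex */
static void *notifier_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mutex_b);
	pthread_mutex_lock(&mutex_a);	/* blocks forever if the other thread holds a */
	pthread_mutex_unlock(&mutex_a);
	pthread_mutex_unlock(&mutex_b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, rmdir_path, NULL);
	pthread_create(&t2, NULL, notifier_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("no deadlock on this run\n");
	return 0;
}

Run it in a loop and it eventually hangs with each thread blocked on the
other's mutex, which is the same circular dependency the two kernel paths
above can hit.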

Considering how pre_destroy works, it is not necessary to hold
cgroup_mutex while calling it.

As a side effect, we no longer have to wait on this mutex while memcg's
force_empty runs (which can take a long time when there are tons of pages).

Note: memcg is the only user of pre_destroy right now.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 kernel/cgroup.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
+++ mmotm-2.6.28-Nov10/kernel/cgroup.c
@@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-
-	parent = cgrp->parent;
-	root = cgrp->root;
-	sb = root->sb;
+	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * Call pre_destroy handlers of subsys. Notify subsystems
@@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
 	 */
 	cgroup_call_pre_destroy(cgrp);
 
-	if (cgroup_has_css_refs(cgrp)) {
+	mutex_lock(&cgroup_mutex);
+	parent = cgrp->parent;
+	root = cgrp->root;
+	sb = root->sb;
+
+	if (atomic_read(&cgrp->count)
+	    || list_empty(&cgrp->children)
+	    || cgroup_has_css_refs(cgrp)) {
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}



* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
  2008-11-12  4:30 [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy KAMEZAWA Hiroyuki
@ 2008-11-12  4:53 ` Balbir Singh
  2008-11-12  4:55   ` KAMEZAWA Hiroyuki
  2008-11-12  6:58 ` Li Zefan
  2008-11-12  7:32 ` [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy (v2) KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 7+ messages in thread
From: Balbir Singh @ 2008-11-12  4:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, menage, nishimura, lizf, akpm

KAMEZAWA Hiroyuki wrote:
> Balbir, Paul, Li, how about this?
> =
> As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
> 
> It has the following lock sequence:
> 
> 	cgroup_mutex (cgroup_rmdir)
> 	    -> pre_destroy
> 		-> mem_cgroup_pre_destroy
> 			-> force_empty
> 			   -> lru_add_drain_all
> 			      -> schedule_on_each_cpu
> 			         -> get_online_cpus -> cpu_hotplug.lock
> 
> But cpuset has the following:
> 	cpu_hotplug.lock (call notifier)
> 		-> cgroup_mutex (within notifier)
> 
> So this lock ordering must be fixed.
> 
> Considering how pre_destroy works, it is not necessary to hold
> cgroup_mutex while calling it.
> 
> As a side effect, we no longer have to wait on this mutex while memcg's
> force_empty runs (which can take a long time when there are tons of pages).
> 
> Note: memcg is the only user of pre_destroy right now.
> 

I thought about this and it seems promising. My concern is that with
cgroup_mutex released, the state of the cgroup within pre_destroy will be
unpredictable. I suspect that if pre_destroy really needs cgroup_mutex, we
can take it within pre_destroy itself.
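
To illustrate, something like this inside a handler (an untested sketch,
not a real subsystem; cgroup_lock()/cgroup_unlock() are the existing
wrappers around cgroup_mutex):

/* Untested sketch: if a pre_destroy handler really does need
 * cgroup_mutex, it can take the lock itself, since with this patch
 * cgroup_rmdir() no longer holds it across the call. */
static void example_pre_destroy(struct cgroup_subsys *ss,
				struct cgroup *cgrp)
{
	cgroup_lock();
	/* ... inspect or tear down per-cgroup state under the lock ... */
	cgroup_unlock();
}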

BTW, your last check does not seem right:

+	if (atomic_read(&cgrp->count)
+	    || list_empty(&cgrp->children)

Why should list_empty() result in -EBUSY? Shouldn't it be !list_empty()?

+	    || cgroup_has_css_refs(cgrp)) {


-- 
	Balbir


* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
  2008-11-12  4:53 ` Balbir Singh
@ 2008-11-12  4:55   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-12  4:55 UTC (permalink / raw)
  To: balbir; +Cc: linux-kernel, linux-mm, menage, nishimura, lizf, akpm

On Wed, 12 Nov 2008 10:23:55 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Balbir, Paul, Li, how about this?
> > =
> > As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
> > 
> > It has the following lock sequence:
> > 
> > 	cgroup_mutex (cgroup_rmdir)
> > 	    -> pre_destroy
> > 		-> mem_cgroup_pre_destroy
> > 			-> force_empty
> > 			   -> lru_add_drain_all
> > 			      -> schedule_on_each_cpu
> > 			         -> get_online_cpus -> cpu_hotplug.lock
> > 
> > But cpuset has the following:
> > 	cpu_hotplug.lock (call notifier)
> > 		-> cgroup_mutex (within notifier)
> > 
> > So this lock ordering must be fixed.
> > 
> > Considering how pre_destroy works, it is not necessary to hold
> > cgroup_mutex while calling it.
> > 
> > As a side effect, we no longer have to wait on this mutex while memcg's
> > force_empty runs (which can take a long time when there are tons of pages).
> > 
> > Note: memcg is the only user of pre_destroy right now.
> > 
> 
> I thought about this and it seems promising. My concern is that with
> cgroup_mutex released, the state of the cgroup within pre_destroy will be
> unpredictable. I suspect that if pre_destroy really needs cgroup_mutex, we
> can take it within pre_destroy itself.
> 
I agree.

> BTW, your last check does not seem right:
> 
> +	if (atomic_read(&cgrp->count)
> +	    || list_empty(&cgrp->children)
> 
> Why should list_empty() result in -EBUSY? Shouldn't it be !list_empty()?
> 
> +	    || cgroup_has_css_refs(cgrp)) {
>
Oh, my bad...

will fix soon.

Thanks,
-Kame

* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
  2008-11-12  4:30 [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy KAMEZAWA Hiroyuki
  2008-11-12  4:53 ` Balbir Singh
@ 2008-11-12  6:58 ` Li Zefan
  2008-11-12  8:15   ` KAMEZAWA Hiroyuki
  2008-11-12  7:32 ` [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy (v2) KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 7+ messages in thread
From: Li Zefan @ 2008-11-12  6:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, balbir, menage, nishimura, akpm

KAMEZAWA Hiroyuki wrote:
> Balbir, Paul, Li, how about this?
> =
> As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
> 
> It has the following lock sequence:
> 
> 	cgroup_mutex (cgroup_rmdir)
> 	    -> pre_destroy
> 		-> mem_cgroup_pre_destroy
> 			-> force_empty
> 			   -> lru_add_drain_all
> 			      -> schedule_on_each_cpu
> 			         -> get_online_cpus -> cpu_hotplug.lock
> 
> But cpuset has the following:
> 	cpu_hotplug.lock (call notifier)
> 		-> cgroup_mutex (within notifier)
> 
> So this lock ordering must be fixed.
> 
> Considering how pre_destroy works, it is not necessary to hold
> cgroup_mutex while calling it.
> 

I think it's safe to call cgroup_call_pre_destroy() without cgroup_lock.
If cgroup_call_pre_destroy() gets called, it means the cgroup fs has
sub-dirs, so any remount/umount will fail. That means root->subsys_list
won't change during rmdir(), so using for_each_subsys() in
cgroup_call_pre_destroy() is safe.

> As a side effect, we no longer have to wait on this mutex while memcg's
> force_empty runs (which can take a long time when there are tons of pages).
> 
> Note: memcg is the only user of pre_destroy right now.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  kernel/cgroup.c |   14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
> ===================================================================
> --- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
> +++ mmotm-2.6.28-Nov10/kernel/cgroup.c
> @@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> -
> -	parent = cgrp->parent;
> -	root = cgrp->root;
> -	sb = root->sb;
> +	mutex_unlock(&cgroup_mutex);
>  
>  	/*
>  	 * Call pre_destroy handlers of subsys. Notify subsystems
> @@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
>  	 */
>  	cgroup_call_pre_destroy(cgrp);
>  
> -	if (cgroup_has_css_refs(cgrp)) {
> +	mutex_lock(&cgroup_mutex);
> +	parent = cgrp->parent;
> +	root = cgrp->root;
> +	sb = root->sb;
> +
> +	if (atomic_read(&cgrp->count)
> +	    || list_empty(&cgrp->children)
> +	    || cgroup_has_css_refs(cgrp)) {
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> 


* [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy (v2)
  2008-11-12  4:30 [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy KAMEZAWA Hiroyuki
  2008-11-12  4:53 ` Balbir Singh
  2008-11-12  6:58 ` Li Zefan
@ 2008-11-12  7:32 ` KAMEZAWA Hiroyuki
  2008-11-12 11:26   ` Balbir Singh
  2 siblings, 1 reply; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-12  7:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-kernel, linux-mm, balbir, menage, nishimura, lizf, akpm

This is the fixed one. Thank you all for the help.

Regards,
-Kame
==
As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.

It has the following lock sequence:

	cgroup_mutex (cgroup_rmdir)
	    -> pre_destroy -> mem_cgroup_pre_destroy -> force_empty
		-> cpu_hotplug.lock (lru_add_drain_all ->
				     schedule_on_each_cpu ->
				     get_online_cpus)

But cpuset has the following:
	cpu_hotplug.lock (call notifier)
		-> cgroup_mutex (within notifier)

So this lock ordering must be fixed.

Considering how pre_destroy works, it is not necessary to hold
cgroup_mutex while calling it.

As a side effect, we no longer have to wait on this mutex while memcg's
force_empty runs (which can take a long time when there are tons of pages).

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 kernel/cgroup.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
+++ mmotm-2.6.28-Nov10/kernel/cgroup.c
@@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-
-	parent = cgrp->parent;
-	root = cgrp->root;
-	sb = root->sb;
+	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * Call pre_destroy handlers of subsys. Notify subsystems
@@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
 	 */
 	cgroup_call_pre_destroy(cgrp);
 
-	if (cgroup_has_css_refs(cgrp)) {
+	mutex_lock(&cgroup_mutex);
+	parent = cgrp->parent;
+	root = cgrp->root;
+	sb = root->sb;
+
+	if (atomic_read(&cgrp->count)
+	    || !list_empty(&cgrp->children)
+	    || cgroup_has_css_refs(cgrp)) {
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
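
For reviewers who prefer to see the resulting pattern in isolation: this is
the usual drop-the-lock, call-the-slow-hook, re-take-and-re-validate scheme.
A standalone userspace analogue follows (names are illustrative, not taken
from the kernel):

/* Userspace analogue of the pattern this patch introduces in
 * cgroup_rmdir(): drop the big lock around a slow callback, then
 * re-take it and re-check state that may have changed meanwhile. */
#include <pthread.h>
#include <errno.h>
#include <stdio.h>

struct object {
	pthread_mutex_t lock;
	int refcount;		/* plays the role of cgrp->count */
	int nr_children;	/* plays the role of !list_empty(&cgrp->children) */
};

/* May take other locks (in the kernel case: cpu_hotplug.lock), so it
 * must run without obj->lock held to avoid the ABBA inversion. */
static void slow_pre_destroy(struct object *obj)
{
	(void)obj;
}

static int object_destroy(struct object *obj)
{
	pthread_mutex_lock(&obj->lock);
	if (obj->refcount || obj->nr_children) {
		pthread_mutex_unlock(&obj->lock);
		return -EBUSY;
	}
	pthread_mutex_unlock(&obj->lock);

	slow_pre_destroy(obj);		/* lock dropped across the slow hook */

	pthread_mutex_lock(&obj->lock);
	/* Re-check: someone may have taken a reference or created a
	 * child while the lock was dropped. */
	if (obj->refcount || obj->nr_children) {
		pthread_mutex_unlock(&obj->lock);
		return -EBUSY;
	}
	/* ... the actual teardown would happen here ... */
	pthread_mutex_unlock(&obj->lock);
	return 0;
}

int main(void)
{
	struct object obj = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

	printf("object_destroy: %d\n", object_destroy(&obj));
	return 0;
}

The second busy check is exactly what the re-taken cgroup_mutex protects
in the real patch.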



* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
  2008-11-12  6:58 ` Li Zefan
@ 2008-11-12  8:15   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-12  8:15 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-kernel, linux-mm, balbir, menage, nishimura, akpm

On Wed, 12 Nov 2008 14:58:06 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Balbir, Paul, Li, how about this?
> > =
> > As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
> > 
> > It has the following lock sequence:
> > 
> > 	cgroup_mutex (cgroup_rmdir)
> > 	    -> pre_destroy
> > 		-> mem_cgroup_pre_destroy
> > 			-> force_empty
> > 			   -> lru_add_drain_all
> > 			      -> schedule_on_each_cpu
> > 			         -> get_online_cpus -> cpu_hotplug.lock
> > 
> > But cpuset has the following:
> > 	cpu_hotplug.lock (call notifier)
> > 		-> cgroup_mutex (within notifier)
> > 
> > So this lock ordering must be fixed.
> > 
> > Considering how pre_destroy works, it is not necessary to hold
> > cgroup_mutex while calling it.
> > 
> 
> I think it's safe to call cgroup_call_pre_destroy() without cgroup_lock.
> If cgroup_call_pre_destroy() gets called, it means the cgroup fs has
> sub-dirs, so any remount/umount will fail. That means root->subsys_list
> won't change during rmdir(), so using for_each_subsys() in
> cgroup_call_pre_destroy() is safe.

Thank you for review.

-Kame

> 
> > As a side effect, we no longer have to wait on this mutex while memcg's
> > force_empty runs (which can take a long time when there are tons of pages).
> > 
> > Note: memcg is the only user of pre_destroy right now.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > ---
> >  kernel/cgroup.c |   14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> > 
> > Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
> > ===================================================================
> > --- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
> > +++ mmotm-2.6.28-Nov10/kernel/cgroup.c
> > @@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
> >  		mutex_unlock(&cgroup_mutex);
> >  		return -EBUSY;
> >  	}
> > -
> > -	parent = cgrp->parent;
> > -	root = cgrp->root;
> > -	sb = root->sb;
> > +	mutex_unlock(&cgroup_mutex);
> >  
> >  	/*
> >  	 * Call pre_destroy handlers of subsys. Notify subsystems
> > @@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
> >  	 */
> >  	cgroup_call_pre_destroy(cgrp);
> >  
> > -	if (cgroup_has_css_refs(cgrp)) {
> > +	mutex_lock(&cgroup_mutex);
> > +	parent = cgrp->parent;
> > +	root = cgrp->root;
> > +	sb = root->sb;
> > +
> > +	if (atomic_read(&cgrp->count)
> > +	    || list_empty(&cgrp->children)
> > +	    || cgroup_has_css_refs(cgrp)) {
> >  		mutex_unlock(&cgroup_mutex);
> >  		return -EBUSY;
> >  	}
> > 
> 



* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy (v2)
  2008-11-12  7:32 ` [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy (v2) KAMEZAWA Hiroyuki
@ 2008-11-12 11:26   ` Balbir Singh
  0 siblings, 0 replies; 7+ messages in thread
From: Balbir Singh @ 2008-11-12 11:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, menage, nishimura, lizf, akpm

KAMEZAWA Hiroyuki wrote:
> This is the fixed one. Thank you all for the help.
> 
> Regards,
> -Kame
> ==
> As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
> 
> It has the following lock sequence:
> 
> 	cgroup_mutex (cgroup_rmdir)
> 	    -> pre_destroy -> mem_cgroup_pre_destroy -> force_empty
> 		-> cpu_hotplug.lock (lru_add_drain_all ->
> 				     schedule_on_each_cpu ->
> 				     get_online_cpus)
> 
> But cpuset has the following:
> 	cpu_hotplug.lock (call notifier)
> 		-> cgroup_mutex (within notifier)
> 
> So this lock ordering must be fixed.
> 
> Considering how pre_destroy works, it is not necessary to hold
> cgroup_mutex while calling it.
> 
> As a side effect, we no longer have to wait on this mutex while memcg's
> force_empty runs (which can take a long time when there are tons of pages).
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  kernel/cgroup.c |   14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
> ===================================================================
> --- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
> +++ mmotm-2.6.28-Nov10/kernel/cgroup.c
> @@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> -
> -	parent = cgrp->parent;
> -	root = cgrp->root;
> -	sb = root->sb;
> +	mutex_unlock(&cgroup_mutex);
> 
>  	/*
>  	 * Call pre_destroy handlers of subsys. Notify subsystems
> @@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
>  	 */
>  	cgroup_call_pre_destroy(cgrp);
> 
> -	if (cgroup_has_css_refs(cgrp)) {
> +	mutex_lock(&cgroup_mutex);
> +	parent = cgrp->parent;
> +	root = cgrp->root;
> +	sb = root->sb;
> +
> +	if (atomic_read(&cgrp->count)
> +	    || !list_empty(&cgrp->children)
> +	    || cgroup_has_css_refs(cgrp)) {
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> 

I think the last check deserves a comment saying that after re-acquiring
the lock we need to re-check whether count, children, or css references
have changed. Otherwise this looks good, though I've not yet tested it.
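
Something along these lines, wording illustrative only:

	/*
	 * cgroup_mutex was dropped while pre_destroy ran, so tasks may
	 * have been attached and children created in the meantime;
	 * re-check count, children and css refcounts before removing
	 * the cgroup.
	 */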

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Balbir

