LKML Archive on lore.kernel.org
* [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
From: KAMEZAWA Hiroyuki @ 2008-11-12 4:30 UTC
To: linux-kernel; +Cc: linux-mm, balbir, menage, nishimura, lizf, akpm
Balbir, Paul, Li, how about this?
=
As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
It has the following lock sequence:
cgroup_mutex (cgroup_rmdir)
-> pre_destroy
-> mem_cgroup_pre_destroy
-> force_empty
-> lru_add_drain_all
-> schedule_on_each_cpu
-> get_online_cpus -> cpu_hotplug.lock
But cpuset has the following:
cpu_hotplug.lock (call notifier)
-> cgroup_mutex (within notifier)
So this lock ordering must be fixed.
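For illustration, the two orderings above form a classic AB-BA inversion. Here is a
minimal userspace sketch of the pattern (hypothetical and simplified; pthread mutexes
stand in for the kernel locks, and all names are invented for the example):

/*
 * Thread 1 models the cgroup_rmdir() path, thread 2 models the
 * CPU-hotplug notifier path that cpuset uses.  With the sleeps
 * widening the race window, the two threads block each other.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* ~cgroup_mutex */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* ~cpu_hotplug.lock */

static void *rmdir_path(void *arg)
{
	pthread_mutex_lock(&lock_a);   /* cgroup_mutex in cgroup_rmdir() */
	sleep(1);
	pthread_mutex_lock(&lock_b);   /* get_online_cpus() under pre_destroy */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *notifier_path(void *arg)
{
	pthread_mutex_lock(&lock_b);   /* cpu_hotplug.lock around notifiers */
	sleep(1);
	pthread_mutex_lock(&lock_a);   /* cgroup_mutex inside the notifier */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	pthread_create(&t1, NULL, rmdir_path, NULL);
	pthread_create(&t2, NULL, notifier_path, NULL);
	pthread_join(t1, NULL);        /* never returns: AB-BA deadlock */
	pthread_join(t2, NULL);
	printf("unreachable in practice\n");
	return 0;
}

Dropping cgroup_mutex before calling pre_destroy breaks the cycle because the rmdir
path then never holds both locks at once.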
Considering how pre_destroy works, it's not necessary to hold
cgroup_mutex while calling it.
As a side effect, we no longer have to wait on this mutex while memcg's
force_empty runs (which can take a long time when there are tons of pages).
Note: memcg is currently the only user of pre_destroy.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
kernel/cgroup.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
+++ mmotm-2.6.28-Nov10/kernel/cgroup.c
@@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
mutex_unlock(&cgroup_mutex);
return -EBUSY;
}
-
- parent = cgrp->parent;
- root = cgrp->root;
- sb = root->sb;
+ mutex_unlock(&cgroup_mutex);
/*
* Call pre_destroy handlers of subsys. Notify subsystems
@@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
*/
cgroup_call_pre_destroy(cgrp);
- if (cgroup_has_css_refs(cgrp)) {
+ mutex_lock(&cgroup_mutex);
+ parent = cgrp->parent;
+ root = cgrp->root;
+ sb = root->sb;
+
+ if (atomic_read(&cgrp->count)
+ || list_empty(&cgrp->children)
+ || cgroup_has_css_refs(cgrp)) {
mutex_unlock(&cgroup_mutex);
return -EBUSY;
}
* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
From: Balbir Singh @ 2008-11-12 4:53 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, menage, nishimura, lizf, akpm
KAMEZAWA Hiroyuki wrote:
> Balbir, Paul, Li, how about this?
> =
> As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
>
> It has the following lock sequence:
>
> cgroup_mutex (cgroup_rmdir)
> -> pre_destroy
> -> mem_cgroup_pre_destroy
> -> force_empty
> -> lru_add_drain_all
> -> schedule_on_each_cpu
> -> get_online_cpus -> cpu_hotplug.lock
>
> But cpuset has the following:
> cpu_hotplug.lock (call notifier)
> -> cgroup_mutex (within notifier)
>
> So this lock ordering must be fixed.
>
> Considering how pre_destroy works, it's not necessary to hold
> cgroup_mutex while calling it.
>
> As a side effect, we no longer have to wait on this mutex while memcg's
> force_empty runs (which can take a long time when there are tons of pages).
>
> Note: memcg is currently the only user of pre_destroy.
>
I thought about this and it seems promising. My concern is that with
cgroup_mutex released, the state of the cgroup within pre_destroy will be
unpredictable. I suspect that if pre_destroy really needs cgroup_mutex, we can
take it within pre_destroy itself.
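That is, roughly the following shape (a hypothetical sketch of the idea only, not
proposed code; the exact mem_cgroup_force_empty() signature and the pre_destroy
prototype in this tree are assumptions):

static void mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
				   struct cgroup *cgrp)
{
	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);

	/* The heavy part (force_empty -> lru_add_drain_all ->
	 * get_online_cpus) runs without cgroup_mutex, so the
	 * inversion against cpu_hotplug.lock cannot occur. */
	mem_cgroup_force_empty(mem);

	/* Any step that really needs a stable cgroup tree takes the
	 * mutex itself, briefly, via cgroup_lock()/cgroup_unlock(). */
	cgroup_lock();
	/* ... re-check or finalize per-cgroup state here ... */
	cgroup_unlock();
}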
BTW, your last check does not seem right:
+ if (atomic_read(&cgrp->count)
+ || list_empty(&cgrp->children)
Why should list_empty() result in EBUSY? Shouldn't it be !list_empty()?
+ || cgroup_has_css_refs(cgrp)) {
--
Balbir
* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
From: KAMEZAWA Hiroyuki @ 2008-11-12 4:55 UTC
To: balbir; +Cc: linux-kernel, linux-mm, menage, nishimura, lizf, akpm
On Wed, 12 Nov 2008 10:23:55 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > Balbir, Paul, Li, how about this?
> > =
> > As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
> >
> > It has the following lock sequence:
> >
> > cgroup_mutex (cgroup_rmdir)
> > -> pre_destroy
> > -> mem_cgroup_pre_destroy
> > -> force_empty
> > -> lru_add_drain_all
> > -> schedule_on_each_cpu
> > -> get_online_cpus -> cpu_hotplug.lock
> >
> > But cpuset has the following:
> > cpu_hotplug.lock (call notifier)
> > -> cgroup_mutex (within notifier)
> >
> > So this lock ordering must be fixed.
> >
> > Considering how pre_destroy works, it's not necessary to hold
> > cgroup_mutex while calling it.
> >
> > As a side effect, we no longer have to wait on this mutex while memcg's
> > force_empty runs (which can take a long time when there are tons of pages).
> >
> > Note: memcg is currently the only user of pre_destroy.
> >
>
> I thought about this and it seems promising. My concern is that with
> cgroup_mutex released, the state of the cgroup within pre_destroy will be
> unpredictable. I suspect that if pre_destroy really needs cgroup_mutex, we can
> take it within pre_destroy itself.
>
I agree.
> BTW, your last check does not seem right:
>
> + if (atomic_read(&cgrp->count)
> + || list_empty(&cgrp->children)
>
> Why should list_empty() result in EBUSY? Shouldn't it be !list_empty()?
>
> + || cgroup_has_css_refs(cgrp)) {
>
Oh, my bad...
will fix soon.
Thanks,
-Kame
* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
From: Li Zefan @ 2008-11-12 6:58 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, balbir, menage, nishimura, akpm
KAMEZAWA Hiroyuki wrote:
> Balbir, Paul, Li, how about this?
> =
> As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
>
> It has the following lock sequence:
>
> cgroup_mutex (cgroup_rmdir)
> -> pre_destroy
> -> mem_cgroup_pre_destroy
> -> force_empty
> -> lru_add_drain_all
> -> schedule_on_each_cpu
> -> get_online_cpus -> cpu_hotplug.lock
>
> But cpuset has the following:
> cpu_hotplug.lock (call notifier)
> -> cgroup_mutex (within notifier)
>
> So this lock ordering must be fixed.
>
> Considering how pre_destroy works, it's not necessary to hold
> cgroup_mutex while calling it.
>
I think it's safe to call cgroup_call_pre_destroy() without cgroup_lock.
If cgroup_call_pre_destroy() gets called, it means the cgroup fs has sub-dirs,
so any remount/umount will fail. That means root->subsys_list won't change
during rmdir(), so using for_each_subsys() in cgroup_call_pre_destroy()
is safe.
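For reference, the function in question is essentially a loop over the subsystems
bound to the cgroup's hierarchy; a sketch of its shape in that era's tree (details
may differ slightly from the exact mmotm snapshot):

static void cgroup_call_pre_destroy(struct cgroup *cgrp)
{
	struct cgroup_subsys *ss;

	/* Walk root->subsys_list and let each subsystem drain its
	 * state before the directory goes away.  Per the reasoning
	 * above, the list cannot change while sub-directories exist,
	 * so this walk is safe without cgroup_mutex. */
	for_each_subsys(cgrp->root, ss)
		if (ss->pre_destroy)
			ss->pre_destroy(ss, cgrp);
}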
> As a side effect, we no longer have to wait on this mutex while memcg's
> force_empty runs (which can take a long time when there are tons of pages).
>
> Note: memcg is currently the only user of pre_destroy.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
> kernel/cgroup.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
> ===================================================================
> --- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
> +++ mmotm-2.6.28-Nov10/kernel/cgroup.c
> @@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
> mutex_unlock(&cgroup_mutex);
> return -EBUSY;
> }
> -
> - parent = cgrp->parent;
> - root = cgrp->root;
> - sb = root->sb;
> + mutex_unlock(&cgroup_mutex);
>
> /*
> * Call pre_destroy handlers of subsys. Notify subsystems
> @@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
> */
> cgroup_call_pre_destroy(cgrp);
>
> - if (cgroup_has_css_refs(cgrp)) {
> + mutex_lock(&cgroup_mutex);
> + parent = cgrp->parent;
> + root = cgrp->root;
> + sb = root->sb;
> +
> + if (atomic_read(&cgrp->count)
> + || list_empty(&cgrp->children)
> + || cgroup_has_css_refs(cgrp)) {
> mutex_unlock(&cgroup_mutex);
> return -EBUSY;
> }
>
* [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy (v2)
From: KAMEZAWA Hiroyuki @ 2008-11-12 7:32 UTC
To: KAMEZAWA Hiroyuki
Cc: linux-kernel, linux-mm, balbir, menage, nishimura, lizf, akpm
Here is the fixed one. Thank you all for the help.
Regards,
-Kame
==
As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
It has the following lock sequence:
cgroup_mutex (cgroup_rmdir)
-> pre_destroy -> mem_cgroup_pre_destroy -> force_empty
-> cpu_hotplug.lock (lru_add_drain_all ->
schedule_on_each_cpu ->
get_online_cpus)
But cpuset has the following:
cpu_hotplug.lock (call notifier)
-> cgroup_mutex (within notifier)
So this lock ordering must be fixed.
Considering how pre_destroy works, it's not necessary to hold
cgroup_mutex while calling it.
As a side effect, we no longer have to wait on this mutex while memcg's
force_empty runs (which can take a long time when there are tons of pages).
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
kernel/cgroup.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
===================================================================
--- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
+++ mmotm-2.6.28-Nov10/kernel/cgroup.c
@@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
mutex_unlock(&cgroup_mutex);
return -EBUSY;
}
-
- parent = cgrp->parent;
- root = cgrp->root;
- sb = root->sb;
+ mutex_unlock(&cgroup_mutex);
/*
* Call pre_destroy handlers of subsys. Notify subsystems
@@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
*/
cgroup_call_pre_destroy(cgrp);
- if (cgroup_has_css_refs(cgrp)) {
+ mutex_lock(&cgroup_mutex);
+ parent = cgrp->parent;
+ root = cgrp->root;
+ sb = root->sb;
+
+ if (atomic_read(&cgrp->count)
+ || !list_empty(&cgrp->children)
+ || cgroup_has_css_refs(cgrp)) {
mutex_unlock(&cgroup_mutex);
return -EBUSY;
}
* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy.
From: KAMEZAWA Hiroyuki @ 2008-11-12 8:15 UTC
To: Li Zefan; +Cc: linux-kernel, linux-mm, balbir, menage, nishimura, akpm
On Wed, 12 Nov 2008 14:58:06 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > Balbir, Paul, Li, how about this?
> > =
> > As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
> >
> > It has the following lock sequence:
> >
> > cgroup_mutex (cgroup_rmdir)
> > -> pre_destroy
> > -> mem_cgroup_pre_destroy
> > -> force_empty
> > -> lru_add_drain_all
> > -> schedule_on_each_cpu
> > -> get_online_cpus -> cpu_hotplug.lock
> >
> > But cpuset has the following:
> > cpu_hotplug.lock (call notifier)
> > -> cgroup_mutex (within notifier)
> >
> > So this lock ordering must be fixed.
> >
> > Considering how pre_destroy works, it's not necessary to hold
> > cgroup_mutex while calling it.
> >
>
> I think it's safe to call cgroup_call_pre_destroy() without cgroup_lock.
> If cgroup_call_pre_destroy() gets called, it means the cgroup fs has sub-dirs,
> so any remount/umount will fail. That means root->subsys_list won't change
> during rmdir(), so using for_each_subsys() in cgroup_call_pre_destroy()
> is safe.
Thank you for the review.
-Kame
>
> > As a side effect, we no longer have to wait on this mutex while memcg's
> > force_empty runs (which can take a long time when there are tons of pages).
> >
> > Note: memcg is currently the only user of pre_destroy.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > ---
> > kernel/cgroup.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
> > ===================================================================
> > --- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
> > +++ mmotm-2.6.28-Nov10/kernel/cgroup.c
> > @@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
> > mutex_unlock(&cgroup_mutex);
> > return -EBUSY;
> > }
> > -
> > - parent = cgrp->parent;
> > - root = cgrp->root;
> > - sb = root->sb;
> > + mutex_unlock(&cgroup_mutex);
> >
> > /*
> > * Call pre_destroy handlers of subsys. Notify subsystems
> > @@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
> > */
> > cgroup_call_pre_destroy(cgrp);
> >
> > - if (cgroup_has_css_refs(cgrp)) {
> > + mutex_lock(&cgroup_mutex);
> > + parent = cgrp->parent;
> > + root = cgrp->root;
> > + sb = root->sb;
> > +
> > + if (atomic_read(&cgrp->count)
> > + || list_empty(&cgrp->children)
> > + || cgroup_has_css_refs(cgrp)) {
> > mutex_unlock(&cgroup_mutex);
> > return -EBUSY;
> > }
> >
>
* Re: [PATCH] [BUGFIX]cgroup: fix potential deadlock in pre_destroy (v2)
From: Balbir Singh @ 2008-11-12 11:26 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, menage, nishimura, lizf, akpm
KAMEZAWA Hiroyuki wrote:
> Here is the fixed one. Thank you all for the help.
>
> Regards,
> -Kame
> ==
> As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
>
> It has the following lock sequence:
>
> cgroup_mutex (cgroup_rmdir)
> -> pre_destroy -> mem_cgroup_pre_destroy -> force_empty
> -> cpu_hotplug.lock (lru_add_drain_all ->
> schedule_on_each_cpu ->
> get_online_cpus)
>
> But cpuset has the following:
> cpu_hotplug.lock (call notifier)
> -> cgroup_mutex (within notifier)
>
> So this lock ordering must be fixed.
>
> Considering how pre_destroy works, it's not necessary to hold
> cgroup_mutex while calling it.
>
> As a side effect, we no longer have to wait on this mutex while memcg's
> force_empty runs (which can take a long time when there are tons of pages).
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
> kernel/cgroup.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> Index: mmotm-2.6.28-Nov10/kernel/cgroup.c
> ===================================================================
> --- mmotm-2.6.28-Nov10.orig/kernel/cgroup.c
> +++ mmotm-2.6.28-Nov10/kernel/cgroup.c
> @@ -2475,10 +2475,7 @@ static int cgroup_rmdir(struct inode *un
> mutex_unlock(&cgroup_mutex);
> return -EBUSY;
> }
> -
> - parent = cgrp->parent;
> - root = cgrp->root;
> - sb = root->sb;
> + mutex_unlock(&cgroup_mutex);
>
> /*
> * Call pre_destroy handlers of subsys. Notify subsystems
> @@ -2486,7 +2483,14 @@ static int cgroup_rmdir(struct inode *un
> */
> cgroup_call_pre_destroy(cgrp);
>
> - if (cgroup_has_css_refs(cgrp)) {
> + mutex_lock(&cgroup_mutex);
> + parent = cgrp->parent;
> + root = cgrp->root;
> + sb = root->sb;
> +
> + if (atomic_read(&cgrp->count)
> + || !list_empty(&cgrp->children)
> + || cgroup_has_css_refs(cgrp)) {
> mutex_unlock(&cgroup_mutex);
> return -EBUSY;
> }
>
I think the last check deserves a comment explaining that after re-acquiring
the lock we must re-check whether the count, children, or references have
changed. Otherwise it looks good, though I have not tested it yet.
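Something along these lines, for example (a suggestion only, not part of the
posted patch):

	mutex_lock(&cgroup_mutex);
	/*
	 * cgroup_mutex was dropped while the pre_destroy handlers ran,
	 * so new refs or children may have appeared in the meantime.
	 * Re-check every removal condition before committing.
	 */
	if (atomic_read(&cgrp->count)
	    || !list_empty(&cgrp->children)
	    || cgroup_has_css_refs(cgrp)) {
		mutex_unlock(&cgroup_mutex);
		return -EBUSY;
	}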
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
--
Balbir