LKML Archive on lore.kernel.org
* Failure to release lock after CPU hot-unplug canceled
From: Benjamin Gilbert @ 2007-01-08 17:07 UTC
  To: linux-kernel

If a module returns NOTIFY_BAD from a CPU_DOWN_PREPARE callback, subsequent
attempts to take a CPU down cause the write into sysfs to wedge.

This is reproducible in 2.6.20-rc4, but was originally found in 2.6.18.5.

Steps to reproduce:

1.  Load the test module included below
2.  Run the following shell commands as root:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu1/online

The second echo command hangs in uninterruptible sleep during the write()
call, and the following appears in dmesg:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-rc4-686 #1
-------------------------------------------------------
bash/1699 is trying to acquire lock:
 (cpu_add_remove_lock){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

but task is already holding lock:
 (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (workqueue_mutex){--..}:
       [<c01374b9>] __lock_acquire+0x912/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c012dc27>] workqueue_cpu_callback+0x10b/0x20c
       [<c037c687>] notifier_call_chain+0x20/0x31
       [<c012a907>] raw_notifier_call_chain+0x8/0xa
       [<c013aa10>] _cpu_down+0x47/0x1f8
       [<c013abe7>] cpu_down+0x26/0x38
       [<c0296462>] store_online+0x27/0x5a
       [<c02935f4>] sysdev_store+0x20/0x25
       [<c0190da1>] sysfs_write_file+0xb3/0xdb
       [<c01602d9>] vfs_write+0xaf/0x163
       [<c0160925>] sys_write+0x3d/0x61
       [<c0102d88>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

-> #1 (cache_chain_mutex){--..}:
       [<c01374b9>] __lock_acquire+0x912/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c015dc0d>] cpuup_callback+0x29/0x2d3
       [<c037c687>] notifier_call_chain+0x20/0x31
       [<c012a907>] raw_notifier_call_chain+0x8/0xa
       [<c013a869>] _cpu_up+0x3d/0xbf
       [<c013a911>] cpu_up+0x26/0x38
       [<c010045e>] init+0x7d/0x2d9
       [<c0103a3f>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (cpu_add_remove_lock){--..}:
       [<c01373ba>] __lock_acquire+0x813/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c013abd2>] cpu_down+0x11/0x38
       [<c0296462>] store_online+0x27/0x5a
       [<c02935f4>] sysdev_store+0x20/0x25
       [<c0190da1>] sysfs_write_file+0xb3/0xdb
       [<c01602d9>] vfs_write+0xaf/0x163
       [<c0160925>] sys_write+0x3d/0x61
       [<c0102d88>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

2 locks held by bash/1699:
 #0:  (cache_chain_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
 #1:  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103dcd>] show_trace_log_lvl+0x1a/0x2f
 [<c01043f4>] show_trace+0x12/0x14
 [<c01044a6>] dump_stack+0x16/0x18
 [<c0135c99>] print_circular_bug_tail+0x5f/0x68
 [<c01373ba>] __lock_acquire+0x813/0xa34
 [<c01378f6>] lock_acquire+0x67/0x8a
 [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
 [<c03791eb>] mutex_lock+0x1c/0x1f
 [<c013abd2>] cpu_down+0x11/0x38
 [<c0296462>] store_online+0x27/0x5a
 [<c02935f4>] sysdev_store+0x20/0x25
 [<c0190da1>] sysfs_write_file+0xb3/0xdb
 [<c01602d9>] vfs_write+0xaf/0x163
 [<c0160925>] sys_write+0x3d/0x61
 [<c0102d88>] syscall_call+0x7/0xb
 =======================

Exiting the bash process after the first echo command instead results in
the following:

=====================================
[ BUG: lock held at task exit time! ]
-------------------------------------
bash/1547 is exiting with locks still held!
2 locks held by bash/1547:
 #0:  (cache_chain_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
 #1:  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103dcd>] show_trace_log_lvl+0x1a/0x2f
 [<c01043f4>] show_trace+0x12/0x14
 [<c01044a6>] dump_stack+0x16/0x18
 [<c01358ba>] debug_check_no_locks_held+0x80/0x86
 [<c01217ed>] do_exit+0x6bf/0x6f5
 [<c0121893>] sys_exit_group+0x0/0x11
 [<c01218a2>] sys_exit_group+0xf/0x11
 [<c0102d88>] syscall_call+0x7/0xb
 =======================

If I can provide any other information to help track this down, please let
me know.

--Benjamin Gilbert

8<---------------------------------------------------------->8

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/cpu.h>

static int cpu_callback(struct notifier_block *nb, unsigned long action,
			void *data)
{
	int cpu = (long)data;	/* hcpu is passed as (void *)(long)cpu */

	switch (action) {
	case CPU_DOWN_PREPARE:
		/* Veto every unplug attempt to trigger the bug. */
		printk(KERN_DEBUG "Refusing shutdown of CPU %d\n", cpu);
		return NOTIFY_BAD;
	case CPU_DEAD:
		printk(KERN_DEBUG "CPU %d down\n", cpu);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block cpu_notifier = {
	.notifier_call = cpu_callback
};

static int __init mod_start(void)
{
	return register_cpu_notifier(&cpu_notifier);
}
module_init(mod_start);

static void __exit mod_shutdown(void)
{
	unregister_cpu_notifier(&cpu_notifier);
}
module_exit(mod_shutdown);

MODULE_LICENSE("GPL");


* Re: Failure to release lock after CPU hot-unplug canceled
From: Heiko Carstens @ 2007-01-09 12:17 UTC
  To: Benjamin Gilbert
  Cc: linux-kernel, vatsa, Ingo Molnar, Gautham Shenoy, Andrew Morton

On Mon, Jan 08, 2007 at 12:07:19PM -0500, Benjamin Gilbert wrote:
> If a module returns NOTIFY_BAD from a CPU_DOWN_PREPARE callback, subsequent
> attempts to take a CPU down cause the write into sysfs to wedge.
> 
> This is reproducible in 2.6.20-rc4, but was originally found in 2.6.18.5.
> 
> Steps to reproduce:
> 
> 1.  Load the test module included below
> 2.  Run the following shell commands as root:
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> The second echo command hangs in uninterruptible sleep during the write()
> call, and the following appears in dmesg:
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.20-rc4-686 #1
> -------------------------------------------------------
> bash/1699 is trying to acquire lock:
>  (cpu_add_remove_lock){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
> 
> but task is already holding lock:
>  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
> 
> which lock already depends on the new lock.

There is a call like this

raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, (void *)(long)cpu);

missing in kernel/cpu.c in _cpu_down() in case CPU_DOWN_PREPARE
returned NOTIFY_BAD. However... this reveals a more fundamental
problem.

The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
returns NOTIFY_BAD the rest of the entries in the callchain won't be
called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
entry.
So we might even end up with a mutex_unlock(&workqueue_mutex) even if
mutex_lock(&workqueue_mutex) hasn't been called...
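
To illustrate, a userspace toy model of the two chain walks (not kernel
code; the entry names are invented):

#include <stdio.h>

enum { DOWN_PREPARE, DOWN_FAILED };
enum { NOTIFY_OK, NOTIFY_BAD };

/* stands in for the entry that vetoes the unplug */
static int veto_entry(int action)
{
	return action == DOWN_PREPARE ? NOTIFY_BAD : NOTIFY_OK;
}

/* stands in for the workqueue entry: would lock on PREPARE, unlock on FAILED */
static int workqueue_entry(int action)
{
	printf("workqueue entry saw action %d\n", action);
	return NOTIFY_OK;
}

static int (*chain[])(int) = { veto_entry, workqueue_entry };

int main(void)
{
	int i;

	/* The PREPARE walk stops at the first NOTIFY_BAD, so
	 * workqueue_entry() never runs and never takes its lock. */
	for (i = 0; i < 2; i++)
		if (chain[i](DOWN_PREPARE) == NOTIFY_BAD)
			break;

	/* The FAILED walk visits every entry, so workqueue_entry() now
	 * "unlocks" a lock it never took. */
	for (i = 0; i < 2; i++)
		chain[i](DOWN_FAILED);
	return 0;
}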

Maybe this will be addressed by somebody else since cpu hotplug locking
is being worked on (again).


* Re: Failure to release lock after CPU hot-unplug canceled
From: Srivatsa Vaddagiri @ 2007-01-09 12:27 UTC
  To: Heiko Carstens
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham Shenoy,
	Andrew Morton

On Tue, Jan 09, 2007 at 01:17:38PM +0100, Heiko Carstens wrote:
> missing in kernel/cpu.c in _cpu_down() in case CPU_DOWN_PREPARE
> returned NOTIFY_BAD. However... this reveals a more fundamental
> problem.
> 
> The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
> again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
> returns NOTIFY_BAD the rest of the entries in the callchain won't be
> called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
> entry.
> So we might even end up with a mutex_unlock(&workqueue_mutex) even if
> mutex_lock(&workqueue_mutex) hasn't been called...

This is a known problem. Gautham had sent out patches to address it:

http://lkml.org/lkml/2006/11/14/93

Looks like they are in the latest -mm tree. Perhaps the testcase should be
retried against latest -mm.

-- 
Regards,
vatsa


* Re: Failure to release lock after CPU hot-unplug canceled
From: Heiko Carstens @ 2007-01-09 15:03 UTC
  To: Srivatsa Vaddagiri
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham Shenoy,
	Andrew Morton

On Tue, Jan 09, 2007 at 05:57:40PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Jan 09, 2007 at 01:17:38PM +0100, Heiko Carstens wrote:
> > missing in kernel/cpu.c in _cpu_down() in case CPU_DOWN_PREPARE
> > returned NOTIFY_BAD. However... this reveals a more fundamental
> > problem.
> >
> > The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
> > again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
> > returns NOTIFY_BAD the rest of the entries in the callchain won't be
> > called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
> > entry.
> > So we might even end up with a mutex_unlock(&workqueue_mutex) even if
> > mutex_lock(&workqueue_mutex) hasn't been called...
> 
> This is a known problem. Gautham had sent out patches to address it:
> 
> http://lkml.org/lkml/2006/11/14/93
> 
> Looks like they are in the latest -mm tree. Perhaps the testcase should be
> retried against latest -mm.

Ah, nice! Wasn't aware of that. But I still think we should have a
CPU_DOWN_FAILED in case CPU_DOWN_PREPARE failed.
Also the slab cache code hasn't been changed to make use of the new
CPU_LOCK_[ACQUIRE|RELEASE] stuff. I'm going to send patches in reply
to this mail.


* [patch -mm] call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failed
From: Heiko Carstens @ 2007-01-09 15:05 UTC
  To: Srivatsa Vaddagiri
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham Shenoy,
	Andrew Morton

From: Heiko Carstens <heiko.carstens@de.ibm.com>

This makes cpu hotplug symmetrical: if CPU_UP_PREPARE fails we get
CPU_UP_CANCELED, so we can undo whatever happened on PREPARE.
The same should happen for CPU_DOWN_PREPARE.

Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Gautham Shenoy <ego@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/cpu.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

Index: linux-2.6.20-rc3-mm1/kernel/cpu.c
===================================================================
--- linux-2.6.20-rc3-mm1.orig/kernel/cpu.c
+++ linux-2.6.20-rc3-mm1/kernel/cpu.c
@@ -122,9 +122,10 @@ static int take_cpu_down(void *unused)
 /* Requires cpu_add_remove_lock to be held */
 static int _cpu_down(unsigned int cpu)
 {
-	int err;
+	int err, nr_calls = 0;
 	struct task_struct *p;
 	cpumask_t old_allowed, tmp;
+	void *hcpu = (void *)(long)cpu;
 
 	if (num_online_cpus() == 1)
 		return -EBUSY;
@@ -132,11 +133,12 @@ static int _cpu_down(unsigned int cpu)
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE,
-						(void *)(long)cpu);
-	err = raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
-						(void *)(long)cpu);
+	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
+	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
+					hcpu, -1, &nr_calls);
 	if (err == NOTIFY_BAD) {
+		__raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, hcpu,
+					  nr_calls, NULL);
 		printk("%s: attempt to take down CPU %u failed\n",
 				__FUNCTION__, cpu);
 		err = -EINVAL;
@@ -156,7 +158,7 @@ static int _cpu_down(unsigned int cpu)
 	if (IS_ERR(p) || cpu_online(cpu)) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		if (raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED,
-				(void *)(long)cpu) == NOTIFY_BAD)
+					    hcpu) == NOTIFY_BAD)
 			BUG();
 
 		if (IS_ERR(p)) {
@@ -178,8 +180,7 @@ static int _cpu_down(unsigned int cpu)
 	put_cpu();
 
 	/* CPU is completely dead: tell everyone.  Too late to complain. */
-	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD,
-			(void *)(long)cpu) == NOTIFY_BAD)
+	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
 		BUG();
 
 	check_for_tasks(cpu);
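
To illustrate the pairing this makes reliable, here is a sketch of a
notifier that depends on it (example_mutex and example_cpu_callback are
invented names, modeled on what the workqueue callback does). With the
change above, CPU_DOWN_FAILED reaches the entries whose CPU_DOWN_PREPARE
actually ran, so the lock/unlock below always pair up:

#include <linux/cpu.h>
#include <linux/mutex.h>
#include <linux/notifier.h>

static DEFINE_MUTEX(example_mutex);

static int example_cpu_callback(struct notifier_block *nb,
				unsigned long action, void *hcpu)
{
	switch (action) {
	case CPU_DOWN_PREPARE:
		/* set up state for the unplug */
		mutex_lock(&example_mutex);
		break;
	case CPU_DOWN_FAILED:	/* unplug was vetoed or failed */
	case CPU_DEAD:		/* unplug completed */
		mutex_unlock(&example_mutex);
		break;
	}
	return NOTIFY_OK;
}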


* [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE]
From: Heiko Carstens @ 2007-01-09 15:06 UTC
  To: Srivatsa Vaddagiri
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham Shenoy,
	Andrew Morton, Pekka Enberg

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
introduced.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Gautham Shenoy <ego@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 mm/slab.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

Index: linux-2.6.20-rc3-mm1/mm/slab.c
===================================================================
--- linux-2.6.20-rc3-mm1.orig/mm/slab.c
+++ linux-2.6.20-rc3-mm1/mm/slab.c
@@ -1177,8 +1177,10 @@ static int __cpuinit cpuup_callback(stru
 	int memsize = sizeof(struct kmem_list3);
 
 	switch (action) {
-	case CPU_UP_PREPARE:
+	case CPU_LOCK_ACQUIRE:
 		mutex_lock(&cache_chain_mutex);
+		break;
+	case CPU_UP_PREPARE:
 		/*
 		 * We need to do this right in the beginning since
 		 * alloc_arraycache's are going to use this list.
@@ -1264,16 +1266,9 @@ static int __cpuinit cpuup_callback(stru
 		}
 		break;
 	case CPU_ONLINE:
-		mutex_unlock(&cache_chain_mutex);
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_DOWN_PREPARE:
-		mutex_lock(&cache_chain_mutex);
-		break;
-	case CPU_DOWN_FAILED:
-		mutex_unlock(&cache_chain_mutex);
-		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the
@@ -1344,6 +1339,8 @@ free_array_cache:
 				continue;
 			drain_freelist(cachep, l3, l3->free_objects);
 		}
+		break;
+	case CPU_LOCK_RELEASE:
 		mutex_unlock(&cache_chain_mutex);
 		break;
 	}


* Re: Failure to release lock after CPU hot-unplug canceled
From: Benjamin Gilbert @ 2007-01-09 16:34 UTC
  To: Heiko Carstens
  Cc: Srivatsa Vaddagiri, linux-kernel, Ingo Molnar, Gautham Shenoy,
	Andrew Morton

Heiko Carstens wrote:
> On Tue, Jan 09, 2007 at 05:57:40PM +0530, Srivatsa Vaddagiri wrote:
>> On Tue, Jan 09, 2007 at 01:17:38PM +0100, Heiko Carstens wrote:
>>> The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
>>> again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
>>> returns NOTIFY_BAD the rest of the entries in the callchain won't be
>>> called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
>>> entry.
>>> So we might even end up with a mutex_unlock(&workqueue_mutex) even if
>>> mutex_lock(&workqueue_mutex) hasn't been called...
>>
>> This is a known problem. Gautham had sent out patches to address it:
>>
>> http://lkml.org/lkml/2006/11/14/93
>>
>> Looks like they are in the latest -mm tree. Perhaps the testcase should be
>> retried against latest -mm.
>
> Ah, nice! Wasn't aware of that. But I still think we should have a
> CPU_DOWN_FAILED in case CPU_DOWN_PREPARE failed.
> Also the slab cache code hasn't been changed to make use of the new
> CPU_LOCK_[ACQUIRE|RELEASE] stuff. I'm going to send patches in reply
> to this mail.

2.6.20-rc3-mm1 plus your patches fixes it for me.

Thanks
--Benjamin Gilbert



* Re: [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE]
From: Christoph Lameter @ 2007-01-10 18:20 UTC
  To: Heiko Carstens
  Cc: Srivatsa Vaddagiri, Benjamin Gilbert, linux-kernel, Ingo Molnar,
	Gautham Shenoy, Andrew Morton, Pekka Enberg

On Tue, 9 Jan 2007, Heiko Carstens wrote:

> -	case CPU_UP_PREPARE:
> +	case CPU_LOCK_ACQUIRE:
>  		mutex_lock(&cache_chain_mutex);
> +		break;

I have got a bad feeling about upcoming deadlock problems when looking at
the mutex_lock / unlock code in cpuup_callback in slab.c. Branches
that just obtain a lock or release a lock? I hope there is some
control over what happens between lock acquisition and release?

You are aware that this lock is taken for cache shrinking/destroy, tuning
of cpu cache sizes, proc output and cache creation? Any of those running on
the same processor should cause a deadlock.


* Re: [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE]
From: Srivatsa Vaddagiri @ 2007-01-11  2:30 UTC
  To: Christoph Lameter
  Cc: Heiko Carstens, Benjamin Gilbert, linux-kernel, Ingo Molnar,
	Gautham Shenoy, Andrew Morton, Pekka Enberg

On Wed, Jan 10, 2007 at 10:20:28AM -0800, Christoph Lameter wrote:
> I have got a bad feeling about upcoming deadlock problems when looking at
> the mutex_lock / unlock code in cpuup_callback in slab.c. Branches
> that just obtain a lock or release a lock? I hope there is some
> control over what happens between lock acquisition and release?

A cpu hotplug operation happens entirely between LOCK_ACQUIRE and
LOCK_RELEASE.

> You are aware that this lock is taken for cache shrinking/destroy, tuning
> of cpu cache sizes, proc output and cache creation? Any of those running on
> the same processor should cause a deadlock.

Why? The mutex_lock() taken on LOCK_ACQUIRE will just block those functions
(cache create etc.) from proceeding simultaneously with a hotplug event.
This per-subsystem mutex_lock() is supposed to be a replacement for the
global lock_cpu_hotplug() lock ..
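
To sketch the intended serialization (all names invented; an illustration,
not the slab code): both paths take the same per-subsystem mutex around
their whole critical section, so one simply waits for the other instead of
deadlocking:

static DEFINE_MUTEX(chain_mutex);

/* hotplug path: the mutex is held across the whole hotplug operation */
static void hotplug_event(void)
{
	mutex_lock(&chain_mutex);	/* on CPU_LOCK_ACQUIRE */
	/* UP_PREPARE/DOWN_PREPARE ... ONLINE/DEAD/DOWN_FAILED run here */
	mutex_unlock(&chain_mutex);	/* on CPU_LOCK_RELEASE */
}

/* cache creation path: blocks while a hotplug operation is in flight */
static void cache_create(void)
{
	mutex_lock(&chain_mutex);
	/* add the new cache to the chain */
	mutex_unlock(&chain_mutex);
}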

But the whole thing is changing again .. we will likely move towards
process-freezer-based cpu hotplug locking .. all the lock_cpu_hotplug()
calls and the existing LOCK_ACQUIRE/RELEASE events can go away when we do
that ..

-- 
Regards,
vatsa

