* [Need Help] Cpuhotplug operations on 32-bit mode of xeon-64bit processor crashes the system.
@ 2007-01-22  8:12 Srinivasa Ds
  2007-01-31  0:30 ` Siddha, Suresh B
  0 siblings, 1 reply; 3+ messages in thread
From: Srinivasa Ds @ 2007-01-22  8:12 UTC
  To: Siddha, Suresh B, ashok.raj, linux-kernel, Ingo Molnar, mingo

CPU hotplug operations crash the system when a 64-bit Xeon processor runs
in 32-bit mode. This also happens on the latest 2.6.20-rc5 kernel. The same
i386 CPU hotplug code runs fine on 32-bit Xeon processors.
Steps to reproduce:
====================
echo 0 > /sys/devices/system/cpu/cpu6/online
echo 1 > /sys/devices/system/cpu/cpu6/online
================================
dmesg shows:
==============
Breaking affinity for irq 4
cpu_mask_to_apicid: Not a valid mask!
CPU 6 is now offline
=======================
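
The reproduction steps above can be looped from a small C helper. This is
only a sketch: the helper names (cpu_online_path, set_cpu_online,
toggle_cpu) are hypothetical, the sysfs path is the one used in the echo
commands, and writing to it needs root on a kernel with CPU hotplug enabled.

```c
#include <stdio.h>

/* Build the sysfs control-file path for a CPU, e.g. cpu6/online.
 * Returns 0 on success, -1 if the buffer is too small. */
static int cpu_online_path(char *buf, size_t len, int cpu)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write "0" or "1" to the control file, like `echo 0 > .../online`.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int set_cpu_online(const char *path, int online)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(online ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Offline and then online the CPU `rounds` times, stopping at the
 * first failed write. Returns 0 if every write succeeded. */
static int toggle_cpu(const char *path, int rounds)
{
    for (int i = 0; i < rounds; i++) {
        if (set_cpu_online(path, 0) != 0)
            return -1;
        if (set_cpu_online(path, 1) != 0)
            return -1;
    }
    return 0;
}
```

A caller would build the path for CPU 6 with cpu_online_path() and pass it
to toggle_cpu() with a small round count, then check dmesg after each run.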

While debugging, I found that the problem is not in the CPU hotplug code
but in the APIC code. The crash is caused by "stale" IPIs being executed by
CPUs that were offlined earlier and then brought back online. We now need
to find out why IPIs are reaching offlined CPUs at all.

1) During the calculation of the APIC ID, if the CPU to which the IPI has
to be delivered is not in the same APIC cluster, it prints the "Not a valid
mask!" error and returns 0xFF, which means the IPI is broadcast to all
CPUs (including the offlined ones). Hence the problem.

2) I booted the system with the maxcpus=2 boot parameter and tried CPU
hotplugging on it, but the problem still reproduces (I think there is no
concept of APIC clusters when there are only 2 CPUs). This leads me to
conclude that the problem is in the delivery of IPIs.
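
The behaviour described in point 1 can be modelled in user space as below.
This is a simplified sketch, not the kernel's code: the table of logical
APIC IDs, the 8-CPU layout and all model_* names are assumptions made for
illustration (high nibble = cluster, low nibble = one bit per CPU in that
cluster). Only the "Not a valid mask!" message and the 0xFF return value
come from the report above.

```c
#include <stdint.h>
#include <stdio.h>

#define MODEL_NR_CPUS    8      /* hypothetical CPU count for this model */
#define BROADCAST_APICID 0xFFu  /* value reported as "deliver to all CPUs" */

/* Hypothetical logical APIC IDs: two clusters of four CPUs each. */
static const uint8_t model_logical_apicid[MODEL_NR_CPUS] = {
    0x01, 0x02, 0x04, 0x08,     /* cluster 0 */
    0x11, 0x12, 0x14, 0x18,     /* cluster 1 */
};

/* The cluster is the high nibble of the logical APIC ID. */
static unsigned int model_cluster(unsigned int apicid)
{
    return apicid & 0xF0u;
}

/* Combine the logical APIC IDs of every CPU set in `cpumask` (bit n = CPU n).
 * If the mask spans more than one cluster, print the error seen in dmesg
 * and fall back to the broadcast ID, which also targets offlined CPUs. */
static unsigned int model_mask_to_apicid(unsigned int cpumask)
{
    unsigned int apicid = 0;
    int have_first = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!(cpumask & (1u << cpu)))
            continue;
        unsigned int id = model_logical_apicid[cpu];
        if (have_first && model_cluster(apicid) != model_cluster(id)) {
            fprintf(stderr, "model_mask_to_apicid: Not a valid mask!\n");
            return BROADCAST_APICID;
        }
        apicid |= id;
        have_first = 1;
    }
    return apicid;
}
```

In this model a mask covering CPUs 0 and 1 stays inside cluster 0 and
yields a combined ID, while a mask covering CPUs 0 and 4 crosses clusters
and falls back to 0xFF.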

So I am completely stuck here and unable to move forward in debugging.
Could someone (maybe the Intel folks) please shed some light on this?

Thanks in advance
  Srinivasa DS
  LTC-IBM



* Re: [Need Help] Cpuhotplug operations on 32-bit mode of xeon-64bit processor crashes the system.
  2007-01-22  8:12 [Need Help] Cpuhotplug operations on 32-bit mode of xeon-64bit processor crashes the system Srinivasa Ds
@ 2007-01-31  0:30 ` Siddha, Suresh B
  2007-01-31  4:51   ` Srinivasa DS
  0 siblings, 1 reply; 3+ messages in thread
From: Siddha, Suresh B @ 2007-01-31  0:30 UTC
  To: Srinivasa Ds
  Cc: Siddha, Suresh B, ashok.raj, linux-kernel, Ingo Molnar, mingo

Sorry for my delayed response. I was away on vacation.

What platform is this? What do you mean by crashing? Do you see a
system freeze or an oops?

thanks,
suresh



* Re: [Need Help] Cpuhotplug operations on 32-bit mode of xeon-64bit processor crashes the system.
  2007-01-31  0:30 ` Siddha, Suresh B
@ 2007-01-31  4:51   ` Srinivasa DS
  0 siblings, 0 replies; 3+ messages in thread
From: Srinivasa DS @ 2007-01-31  4:51 UTC
  Cc: Siddha, Suresh B, ashok.raj, linux-kernel, Ingo Molnar, mingo

Siddha, Suresh B wrote:
> Sorry for my delayed response. I was away on vacation.
>
> What platform is this? what do you mean by crashing? Do you see a
> system freeze or oops?
>   
It is a 64-bit Xeon processor running in 32-bit compatibility mode (i386
code). We have not seen this problem in an x86_64 environment. It happens
in 32-bit compatibility mode.

The problem is in the calculation of APIC IDs and the delivery of IPIs.

I see an oops when I do CPU hotplug operations on it.

If you want any further information, please feel free to ask.

Thanks
Srinivasa Ds



