* Cannot boot xen DomU > 2.6.23.1
@ 2008-01-17 16:13 xming
  2008-01-17 17:15 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: xming @ 2008-01-17 16:13 UTC (permalink / raw)
  To: linux-kernel

Hi,

I finally found the piece of code that prevents me from booting a Xen DomU
with a vanilla kernel > 2.6.23.1.

The problem is that every kernel > 2.6.23.1 (including the 2.6.24 RCs)
just hangs on "too much" console activity. Sometimes (well, most of the
time) the boot messages alone are too much. When I do manage to boot,
generating a lot of console output hangs the domU: no oops, no more
console or network. Generating the same output over ssh does not hang
the domU.

When I revert the following patch, things work as before; I tried this
with 2.6.23.14 and 2.6.24-rc8. But I don't have the knowledge to
understand the reason behind this.

BTW, I am not subscribed.

--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -160,8 +160,9 @@ struct vcpu_set_singleshot_timer {
  */
 #define VCPUOP_register_vcpu_info   10  /* arg == struct vcpu_info */
 struct vcpu_register_vcpu_info {
-    uint32_t mfn;               /* mfn of page to place vcpu_info */
-    uint32_t offset;            /* offset within page */
+    uint64_t mfn;    /* mfn of page to place vcpu_info */
+    uint32_t offset; /* offset within page */
+    uint32_t rsvd;   /* unused */
 };

 #endif /* __XEN_PUBLIC_VCPU_H__ */

I am running Xen 3.1.2 PAE

# uname -a
Linux builder 2.6.24-rc8 #2 SMP Thu Jan 17 16:37:19 CET 2008 i686
AMD Athlon(tm) X2 Dual Core Processor BE-2300 AuthenticAMD GNU/Linux

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 107
model name      : AMD Athlon(tm) X2 Dual Core Processor BE-2300
stepping        : 1
cpu MHz         : 1899.930
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16
lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch ts fid vid ttp
tm stc 100mhzsteps
bogomips        : 3829.72
clflush size    : 64

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 107
model name      : AMD Athlon(tm) X2 Dual Core Processor BE-2300
stepping        : 1
cpu MHz         : 1899.930
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16
lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch ts fid vid ttp
tm stc 100mhzsteps
bogomips        : 3829.72
clflush size    : 64


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-17 16:13 Cannot boot xen DomU > 2.6.23.1 xming
@ 2008-01-17 17:15 ` Jeremy Fitzhardinge
  2008-01-17 19:13   ` xming
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-17 17:15 UTC (permalink / raw)
  To: xming; +Cc: linux-kernel

xming wrote:
> Hi,
>
> I finally found the piece of code that prevents me from booting a Xen DomU
> with a vanilla kernel > 2.6.23.1.
>
> The problem is that every kernel > 2.6.23.1 (including the 2.6.24 RCs)
> just hangs on "too much" console activity. Sometimes (well, most of the
> time) the boot messages alone are too much. When I do manage to boot,
> generating a lot of console output hangs the domU: no oops, no more
> console or network. Generating the same output over ssh does not hang
> the domU.
>
> When I revert the following patch, things work as before; I tried this
> with 2.6.23.14 and 2.6.24-rc8. But I don't have the knowledge to
> understand the reason behind this.
>   

Uh, that's very strange.  That patch fixes an outright bug, unless 
you're using one specific changeset out of the xen-unstable mercurial 
tree.  What version of Xen are you using?  Is it one you've built from 
xenbits, or distributed by someone?  Are you using a 32 or 64 bit xen/dom0?

I don't understand where the "too much console activity" message comes 
from.  Could you post an actual cut'n'paste of the message, or a 
screenshot?  Are you using a serial console on your dom0?

Thanks,
    J



* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-17 17:15 ` Jeremy Fitzhardinge
@ 2008-01-17 19:13   ` xming
  2008-01-18  7:32     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: xming @ 2008-01-17 19:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jeremy Fitzhardinge

On Jan 17, 2008 6:15 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Uh, that's very strange.  That patch fixes an outright bug, unless
> you're using one specific changeset out of the xen-unstable mercurial
> tree.  What version of Xen are you using?  Is it one you've built from
> xenbits, or distributed by someone?  Are you using a 32 or 64 bit xen/dom0?
>
> I don't understand where the "too much console activity" message comes
> from.  Could you post an actual cut'n'paste of the message, or a
> screenshot?  Are you using a serial console on your dom0?

I am running Xen 3.1.2 from Gentoo (I also tried 3.1.1), 32-bit PAE, as I wrote in the
previous post:

> > I am running Xen 3.1.2 PAE

I never had any problem (at least THIS problem) with any PV DomU (2.6.18,
2.6.20, 2.6.21, 2.6.23-RCs and 2.6.23.1). The problem started with 2.6.23.3
and today I finally found time to track it down.

This only affects PV domUs, so I don't understand your question about the
serial console of Dom0.

The symptom is (with a lot of subjective judgment) that when there is a lot of
(or too fast) output on the console of the domU (hvc0, connected with either
"xm crea file.cfg -c" or "xm cons id"), the whole PV domU hangs. It really
hangs at random places, sometimes right after init and sometimes after I
have logged in and just generate some output (on hvc0) with something like
"find /". IIRC I have never seen a hang before init.

When it hangs there is no more output on the console, and the network to
that domU is dead too; nothing affects dom0. "xm list" still reports the
domU as "r", there is nothing special in the logs (the Xen logs on dom0),
and no oops or panic is reported in the domU. It seems to be running in an
infinite loop.

So, I can make screenshots, but they won't tell you anything: there is no
message, it's just dead.

Gentoo's xen is just a source tarball made from the mercurial repo, without
any patches (AFAIK).

cheers

Ming-Wei Shih


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-17 19:13   ` xming
@ 2008-01-18  7:32     ` Jeremy Fitzhardinge
  2008-01-18 12:38       ` xming
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-18  7:32 UTC (permalink / raw)
  To: xming; +Cc: linux-kernel, Xen-devel

xming wrote:
> The symptom is (with a lot of subjective judgment) that when there is a lot of
> (or too fast) output on the console of the domU (hvc0, connected with either
> "xm crea file.cfg -c" or "xm cons id"), the whole PV domU hangs. It really
> hangs at random places, sometimes right after init and sometimes after I
> have logged in and just generate some output (on hvc0) with something like
> "find /". IIRC I have never seen a hang before init.
>   

OK, I misunderstood your original report to mean that something was 
complaining about "too much" output.  You're saying that lots of console 
output seems to lock the domain.

I've had a report that heavy disk IO seems to lock up as well.  Perhaps 
they're both related to high event rates.  Do you think you could try an 
IO-intensive workload to see if you can get a similar lockup?

When the domain is locked up, what does /usr/lib/xen/bin/xenctx say?

Hm.  Rather than backing out the structure-change patch, could you try 
this workaround:

diff -r be3ca4e0e19e arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c	Thu Jan 17 14:25:07 2008 -0800
+++ b/arch/x86/xen/enlighten.c	Thu Jan 17 16:37:42 2008 -0800
@@ -95,7 +95,7 @@ struct shared_info *HYPERVISOR_shared_in
  *
  * 0: not available, 1: available
  */
-static int have_vcpu_info_placement = 1;
+static int have_vcpu_info_placement = 0;
 
 static void __init xen_vcpu_setup(int cpu)
 {


Reverting the structure shape could cause crashes or random data 
corruption, but it has the side-effect of disabling the vcpu_info 
structure placement mechanism.  This patch disables it cleanly.
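
To make the "structure shape" point concrete, here is a minimal user-space C sketch of the two layouts from the patch. The `_old`/`_new` suffixes and the `read_old_as_new` helper are names invented for this illustration; they do not exist in the kernel or in Xen. The sketch only shows that the two layouts differ in size and field offsets, so a reader expecting one layout misinterprets data written in the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout before the patch: two 32-bit fields, 8 bytes in total. */
struct vcpu_register_vcpu_info_old {
    uint32_t mfn;     /* mfn of page to place vcpu_info */
    uint32_t offset;  /* offset within page */
};

/* Layout after the patch: 64-bit mfn plus a reserved word, 16 bytes. */
struct vcpu_register_vcpu_info_new {
    uint64_t mfn;     /* mfn of page to place vcpu_info */
    uint32_t offset;  /* offset within page */
    uint32_t rsvd;    /* unused */
};

/*
 * Hypothetical helper: what a reader expecting the new layout would see
 * if the writer had filled in the old layout.  The bytes beyond the 8
 * that the writer supplied are modelled as zero here; in a real guest
 * they would be whatever happens to follow the structure in memory.
 */
static struct vcpu_register_vcpu_info_new
read_old_as_new(uint32_t mfn, uint32_t offset)
{
    struct vcpu_register_vcpu_info_old sent = { mfn, offset };
    unsigned char wire[sizeof(struct vcpu_register_vcpu_info_new)] = { 0 };
    struct vcpu_register_vcpu_info_new seen;

    assert(sizeof(sent) <= sizeof(wire));
    memcpy(wire, &sent, sizeof(sent));   /* writer: old layout */
    memcpy(&seen, wire, sizeof(seen));   /* reader: new layout */
    return seen;
}
```

Calling `read_old_as_new(0x1234, 0x40)` returns a structure whose `mfn` is no longer 0x1234 (the old `offset` word has been folded into it) and whose `offset` comes from bytes the writer never set.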

Thanks,
    J


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-18  7:32     ` Jeremy Fitzhardinge
@ 2008-01-18 12:38       ` xming
  2008-01-18 16:19         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: xming @ 2008-01-18 12:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

> OK, I misunderstood your original report to mean that something was
> complaining about "too much" output.  You're saying that lots of console
> output seems to lock the domain.

Sorry about that, and yes that is the case.

> I've had a report that heavy disk IO seems to lock up as well.  Perhaps
> they're both related to high event rates.  Do you think you could try an
> IO-intensive workload to see if you can get a similar lockup?

IO-intensive locks up too (see below)

> When the domain is locked up, what does /usr/lib/xen/bin/xenctx say?

see below

> Hm.  Rather than backing out the structure-change patch, could you try
> this workaround:
>
> diff -r be3ca4e0e19e arch/x86/xen/enlighten.c
> --- a/arch/x86/xen/enlighten.c  Thu Jan 17 14:25:07 2008 -0800
> +++ b/arch/x86/xen/enlighten.c  Thu Jan 17 16:37:42 2008 -0800
> @@ -95,7 +95,7 @@ struct shared_info *HYPERVISOR_shared_in
>   *
>   * 0: not available, 1: available
>   */
> -static int have_vcpu_info_placement = 1;
> +static int have_vcpu_info_placement = 0;
>
>  static void __init xen_vcpu_setup(int cpu)
>  {

First of all, this patch solves the lock-ups; it works as advertised :) The DomU
works as before. Just for the record, for people trying to apply this to 2.6.23.x:
you need to change /x86/ to /i386/ in the paths, since the unified x86 tree only
arrived in 2.6.24.

I tried to create 2 tests, one is IO intensive and the other is console
output intensive:

test1. bonnie++ -s 1024 -u nobody
test2. for i in `seq 1 50000`; do echo 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ; done

In all cases where it crashed (hung) there was no oops/panic.

Scenario 1 (booted 2.6.23.14 as is)
--------------------------------------------------
(but with init=/bin/bash, otherwise I couldn't get a prompt)

test1: crashed

# /usr/lib/xen/bin/xenctx 108
eip: c037c0c7
esp: c0343f90
eax: 00000000   ebx: 00000001   ecx: 00000000   edx: c0342000
esi: c0373004   edi: c1210df4   ebp: 00001b7d
 cs: 00000061    ds: 0000007b    fs: 000000d8    gs: 00000000

Stack:
 c0100add c0378980 c0101962 c0104821 c120a000 c0378df4 c0348cff 00000025
 c0348430 00000004 00009000 00006df4 00ea1000 c0363be0 c0343fe8 c03dd007
 00000000 c0343fec c0349868 c0343fe0 178bc1f1 00002001 01020800 00060fb1
 00000000 c03dd000 00000000 00000000

Code:
cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 06 00 00 00 cd 82 <c3> cc
cc cc cc cc cc cc cc cc cc

Call Trace:
  [<c037c0c7>]  <--
  [<c0100add>]
  [<c0378980>]
  [<c0101962>]
  [<c0104821>]
  [<c120a000>]
  [<c0378df4>]
  [<c0348cff>]
  [<c0348430>]
  [<c0363be0>]
  [<c0343fe8>]
  [<c03dd007>]
  [<c0343fec>]
  [<c0349868>]
  [<c0343fe0>]
  [<178bc1f1>]
  [<c03dd000>]

test2: crashed after many many retries and sometimes with strange output

00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAA
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
0000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ

# /usr/lib/xen/bin/xenctx 113
eip: c037c0c7
esp: c0343f90
eax: 00000000   ebx: 00000001   ecx: 00000000   edx: c0342000
esi: c0373004   edi: c1210df4   ebp: 00001b7d
 cs: 00000061    ds: 0000007b    fs: 000000d8    gs: 00000000

Stack:
 c0100add c0378980 c0101962 c0104821 c120a000 c0378df4 c0348cff 00000025
 c0348430 00000004 00009000 00006df4 00ea1000 c0363be0 c0343fe8 c03dd007
 00000000 c0343fec c0349868 c0343fe0 178bc1f1 00002001 00020800 00060fb1
 00000000 c03dd000 00000000 00000000

Code:
cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 06 00 00 00 cd 82 <c3> cc
cc cc cc cc cc cc cc cc cc

Call Trace:
  [<c037c0c7>]  <--
  [<c0100add>]
  [<c0378980>]
  [<c0101962>]
  [<c0104821>]
  [<c120a000>]
  [<c0378df4>]
  [<c0348cff>]
  [<c0348430>]
  [<c0363be0>]
  [<c0343fe8>]
  [<c03dd007>]
  [<c0343fec>]
  [<c0349868>]
  [<c0343fe0>]
  [<178bc1f1>]
  [<c03dd000>]

Scenario 2 (have_vcpu_info_placement = 0)
--------------------------------------------------------------

test1: no crash
test2: no crash, but occasionally I still get funny output like this:

00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
0000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
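
Garbled lines like these can be counted mechanically instead of by eye. The following is a minimal C sketch, assuming the console output has been captured into a string; the function name `count_garbled_lines` is made up for this example and is not part of any tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the lines in `text` that differ from `expected`.
 * Lines are separated by '\n'; a trailing newline does not add an
 * extra empty line.  Merged lines ("...ZZ00AA...") and lines with
 * extra or missing characters are both counted as garbled.
 */
static size_t count_garbled_lines(const char *text, const char *expected)
{
    size_t expected_len;
    size_t garbled = 0;

    assert(text != NULL && expected != NULL);
    expected_len = strlen(expected);

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");   /* length of current line */

        if (len != expected_len || memcmp(text, expected, len) != 0)
            garbled++;

        text += len;
        if (*text == '\n')
            text++;                          /* skip the terminator */
    }
    return garbled;
}
```

Run against a capture of test2 with "00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ" as the expected line, a non-zero result indicates dropped or duplicated console output even when the domU does not hang.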


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-18 12:38       ` xming
@ 2008-01-18 16:19         ` Jeremy Fitzhardinge
  2008-01-18 16:56           ` xming
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-18 16:19 UTC (permalink / raw)
  To: xming; +Cc: linux-kernel, Xen-devel

xming wrote:
>> OK, I misunderstood your original report to mean that something was
>> complaining about "too much" output.  You're saying that lots of console
>> output seems to lock the domain.
>>     
>
> Sorry about that, and yes that is the case.
>
>   
>> I've had a report that heavy disk IO seems to lock up as well.  Perhaps
>> they're both related to high event rates.  Do you think you could try an
>> IO-intensive workload to see if you can get a similar lockup?
>>     
>
> IO-intensive locks up too (see below)
>
>   
>> When the domain is locked up, what does /usr/lib/xen/bin/xenctx say?
>>     
>
> see below
>
>   
>> Hm.  Rather than backing out the structure-change patch, could you try
>> this workaround:
>>
>> diff -r be3ca4e0e19e arch/x86/xen/enlighten.c
>> --- a/arch/x86/xen/enlighten.c  Thu Jan 17 14:25:07 2008 -0800
>> +++ b/arch/x86/xen/enlighten.c  Thu Jan 17 16:37:42 2008 -0800
>> @@ -95,7 +95,7 @@ struct shared_info *HYPERVISOR_shared_in
>>   *
>>   * 0: not available, 1: available
>>   */
>> -static int have_vcpu_info_placement = 1;
>> +static int have_vcpu_info_placement = 0;
>>
>>  static void __init xen_vcpu_setup(int cpu)
>>  {
>>     
>
> First of all this patch solves the lock-ups, it works as advertised :)

OK, good.  I guess events are getting lost somewhere with vcpu_info 
placement.

> # /usr/lib/xen/bin/xenctx 113
>   

Would it be possible to map the eip and some top parts of the stack back 
to kernel symbols?  Seems to be the same place in both traces, which is 
interesting.

> eip: c037c0c7
> esp: c0343f90
> eax: 00000000   ebx: 00000001   ecx: 00000000   edx: c0342000
> esi: c0373004   edi: c1210df4   ebp: 00001b7d
>  cs: 00000061    ds: 0000007b    fs: 000000d8    gs: 00000000
>
> Stack:
>  c0100add c0378980 c0101962 c0104821 c120a000 c0378df4 c0348cff 00000025
>  c0348430 00000004 00009000 00006df4 00ea1000 c0363be0 c0343fe8 c03dd007
>  00000000 c0343fec c0349868 c0343fe0 178bc1f1 00002001 00020800 00060fb1
>  00000000 c03dd000 00000000 00000000
>
> Code:
> cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 06 00 00 00 cd 82 <c3> cc
> cc cc cc cc cc cc cc cc cc
>
> Call Trace:
>   [<c037c0c7>]  <--
>   [<c0100add>]
>   [<c0378980>]
>   [<c0101962>]
>   [<c0104821>]
>   [<c120a000>]
>   [<c0378df4>]
>   [<c0348cff>]
>   [<c0348430>]
>   [<c0363be0>]
>   [<c0343fe8>]
>   [<c03dd007>]
>   [<c0343fec>]
>   [<c0349868>]
>   [<c0343fe0>]
>   [<178bc1f1>]
>   [<c03dd000>]
>
> Scenario 2 (have_vcpu_info_placement = 0)
> --------------------------------------------------------------
>
> test1: no crash
> test2: no crash, but occasionally I still get funny output like this:
>
> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 0000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>   

Hm, I guess some of the output is getting dropped.  Does this happen 
with 2.6.18-xen?

    J


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-18 16:19         ` Jeremy Fitzhardinge
@ 2008-01-18 16:56           ` xming
  2008-01-18 17:26             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: xming @ 2008-01-18 16:56 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

On Jan 18, 2008 5:19 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> > First of all this patch solves the lock-ups, it works as advertised :)
>
> OK, good.  I guess events are getting lost somewhere with vcpu_info
> placement.


> Would it be possible to map the eip and some top parts of the stack back
> to kernel symbols?  Seems to be the same place in both traces, which is
> interesting.

Can you tell me how, or show me some pointers?


> > Scenario 2 (have_vcpu_info_placement = 0)
> > --------------------------------------------------------------
> >
> > test1: no crash
> > test2: no crash, but occasionally I still get funny output like this:
> >
> > 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 0000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> > 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
> >
>
> Hm, I guess some of the output is getting dropped.  Does this happen
> with 2.6.18-xen?

yes it does

00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
0000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ

# uname -a
Linux builder 2.6.18-xen-r8 #3 SMP Thu Dec 20 15:07:20 CET 2007 i686
AMD Athlon(tm) X2 Dual Core Processor BE-2300 AuthenticAMD GNU/Linux

cheers

xming


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-18 16:56           ` xming
@ 2008-01-18 17:26             ` Jeremy Fitzhardinge
  2008-01-20 13:08               ` xming
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-18 17:26 UTC (permalink / raw)
  To: xming; +Cc: linux-kernel, Xen-devel

xming wrote:
>> Would it be possible to map the eip and some top parts of the stack back
>> to kernel symbols?  Seems to be the same place in both traces, which is
>> interesting.
>>     
>
> Can you tell me how, or show me some pointers?
>   

Do "nm -n vmlinux" on the kernel to get an address-sorted list of 
symbols, and then look to see what's near the eip (c037c0c7) and near 
the top of the stack (c0100add, c0378980, c0101962, ...).  Some of these 
may be in data, or other strange places, but the ones which correspond 
to code are interesting.

>>> Scenario 2 (have_vcpu_info_placement = 0)
>>> --------------------------------------------------------------
>>>
>>> test1: no crash
>>> test2: no crash, but occasionally I still get funny output like this:
>>>
>>> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 0000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 000AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>> 00AAAAAAAAAAAAAAAAAAAAAAAAAAZZ
>>>
>>>       
>> Hm, I guess some of the output is getting dropped.  Does this happen
>> with 2.6.18-xen?
>>     
>
> yes it does
>   

OK, good.  I Didn't Break It (tm) ;)

    J


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-18 17:26             ` Jeremy Fitzhardinge
@ 2008-01-20 13:08               ` xming
  2008-01-20 18:37                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: xming @ 2008-01-20 13:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

On Jan 18, 2008 6:26 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> >> Would it be possible to map the eip and some top parts of the stack back
> >> to kernel symbols?  Seems to be the same place in both traces, which is
> >> interesting.

> Do "nm -n vmlinux" on the kernel to get an address-sorted list of
> symbols, and then look to see what's near the eip (c037c0c7) and near
> the top of the stack (c0100add, c0378980, c0101962, ...).  Some of these
> may be in data, or other strange places, but the ones which correspond
> to code are interesting.

ok I have done some of them, but I still don't know what I should be looking
at. Do you mean code related to xen or code related to have_vcpu_info_placement?
Please be patient with me :)

I just paste some of the result (around those addresses) here:

c037b000 B empty_zero_page
c037c000 B hypercall_page
c037d000 B system_state

c0100a00 t xen_cpuid
c0100a80 t xen_set_debugreg
c0100a90 t xen_get_debugreg
c0100aa0 t xen_save_fl
c0100ac0 t xen_irq_disable
c0100ad0 t xen_safe_halt
c0100af0 t xen_halt
c0100b20 t xen_store_tr
c0100b30 t cvt_gate_to_trap
c0100bb0 t xen_io_delay


c0378980 D per_cpu__irq_stat
c03789c0 d per_cpu__runqueues
c0378df4 D __per_cpu_end

c01018b0 t xen_flush_tlb_single
c0101940 t xen_idle
c0101980 T xen_setup_features
c01019c0 T xen_mc_flush
c0101aa0 T xen_mc_callback

c0104710 T kernel_thread
c01047c0 T cpu_idle
c0104840 T cpu_idle_wait
c0104940 T exit_thread

c0103fe4 T xen_irq_enable_direct
c0103ff1 T xen_irq_enable_direct_reloc
c0103ff5 T xen_irq_enable_direct_end
c0103ff8 T xen_irq_disable_direct
c0104000 T xen_irq_disable_direct_end
c0104004 T xen_save_fl_direct
c0104011 T xen_save_fl_direct_end
c0104014 T xen_restore_fl_direct
c010402b T xen_restore_fl_direct_reloc

c03483f0 t maxcpus
c0348430 t unknown_bootoption
c0348610 T parse_early_param


> >> Hm, I guess some of the output is getting dropped.  Does this happen
> >> with 2.6.18-xen?

> > yes it does

> OK, good.  I Didn't Break It (tm) ;)

So no fix from you? :)

Thanks


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-20 13:08               ` xming
@ 2008-01-20 18:37                 ` Jeremy Fitzhardinge
  2008-01-20 19:29                   ` xming
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-20 18:37 UTC (permalink / raw)
  To: xming; +Cc: linux-kernel, Xen-devel

xming wrote:
> ok I have done some of them, but I still don't know what I should be looking
> at. Do you mean code related to xen or code related to have_vcpu_info_placement?
> Please be patient with me :)
>
> I just paste some of the result (around those addresses) here:
>   

Thanks, that answers that particular question; the vcpu is blocked 
waiting for something to happen, which probably means it missed the 
event which was supposed to wake it up.  Why is another question.  At 
least there's a workaround, and that workaround gives me some clue where 
to look.
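
For readers following along, the lookup behind that conclusion can be sketched in C. The table below is only a subset of the `nm -n vmlinux` output posted earlier in this thread, and `struct ksym`, `lookup_symbol` and `hypercall_slot` are names invented for this example. The slot calculation assumes the usual layout of the Xen hypercall page, with one 32-byte entry per hypercall number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ksym {
    uint32_t addr;
    const char *name;
};

/* Subset of the posted "nm -n vmlinux" output, sorted by address. */
static const struct ksym symtab[] = {
    { 0xc0100ac0, "xen_irq_disable" },
    { 0xc0100ad0, "xen_safe_halt" },
    { 0xc0100af0, "xen_halt" },
    { 0xc0101940, "xen_idle" },
    { 0xc0101980, "xen_setup_features" },
    { 0xc01047c0, "cpu_idle" },
    { 0xc0104840, "cpu_idle_wait" },
    { 0xc0378980, "per_cpu__irq_stat" },   /* data, not code */
    { 0xc037b000, "empty_zero_page" },
    { 0xc037c000, "hypercall_page" },
    { 0xc037d000, "system_state" },
};

/*
 * Binary search for the last symbol whose address is <= addr.
 * Returns NULL if addr lies below the first entry.  Because the table
 * is only a subset, an address in a gap resolves to the nearest listed
 * symbol below it, which may not be the true owner.
 */
static const struct ksym *lookup_symbol(uint32_t addr)
{
    size_t lo = 0;
    size_t hi = sizeof(symtab) / sizeof(symtab[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (symtab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &symtab[lo - 1];
}

/* Index of the 32-byte hypercall page entry that contains eip. */
static unsigned hypercall_slot(uint32_t eip, uint32_t page_base)
{
    assert(eip >= page_base && eip - page_base < 4096u);
    return (eip - page_base) / 32u;
}
```

With the values from the xenctx dump, the eip c037c0c7 resolves to hypercall_page, and the stack entries c0100add, c0101962 and c0104821 resolve to xen_safe_halt, xen_idle and cpu_idle. `hypercall_slot(0xc037c0c7, 0xc037c000)` gives 6, which agrees with the `b8 06 00 00 00` (`mov $6, %eax`) bytes in the dumped code. That is an idle vcpu sitting in a hypercall, not a busy loop.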

BTW, is it an SMP or UP domain?   Does it make a difference?

>> OK, good.  I Didn't Break It (tm) ;)
>>     
>
> So no fix from you? :)
>   

Maybe when I have nothing else to do.

    J


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-20 18:37                 ` Jeremy Fitzhardinge
@ 2008-01-20 19:29                   ` xming
  2008-01-20 23:52                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: xming @ 2008-01-20 19:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

On Jan 20, 2008 7:37 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> xming wrote:
> > ok I have done some of them, but I still don't know what I should be looking
> > at. Do you mean code related to xen or code related to have_vcpu_info_placement?
> > Please be patient with me :)
> >
> > I just paste some of the result (around those addresses) here:
> >
>
> Thanks, that answers that particular question; the vcpu is blocked
> waiting for something to happen, which probably means it missed the
> event which was supposed to wake it up.  Why is another question.  At
> least there's a workaround, and that workaround gives me some clue where
> to look.

Want me to test it?

> BTW, is it an SMP or UP domain?   Does it make a difference?

It doesn't matter, I tried vcpu=1 and vcpu=2, unless you want me to try
to recompile a UP kernel?

> >> OK, good.  I Didn't Break It (tm) ;)
>
> > So no fix from you? :)
>
> Maybe when I have nothing else to do.

I'll wait, or should I poke xen-devel?


* Re: Cannot boot xen DomU > 2.6.23.1
  2008-01-20 19:29                   ` xming
@ 2008-01-20 23:52                     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-20 23:52 UTC (permalink / raw)
  To: xming; +Cc: linux-kernel, Xen-devel

xming wrote:
>> Thanks, that answers that particular question; the vcpu is blocked
>> waiting for something to happen, which probably means it missed the
>> event which was supposed to wake it up.  Why is another question.  At
>> least there's a workaround, and that workaround gives me some clue where
>> to look.
>>     
>
> Want me to test it?
>   

I'll probably look at this when my current batch of work is under 
control.  In the meantime, I'll submit the workaround patch to keep 
people happy.  The only downside is a small performance hit.
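
As a rough picture of what the workaround switches off, here is a simplified user-space model in C. It is an assumption about the general shape of the mechanism, not a copy of the kernel's xen_vcpu_setup(): apart from `have_vcpu_info_placement`, every name in it (the `_model` types, `vcpu_setup_model`, `register_ok`, `register_fail`) is hypothetical, and the function pointer merely stands in for the VCPUOP_register_vcpu_info hypercall.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_VCPUS 4

/* Minimal stand-in for the per-vcpu event state. */
struct vcpu_info_model {
    int evtchn_upcall_pending;
    int evtchn_upcall_mask;
};

/* Stand-in for the page shared between guest and hypervisor. */
struct shared_info_model {
    struct vcpu_info_model vcpu_info[MODEL_MAX_VCPUS];
};

static struct shared_info_model shared_info;
static struct vcpu_info_model percpu_vcpu_info[MODEL_MAX_VCPUS];
static struct vcpu_info_model *xen_vcpu[MODEL_MAX_VCPUS];

/* 0: not available, 1: available */
static int have_vcpu_info_placement = 1;

/* Stand-in for the registration hypercall; returns 0 on success. */
typedef int (*register_fn)(int cpu, struct vcpu_info_model *where);

static int register_ok(int cpu, struct vcpu_info_model *where)
{
    (void)cpu;
    (void)where;
    return 0;
}

static int register_fail(int cpu, struct vcpu_info_model *where)
{
    (void)cpu;
    (void)where;
    return -1;
}

/*
 * Point xen_vcpu[cpu] at the shared_info slot by default, and switch
 * to the guest-chosen per-cpu location only if placement is enabled
 * and the registration succeeds.
 */
static void vcpu_setup_model(int cpu, register_fn try_register)
{
    assert(cpu >= 0 && cpu < MODEL_MAX_VCPUS);

    xen_vcpu[cpu] = &shared_info.vcpu_info[cpu];

    if (!have_vcpu_info_placement)
        return;                       /* the workaround takes this path */

    if (try_register(cpu, &percpu_vcpu_info[cpu]) == 0)
        xen_vcpu[cpu] = &percpu_vcpu_info[cpu];
    else
        have_vcpu_info_placement = 0; /* do not try again */
}
```

In this model, setting `have_vcpu_info_placement` to 0 means every vcpu keeps using its slot in the shared page, which is how things worked before placement was introduced.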

>> BTW, is it an SMP or UP domain?   Does it make a difference?
>>     
>
> It doesn't matter, I tried vcpu=1 and vcpu=2, unless you want me to try
> to recompile a UP kernel?
>   

It would be an interesting datapoint, but I don't think it will make a 
difference.

>> Maybe when I have nothing else to do.
>>     
>
> I'll wait, or should I poke xen-devel?
>   

Poke xen-devel.

    J


