LKML Archive on lore.kernel.org
* Re: Xen paravirt frontend block hang
       [not found] <4772AC8E.7010007@theshore.net>
@ 2008-02-28 20:00 ` Jeremy Fitzhardinge
  2008-03-02  0:43   ` Christopher S. Aker
       [not found] ` <47758352.5040504@goop.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:00 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel

[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]

Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this.  I 
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive 
> IO.  My current recipe is an untar/tar loop, without compression, of a 
> kernel tree.  For example:
>
> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
> bzip2 -d linux-2.6.23.tar.bz2
>
> while true; do
>   echo `date`
>   tar xf linux-2.6.23.tar
>   tar cf linux-2.6.23.tar linux-2.6.23
> done
>
> After a few loops, anything that touches the xvd device that hung will 
> get stuck in D state.

I've been running this all night without seeing any problem.  I'm using 
current x86.git#testing with a few local patches, but nothing especially 
relevant-looking.

Could you try the attached patch to see if it makes any difference?

    J

>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools).  Paravirt 
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and 
> 2.6.24-rc6.  It does *not* occur using the Xensource 2.6.18 domU tree 
> from 3.1.2.  In all cases, the host continues to run fine, nothing out 
> of the ordinary is logged on the dom0 side, xenstore reports the 
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can 
> provide to help track this down?
>
> Thanks,
> -Chris


[-- Attachment #2: xen-indirect-iret.patch --]
[-- Type: text/x-patch, Size: 2429 bytes --]

Subject: xen: use iret instruction all the time

Change iret implementation to not be dependent on direct-access vcpu
structure.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

---
 arch/x86/xen/enlighten.c |    3 +--
 arch/x86/xen/xen-asm.S   |   11 +++--------
 arch/x86/xen/xen-ops.h   |    2 +-
 3 files changed, 5 insertions(+), 11 deletions(-)

===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -860,7 +860,6 @@ void __init xen_setup_vcpu_info_placemen
 		pv_irq_ops.irq_disable = xen_irq_disable_direct;
 		pv_irq_ops.irq_enable = xen_irq_enable_direct;
 		pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
-		pv_cpu_ops.iret = xen_iret_direct;
 	}
 }
 
@@ -964,7 +963,7 @@ static const struct pv_cpu_ops xen_cpu_o
 	.read_tsc = native_read_tsc,
 	.read_pmc = native_read_pmc,
 
-	.iret = (void *)&hypercall_page[__HYPERVISOR_iret],
+	.iret = xen_iret,
 	.irq_enable_syscall_ret = NULL,  /* never called */
 
 	.load_tr_desc = paravirt_nop,
===================================================================
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -130,13 +130,8 @@ ENDPATCH(xen_restore_fl_direct)
 	current stack state in whatever form its in, we keep things
 	simple by only using a single register which is pushed/popped
 	on the stack.
-
-	Non-direct iret could be done in the same way, but it would
-	require an annoying amount of code duplication.  We'll assume
-	that direct mode will be the common case once the hypervisor
-	support becomes commonplace.
  */
-ENTRY(xen_iret_direct)
+ENTRY(xen_iret)
 	/* test eflags for special cases */
 	testl $(X86_EFLAGS_VM | XEN_EFLAGS_NMI), 8(%esp)
 	jnz hyper_iret
@@ -150,9 +145,9 @@ ENTRY(xen_iret_direct)
 	GET_THREAD_INFO(%eax)
 	movl TI_cpu(%eax),%eax
 	movl __per_cpu_offset(,%eax,4),%eax
-	lea per_cpu__xen_vcpu_info(%eax),%eax
+	mov per_cpu__xen_vcpu(%eax),%eax
 #else
-	movl $per_cpu__xen_vcpu_info, %eax
+	movl per_cpu__xen_vcpu, %eax
 #endif
 
 	/* check IF state we're restoring */
===================================================================
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -63,5 +63,5 @@ DECL_ASM(unsigned long, xen_save_fl_dire
 DECL_ASM(unsigned long, xen_save_fl_direct, void);
 DECL_ASM(void, xen_restore_fl_direct, unsigned long);
 
-void xen_iret_direct(void);
+void xen_iret(void);
 #endif /* XEN_OPS_H */


* Re: Xen paravirt frontend block hang
       [not found]             ` <519a8b110802070612j2a1717f3s6aa25eeea8b7d18a@mail.gmail.com>
@ 2008-02-28 20:03               ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:03 UTC (permalink / raw)
  To: xming
  Cc: Christopher S. Aker, virtualization, Keir Fraser, Xen-devel,
	Linux Kernel Mailing List

xming wrote:
> On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>   
>> xming wrote:
>>     
>>> But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
>>> not boot any more and 2.6.24 does boot but will hang after cpufreq changes
>>> the frequency.
>>>
>>>       
>> Interesting.  Do you mean dom0 cpufreq frequency changes will cause the
>> domU to hang?
>>
>>     J
>>
>>     
>
> Yes, dom0 changing frequency while the domU is doing something will trigger this.
> Using "on demand" triggers it very easily.
>
> This is from xm top when a domU hangs:
>
>     test32 ------       4018   98.8     131072    6.4     131072    6.4     1    1     4516    50087    1        0   433908   300403   3084907223
>
> So it appears to be running (eating CPU); sometimes the state is "r",
> sometimes "-",
> but both console and network are dead.
>   

I haven't tried to repro this yet, but I suspect I won't be able to 
because all my test machines have constant_tsc.  Does your CPU change its 
TSC rate on processor speed changes?
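
On Linux/x86 the constant_tsc capability shows up as a token on the
"flags" line of /proc/cpuinfo, so the question above can be checked on
the host with a few lines of C.  This is only a sketch: the helper names
has_cpu_flag() and cpu_has_constant_tsc() are made up for illustration,
and a missing flag only suggests that the TSC rate may follow the core
frequency.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-delimited token in
 * `line`, 0 otherwise.  (Hypothetical helper, not a kernel API.) */
static int has_cpu_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (start_ok && end_ok)
            return 1;
        p += n;   /* keep scanning past this partial match */
    }
    return 0;
}

/* Scan /proc/cpuinfo for the constant_tsc token.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
static int cpu_has_constant_tsc(void)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f)
        return -1;
    while (!found && fgets(line, sizeof line, f))
        found = has_cpu_flag(line, "constant_tsc");
    fclose(f);
    return found;
}
```

Calling cpu_has_constant_tsc() in dom0 and getting 0 would be consistent
with a TSC that changes rate on frequency changes; it does not by itself
prove that this is what hangs the domU.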

    J


* Re: Xen paravirt frontend block hang
  2008-02-28 20:00 ` Xen paravirt frontend block hang Jeremy Fitzhardinge
@ 2008-03-02  0:43   ` Christopher S. Aker
  2008-03-02 15:35     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Christopher S. Aker @ 2008-03-02  0:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel

Jeremy Fitzhardinge wrote:
> I've been running this all night without seeing any problem.  I'm using 
> current x86.git#testing with a few local patches, but nothing especially 
> relevant-looking.

Meh .. what backend are you using?  We're using LVM volumes exported 
directly into the domUs like so:

disk =[ 'phy:vg1/xencaker-56392,xvda,w', ... ]

> Could you try the attached patch to see if it makes any difference?

Unfortunately we're still in the same place... pv_ops kernels are still 
hanging after heavy disk IO:

works - 2.6.18.x (from xen-unstable)
hangs - 2.6.25-rc3-git3
hangs - 2.6.25-rc3-git3 + your patch

Any other suggestions or debugging I can provide that would be useful to 
squash this?

-Chris



* Re: Xen paravirt frontend block hang
  2008-03-02  0:43   ` Christopher S. Aker
@ 2008-03-02 15:35     ` Jeremy Fitzhardinge
  2008-03-02 16:03       ` Christopher S. Aker
  0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-02 15:35 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel

Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> I've been running this all night without seeing any problem.  I'm 
>> using current x86.git#testing with a few local patches, but nothing 
>> especially relevant-looking.
>
> Meh .. what backend are you using?  We're using LVM volumes exported 
> directly into the domUs like so:
>
> disk =[ 'phy:vg1/xencaker-56392,xvda,w', ... ]
>
>> Could you try the attached patch to see if it makes any difference?
>
> Unfortunately we're still in the same place... pv_ops kernels are 
> still hanging after heavy disk IO:
>
> works - 2.6.18.x (from xen-unstable)
> hangs - 2.6.25-rc3-git3
> hangs - 2.6.25-rc3-git3 + your patch
>
> Any other suggestions or debugging I can provide that would be useful 
> to squash this? 

Are you running an SMP or UP domain?  I found I could get hangs very 
easily with UP (but I need to confirm it isn't a result of some other very 
experimental patches).

    J


* Re: Xen paravirt frontend block hang
  2008-03-02 15:35     ` Jeremy Fitzhardinge
@ 2008-03-02 16:03       ` Christopher S. Aker
  2008-03-18 16:01         ` [Xen-devel] " Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Christopher S. Aker @ 2008-03-02 16:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: virtualization, Linux Kernel Mailing List, Xen-devel

Jeremy Fitzhardinge wrote:
> Are you running an SMP or UP domain?  I found I could get hangs very 
> easily with UP (but I need to confirm it isn't a result of some other very 
> experimental patches).

The hang occurs with both SMP and UP compiled pv_ops kernels.  SMP 
kernels are still slightly responsive after the hang occurs, which makes 
me think only one proc gets stuck at a time, not the entire kernel.

-Chris



* Re: [Xen-devel] Re: Xen paravirt frontend block hang
  2008-03-02 16:03       ` Christopher S. Aker
@ 2008-03-18 16:01         ` Jeremy Fitzhardinge
  2008-03-25  1:37           ` Christopher S. Aker
  0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-18 16:01 UTC (permalink / raw)
  To: Christopher S. Aker
  Cc: Xen-devel, Linux Kernel Mailing List, virtualization, xming

Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> Are you running an SMP or UP domain?  I found I could get hangs very 
>> easily with UP (but I need to confirm it isn't a result of some other 
>> very experimental patches).
>
> The hang occurs with both SMP and UP compiled pv_ops kernels.  SMP 
> kernels are still slightly responsive after the hang occurs, which 
> makes me think only one proc gets stuck at a time, not the entire kernel. 

The patch I posted yesterday - "xen: fix RMW when unmasking events" - 
should definitively fix the hanging-under-load bugs (I hope).  The 
problem came from returning to userspace with pending events, which 
would leave them hanging around on the vcpu unprocessed, and eventually 
everything would deadlock.  This was caused by using an unlocked 
read-modify-write operation on the event pending flag - which can be set 
by another (real) cpu - meaning that the pending event wasn't noticed 
until too late.  It would only be a problem on an SMP host.
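
The lost update described above can be sketched in plain C.  The struct
layout, the function names and the remote_event parameter are all
assumptions for illustration; remote_event simply stands in for another
CPU setting the pending flag at the worst possible moment.  The second
function shows one way to avoid the window (write only the mask, then
test pending separately), which is not necessarily what the patch does.

```c
#include <stdint.h>

/* Hypothetical per-vcpu flags; field names and layout are assumptions. */
struct vcpu_flags {
    uint8_t pending;   /* set by another CPU when an event arrives */
    uint8_t mask;      /* set by this vcpu to defer event delivery */
};

/* Racy unmask: an unlocked read-modify-write over both flags.
 * Returns the pending flag as seen after the write-back. */
static int unmask_rmw(struct vcpu_flags *v, int remote_event)
{
    struct vcpu_flags snap = *v;   /* read: snapshot both flags */

    if (remote_event)
        v->pending = 1;            /* another CPU posts an event in the window */
    snap.mask = 0;                 /* modify: clear the mask in the snapshot */
    *v = snap;                     /* write: stale pending overwrites the update */
    return v->pending;
}

/* Split unmask: store only to the mask, then read pending afterwards,
 * so a concurrent store to pending is never overwritten. */
static int unmask_split(struct vcpu_flags *v, int remote_event)
{
    v->mask = 0;                   /* touches the mask byte only */
    if (remote_event)
        v->pending = 1;            /* remote event lands after the unmask */
    return v->pending;             /* still visible to the caller */
}
```

With remote_event set, unmask_rmw() reports no pending event even though
one was posted, which matches the symptom of returning to userspace with
an event left unprocessed; unmask_split() reports it.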

The patch should back-apply to 2.6.24.

    J


* Re: [Xen-devel] Re: Xen paravirt frontend block hang
  2008-03-18 16:01         ` [Xen-devel] " Jeremy Fitzhardinge
@ 2008-03-25  1:37           ` Christopher S. Aker
  0 siblings, 0 replies; 7+ messages in thread
From: Christopher S. Aker @ 2008-03-25  1:37 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Xen-devel, Linux Kernel Mailing List, virtualization, xming

Jeremy Fitzhardinge wrote:
> Christopher S. Aker wrote:
>> Jeremy Fitzhardinge wrote:
>>> Are you running an SMP or UP domain?  I found I could get hangs very 
>>> easily with UP (but I need to confirm it isn't a result of some other 
>>> very experimental patches).
>>
>> The hang occurs with both SMP and UP compiled pv_ops kernels.  SMP 
>> kernels are still slightly responsive after the hang occurs, which 
>> makes me think only one proc gets stuck at a time, not the entire kernel. 
> 
> The patch I posted yesterday - "xen: fix RMW when unmasking events" - 
> should definitively fix the hanging-under-load bugs (I hope). 

Confirmed-by: caker@theshore.net

Nice work!

-Chris


