* [patch] paravirt: VDSO page is essential
@ 2007-03-05 12:06 Ingo Molnar
  2007-03-05 12:36 ` Avi Kivity
                   ` (2 more replies)
  0 siblings, 3 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 12:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Roland McGrath, Andi Kleen, Rusty Russell

Subject: [patch] paravirt: VDSO page is essential
From: Ingo Molnar <mingo@elte.hu>

commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
an essential component of Linux, and this change forces all of them to 
use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
handle the VDSO properly then it can work around that via the vdso=0 
boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
case, it is a basic taste issue: we DO NOT #ifdef around core features 
like this!)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/i386/kernel/sysenter.c |    4 ----
 1 file changed, 4 deletions(-)

Index: linux/arch/i386/kernel/sysenter.c
===================================================================
--- linux.orig/arch/i386/kernel/sysenter.c
+++ linux/arch/i386/kernel/sysenter.c
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 12:06 [patch] paravirt: VDSO page is essential Ingo Molnar
@ 2007-03-05 12:36 ` Avi Kivity
  2007-03-05 12:40   ` Ingo Molnar
  2007-03-05 14:28   ` Andi Kleen
  2007-03-05 13:28 ` Rusty Russell
  2007-03-05 14:27 ` Andi Kleen
  2 siblings, 2 replies; 86+ messages in thread
From: Avi Kivity @ 2007-03-05 12:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Roland McGrath, Andi Kleen, Rusty Russell

Ingo Molnar wrote:
> Subject: [patch] paravirt: VDSO page is essential
> From: Ingo Molnar <mingo@elte.hu>
>
> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
> CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
> an essential component of Linux, and this change forces all of them to 
> use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
> handle the VDSO properly then it can work things around via the vdso=0 
> boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
> case, it is a basic taste issue: we DO NOT #ifdef around core features 
> like this!)
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/i386/kernel/sysenter.c |    4 ----
>  1 file changed, 4 deletions(-)
>
> Index: linux/arch/i386/kernel/sysenter.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/sysenter.c
> +++ linux/arch/i386/kernel/sysenter.c
> @@ -27,11 +27,7 @@
>   * Should the kernel map a VDSO page into processes and pass its
>   * address down to glibc upon exec()?
>   */
> -#ifdef CONFIG_PARAVIRT
> -unsigned int __read_mostly vdso_enabled = 0;
> -#else
>  unsigned int __read_mostly vdso_enabled = 1;
> -#endif
>  
>  EXPORT_SYMBOL_GPL(vdso_enabled);
>  
>   

Can't paravirt patch the syscall instruction like it does the rest of 
the kernel?

[is someone keeping track of the number of patchsites? e.g. at what date 
will the entire kernel be generated at boot time?]


-- 
error compiling committee.c: too many arguments to function



* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 12:36 ` Avi Kivity
@ 2007-03-05 12:40   ` Ingo Molnar
  2007-03-05 13:00     ` Avi Kivity
  2007-03-05 14:28   ` Andi Kleen
  1 sibling, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 12:40 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Andrew Morton, Linus Torvalds, linux-kernel, Roland McGrath,
	Andi Kleen, Rusty Russell


* Avi Kivity <avi@qumranet.com> wrote:

> >-#ifdef CONFIG_PARAVIRT
> >-unsigned int __read_mostly vdso_enabled = 0;
> >-#else
> > unsigned int __read_mostly vdso_enabled = 1;
> >-#endif

> Can't paravirt patch the syscall instruction like it does the rest of 
> the kernel?

we want to keep the guest as simple and unmodified as possible. And all 
this #ifdef jungle /will/ bite back. Especially if the change goes in 
with zero explanation like it did:

    [PATCH] paravirt: Disable vdso by default when CONFIG_PARAVIRT is enabled

    They don't work together and this way even glibc still works.

i would rather have an experimental feature (CONFIG_PARAVIRT) broken on 
some hypervisors for a bit than have an entire body of guest OSs getting 
used to the "you don't have to deal with this VDSO annoyance by default" 
quirk forever ...

but yes, i agree that the hypervisor should have the ability to patch 
the syscall instruction of both the hypervisor interface and the VDSO 
interface. But this wasn't implemented like that, and the #ifdef quirk 
just /prevents/ a sane solution like that from ever getting done the 
right way.

	Ingo


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 12:40   ` Ingo Molnar
@ 2007-03-05 13:00     ` Avi Kivity
  2007-03-05 13:32       ` Rusty Russell
  0 siblings, 1 reply; 86+ messages in thread
From: Avi Kivity @ 2007-03-05 13:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Linus Torvalds, linux-kernel, Roland McGrath,
	Andi Kleen, Rusty Russell

Ingo Molnar wrote:
> * Avi Kivity <avi@qumranet.com> wrote:
>
>   
>>> -#ifdef CONFIG_PARAVIRT
>>> -unsigned int __read_mostly vdso_enabled = 0;
>>> -#else
>>> unsigned int __read_mostly vdso_enabled = 1;
>>> -#endif
>>>       
>
>   
>> Can't paravirt patch the syscall instruction like it does the rest of 
>> the kernel?
>>     
>
> we want to keep the guest as simple and unmodified as possible. And all 
> this #ifdef jungle /will/ bite back. Especially if the change goes in 
> with zero explanation like it did:
>
>     [PATCH] paravirt: Disable vdso by default when CONFIG_PARAVIRT is enabled
>
>     They don't work together and this way even glibc still works.
>
> i rather want an experimental feature (CONFIG_PARAVIRT) broken on some 
> hypervisors for a bit than an entire body of guest OSs getting used to 
> the "you dont have to deal with this VDSO annoyance by default" quirk 
> forever ...
>   

Sure, I agree with this patch.  I'm talking about an alternate solution 
so Xen can work with the vdso instead of #ifdefing away the kernel.

> but yes, i agree that the hypervisor should have the ability to patch 
> the syscall instruction of both the hypervisor interface and of the VDSO 
> interface. But this wasnt implemented like that, and the #ifdef quirk 
> just /prevents/ a sane solution like that from ever getting done the 
> right way.
>   

Rusty, shouldn't this be a one-liner?  No need to involve the hypervisor 
here; the guest can s/syscall/int 80/ on its vdso page like it patches 
cli and its ilk.
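
[Editor's sketch, not part of the original message. The idea above can
be illustrated as a guest rewriting its own copy of the vdso page at
known patch sites. On i386 the fast-entry instruction in the vdso is
sysenter (bytes 0f 34) and int $0x80 is cd 80, so the replacement fits
in place. The helper name, the patch-site table and the toy page below
are hypothetical; the kernel's real patching infrastructure is not
shown here.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86 encodings: SYSENTER is 0F 34, INT $0x80 is CD 80 (two bytes each). */
static const uint8_t SYSENTER_INSN[2] = { 0x0f, 0x34 };
static const uint8_t INT80_INSN[2]    = { 0xcd, 0x80 };

/*
 * Hypothetical helper: rewrite each recorded patch site in a writable
 * copy of the vdso page. "sites" holds byte offsets of instructions that
 * were recorded ahead of time; scanning for raw bytes instead would risk
 * matching an immediate operand. Returns the number of sites patched.
 */
static size_t patch_vdso_syscall_sites(uint8_t *page, size_t page_len,
                                       const size_t *sites, size_t nsites)
{
    size_t patched = 0;

    assert(page != NULL);
    assert(sites != NULL || nsites == 0);

    for (size_t i = 0; i < nsites; i++) {
        size_t off = sites[i];

        /* Skip sites that do not fit inside the page. */
        if (off > page_len || page_len - off < sizeof(SYSENTER_INSN))
            continue;
        /* Only patch if the expected instruction is really there. */
        if (memcmp(page + off, SYSENTER_INSN, sizeof(SYSENTER_INSN)) != 0)
            continue;

        memcpy(page + off, INT80_INSN, sizeof(INT80_INSN));
        patched++;
    }
    return patched;
}
```

[A caller would pass the page contents and the offsets recorded for it,
for example a toy stub "push %ecx; push %edx; push %ebp; mov %esp,%ebp;
sysenter" with one site at offset 5.]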

-- 
error compiling committee.c: too many arguments to function



* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 12:06 [patch] paravirt: VDSO page is essential Ingo Molnar
  2007-03-05 12:36 ` Avi Kivity
@ 2007-03-05 13:28 ` Rusty Russell
  2007-03-05 13:38   ` Ingo Molnar
                     ` (2 more replies)
  2007-03-05 14:27 ` Andi Kleen
  2 siblings, 3 replies; 86+ messages in thread
From: Rusty Russell @ 2007-03-05 13:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Roland McGrath, Andi Kleen, virtualization

On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
> Subject: [patch] paravirt: VDSO page is essential
> From: Ingo Molnar <mingo@elte.hu>
> 
> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
> CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
> an essential component of Linux, and this change forces all of them to 
> use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
> handle the VDSO properly then it can work things around via the vdso=0 
> boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
> case, it is a basic taste issue: we DO NOT #ifdef around core features 
> like this!)

I agree with the criticism, dislike the snarly comments, and disagree
with this patch.

VDSO is only a problem if (1) the hypervisor wants to reserve the top
virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is old and
can't handle a VDSO mapped anywhere but 0xFFFFE000
(CONFIG_COMPAT_VDSO=y).
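
[Editor's sketch, not part of the original message. The conflict
between the two conditions can be shown with 32-bit address arithmetic.
The formula for the top of the fixmap is taken from the pgtable.c hunk
in the patch below (-reserve - PAGE_SIZE). That the compat vdso sits in
the fixmap slot one page below that top is an assumption made for this
illustration, as are the function names and the 64MB example value.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE        0x1000u
#define COMPAT_VDSO_ADDR 0xffffe000u  /* address old glibc expects */

/* Same arithmetic as reserve_top_address(): -reserve - PAGE_SIZE. */
static uint32_t fixaddr_top_after_reserve(uint32_t reserve)
{
    assert((reserve & (PAGE_SIZE - 1)) == 0);  /* page-aligned reserve */
    return (uint32_t)0 - reserve - PAGE_SIZE;
}

/* Assumption: the compat vdso occupies the slot one page below the top. */
static uint32_t compat_vdso_addr(uint32_t reserve)
{
    return fixaddr_top_after_reserve(reserve) - PAGE_SIZE;
}

/* Old glibc only works if the vdso is still at the fixed address. */
static bool old_glibc_ok(uint32_t reserve)
{
    return compat_vdso_addr(reserve) == COMPAT_VDSO_ADDR;
}
```

[Under these assumptions, a reserve of 0 leaves the vdso at 0xffffe000,
while a hypothetical 64MB reserve (0x04000000) moves it to 0xfbffe000,
which is why any non-zero reserve is incompatible with the fixed
mapping.]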

Now, KVM wants to use CONFIG_PARAVIRT=y but not reserve_top_address(),
so we should split the config option.  Let's not get too excited because
we kept it simple.  Patch (untested, but fairly simple) below.

BTW, I had a patch to do a runtime test (old glibc causes init to
assert, then disable vdso and try again): everyone hated it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r f75715e64a3b arch/i386/Kconfig
--- a/arch/i386/Kconfig	Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/Kconfig	Tue Mar 06 00:20:44 2007 +1100
@@ -218,9 +218,18 @@ config PARAVIRT
 	  However, when run without a hypervisor the kernel is
 	  theoretically slower.  If in doubt, say N.
 
+config RESERVE_TOP
+	bool
+	help
+	  Many hypervisors want to reserve some amount of the top of
+	  virtual address space.  Unfortunately, old glibc needs the
+	  vdso page there, so we must disable vdso if COMPAT_VDSO is
+	  enabled as well as this option.
+
 config VMI
 	bool "VMI Paravirt-ops support"
 	depends on PARAVIRT && !NO_HZ
+	select RESERVE_TOP
 	default y
 	help
 	  VMI provides a paravirtualized interface to multiple hypervisors
@@ -893,9 +902,10 @@ config COMPAT_VDSO
 config COMPAT_VDSO
 	bool "Compat VDSO support"
 	default y
-	depends on !PARAVIRT
-	help
-	  Map the VDSO to the predictable old-style address too.
+	help
+	  Map the VDSO to the predictable old-style address too, or
+	  in the case of a VMI/Xen/lguest virtualized guest, don't create
+	  the VDSO at all.
 	---help---
 	  Say N here if you are running a sufficiently recent glibc
 	  version (2.3.3 or later), to remove the high-mapped
diff -r f75715e64a3b arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c	Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/kernel/sysenter.c	Tue Mar 06 00:21:42 2007 +1100
@@ -27,7 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
+#if defined(CONFIG_COMPAT_VDSO) && defined(CONFIG_RESERVE_TOP)
 unsigned int __read_mostly vdso_enabled = 0;
 #else
 unsigned int __read_mostly vdso_enabled = 1;
diff -r f75715e64a3b arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.c	Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/mm/pgtable.c	Tue Mar 06 00:06:00 2007 +1100
@@ -173,7 +173,7 @@ void reserve_top_address(unsigned long r
 	BUG_ON(fixmaps > 0);
 	printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
 	       (int)-reserve);
-#ifdef CONFIG_COMPAT_VDSO
+#ifndef CONFIG_RESERVE_TOP
 	BUG_ON(reserve != 0);
 #else
 	__FIXADDR_TOP = -reserve - PAGE_SIZE;
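
[Editor's sketch, not part of the original message. The three defaults
for vdso_enabled under discussion (the merged #ifdef, the first patch
in this thread, and the patch above) can be compared side by side as
plain functions of the config options. The struct and function names
are invented for this comparison; CONFIG_RESERVE_TOP exists only in the
patch above.]

```c
#include <assert.h>
#include <stdbool.h>

/* Build-time options modelled as runtime flags for comparison only. */
struct vdso_config {
    bool paravirt;     /* CONFIG_PARAVIRT */
    bool compat_vdso;  /* CONFIG_COMPAT_VDSO */
    bool reserve_top;  /* CONFIG_RESERVE_TOP (proposed above) */
};

/* Merged behaviour: vdso off whenever CONFIG_PARAVIRT is set. */
static unsigned int vdso_default_merged(const struct vdso_config *c)
{
    assert(c);
    return c->paravirt ? 0 : 1;
}

/* First patch in this thread: vdso always on by default. */
static unsigned int vdso_default_always_on(const struct vdso_config *c)
{
    assert(c);
    return 1;
}

/* Patch above: off only if COMPAT_VDSO and RESERVE_TOP are both set. */
static unsigned int vdso_default_split(const struct vdso_config *c)
{
    assert(c);
    return (c->compat_vdso && c->reserve_top) ? 0 : 1;
}
```

[For a PARAVIRT=y, COMPAT_VDSO=y kernel that does not select
RESERVE_TOP (the KVM case described above), only the merged behaviour
turns the vdso off.]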




* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 13:00     ` Avi Kivity
@ 2007-03-05 13:32       ` Rusty Russell
  0 siblings, 0 replies; 86+ messages in thread
From: Rusty Russell @ 2007-03-05 13:32 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kernel,
	Roland McGrath, Andi Kleen

On Mon, 2007-03-05 at 15:00 +0200, Avi Kivity wrote:
> Ingo Molnar wrote:
> > * Avi Kivity <avi@qumranet.com> wrote:
> > but yes, i agree that the hypervisor should have the ability to patch 
> > the syscall instruction of both the hypervisor interface and of the VDSO 
> > interface. But this wasnt implemented like that, and the #ifdef quirk 
> > just /prevents/ a sane solution like that from ever getting done the 
> > right way.
> >   
> 
> Rusty, shouldn't this be a one-liner?  No need to involve the hypervisor 
> here; the guest can s/syscall/int 80/ on its vdso page like it patches 
> cli and its ilk.

Probably, but this is a red herring: see previous reply.  Andi was a
little overzealous w/ CONFIG_PARAVIRT & COMPAT_VDSO, that's all.

I've never thought of replacing the syscall insn.  I'll see if I can
come up with a good reason to want to 8)

Rusty.



* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 13:28 ` Rusty Russell
@ 2007-03-05 13:38   ` Ingo Molnar
  2007-03-05 14:34   ` Andi Kleen
  2007-03-06  0:57   ` Rusty Russell
  2 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 13:38 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andrew Morton, linux-kernel, Roland McGrath, Andi Kleen, virtualization


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> -#ifdef CONFIG_PARAVIRT
> +#if defined(CONFIG_COMPAT_VDSO) && defined(CONFIG_RESERVE_TOP)

NACK - my patch is quite a bit simpler and yours only increases the
#ifdef jungle. If there's any complication of the VDSO coming from some
other hypervisor support patch then I will judge that in full context, 
when it's submitted. Meanwhile, my patch is a must-have for v2.6.21.

	Ingo


* [patch] paravirt: re-enable COMPAT_VDSO
  2007-03-05 14:34   ` Andi Kleen
@ 2007-03-05 13:46     ` Ingo Molnar
  2007-03-05 13:48     ` [patch] paravirt: VDSO page is essential Ingo Molnar
  2007-03-05 20:11     ` Zachary Amsden
  2 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 13:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rusty Russell, Andrew Morton, Linus Torvalds, linux-kernel,
	Roland McGrath

Subject: [patch] paravirt: re-enable COMPAT_VDSO
From: Ingo Molnar <mingo@elte.hu>

CONFIG_PARAVIRT broke old glibc bootup: it silently turned off the 
selectability of CONFIG_COMPAT_VDSO and thus rendered distro kernels 
unbootable on old-style VDSO glibc setups.

the proper solution is to keep COMPAT_VDSO available - if a hypervisor 
needs any modification of that concept then we'll judge those changes in 
full context, once those changes are submitted.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/i386/Kconfig |    1 -
 1 file changed, 1 deletion(-)

Index: linux/arch/i386/Kconfig
===================================================================
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -897,7 +897,6 @@ config HOTPLUG_CPU
 config COMPAT_VDSO
 	bool "Compat VDSO support"
 	default y
-	depends on !PARAVIRT
 	help
 	  Map the VDSO to the predictable old-style address too.
 	---help---


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 14:34   ` Andi Kleen
  2007-03-05 13:46     ` [patch] paravirt: re-enable COMPAT_VDSO Ingo Molnar
@ 2007-03-05 13:48     ` Ingo Molnar
  2007-03-05 20:11     ` Zachary Amsden
  2 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 13:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rusty Russell, Andrew Morton, linux-kernel, Roland McGrath,
	virtualization


* Andi Kleen <ak@suse.de> wrote:

> > VDSO is only a problem if (1) the hypervisor wants to reserve the 
> > top virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is 
> > old and
> 
> It broke the boot even with native hardware, no hypervisor

yeah - i just sent the fix for that regression - by making 
CONFIG_COMPAT_VDSO usable again.

basically, we want to keep all the compatibility mess that exists on 
the hypervisor side from entering the Linux kernel.

	Ingo


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 14:28   ` Andi Kleen
@ 2007-03-05 13:48     ` Ingo Molnar
  2007-03-05 14:58       ` Andi Kleen
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 13:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Avi Kivity, Andrew Morton, linux-kernel, Roland McGrath, Rusty Russell


* Andi Kleen <ak@suse.de> wrote:

> The problem is not the syscall instruction, but some problem with the 
> way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older 
> glibc.

the problem is that CONFIG_PARAVIRT silently turns off 
CONFIG_COMPAT_VDSO, which of course breaks 'old' glibc. This too is a 
must-have fix for v2.6.21.

	Ingo


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 14:58       ` Andi Kleen
@ 2007-03-05 13:59         ` Ingo Molnar
  2007-03-05 14:10           ` Avi Kivity
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 13:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Avi Kivity, Andrew Morton, Linus Torvalds, linux-kernel,
	Roland McGrath, Rusty Russell


* Andi Kleen <ak@suse.de> wrote:

> I think we would need to have a paravirt ops callback to decide this 
> first. But it doesn't look critical to me anyways.

well, it's critical to me in two ways: 1) to make the i386 paravirt code 
clean 2) to have a proper VDSO for a KVM paravirtual guest. The original 
change is also bad because it changes how a Linux guest behaves: it 
turns off the vdso by default, and disables the compat VDSO. I.e. it's a 
bad performance step backwards if CONFIG_PARAVIRT is enabled (uses int 
$0x80 instead of sysenter), which hurts only KVM and basically none of 
the other hypervisors. It also muddies the VDSO picture wrt. 
virtualization.

i.e. it hurts the sane stuff and benefits the crappy stuff, and my 
change undoes that. That's enough for me to call it critical ;-)

	Ingo


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 13:59         ` Ingo Molnar
@ 2007-03-05 14:10           ` Avi Kivity
  2007-03-05 14:10             ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Avi Kivity @ 2007-03-05 14:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Andrew Morton, Linus Torvalds, linux-kernel,
	Roland McGrath, Rusty Russell

Ingo Molnar wrote:
> * Andi Kleen <ak@suse.de> wrote:
>
>   
>> I think we would need to have a paravirt ops callback to decide this 
>> first. But it doesn't look critical to me anyways.
>>     
>
> well, it's critical to me in two ways: 1) to make the i386 paravirt code 
> clean 2) to have a proper VDSO for a KVM paravirtual guest. The original 
> change is also bad because it changes how a Linux guest behaves: it 
> turns off the vdso by default, and disables the compat VDSO. I.e. it's a 
> bad performance step backwards if CONFIG_PARAVIRT is enabled (uses int 
> $0x80 instead of sysenter), which hurts only KVM and basically none of 
> the other hypervisors. It also muddies the VDSO picture wrt. 
> virtualization.
>
>   

kvm paravirt is only used by developers at this time.  I don't have a 
problem with solving this in the 2.6.22 timeframe.

-- 
error compiling committee.c: too many arguments to function



* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 14:10           ` Avi Kivity
@ 2007-03-05 14:10             ` Ingo Molnar
  0 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 14:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Andi Kleen, Andrew Morton, Linus Torvalds, linux-kernel,
	Roland McGrath, Rusty Russell


* Avi Kivity <avi@qumranet.com> wrote:

> kvm paravirt is only used by developers at this time.  I don't have a 
> problem with solving this in the 2.6.22 timeframe.

oh, certainly, with a KVM hat on i'd agree. But i also have my 
cares-about-i386-arch-cleanliness and cares-about-Linux-maintainability 
hats on ;-) I only mentioned KVM as an example that it /can/ be done 
cleanly.

	Ingo


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 12:06 [patch] paravirt: VDSO page is essential Ingo Molnar
  2007-03-05 12:36 ` Avi Kivity
  2007-03-05 13:28 ` Rusty Russell
@ 2007-03-05 14:27 ` Andi Kleen
  2007-03-05 21:58   ` Roland McGrath
  2 siblings, 1 reply; 86+ messages in thread
From: Andi Kleen @ 2007-03-05 14:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Roland McGrath, Andi Kleen, Rusty Russell

On Mon, Mar 05, 2007 at 01:06:31PM +0100, Ingo Molnar wrote:
> Subject: [patch] paravirt: VDSO page is essential
> From: Ingo Molnar <mingo@elte.hu>
> 
> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
> CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 

Well it was the change that made my test machine (with SUSE 9.0 userland)
work with CONFIG_PARAVIRT on. If you have a better solution please post
it.

> an essential component of Linux, and this change forces all of them to 
> use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
> handle the VDSO properly then it can work things around via the vdso=0 

No hypervisor involved, just CONFIG_PARAVIRT=y on bare hardware.

> boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
> case, it is a basic taste issue: we DO NOT #ifdef around core features 
> like this!)

We set sensible defaults for full backwards compatibility. 

-Andi


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 12:36 ` Avi Kivity
  2007-03-05 12:40   ` Ingo Molnar
@ 2007-03-05 14:28   ` Andi Kleen
  2007-03-05 13:48     ` Ingo Molnar
  1 sibling, 1 reply; 86+ messages in thread
From: Andi Kleen @ 2007-03-05 14:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Roland McGrath,
	Andi Kleen, Rusty Russell

> Can't paravirt patch the syscall instruction like it does the rest of 
> the kernel?

The problem is not the syscall instruction, but some problem with
the way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older glibc.

-Andi


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 13:28 ` Rusty Russell
  2007-03-05 13:38   ` Ingo Molnar
@ 2007-03-05 14:34   ` Andi Kleen
  2007-03-05 13:46     ` [patch] paravirt: re-enable COMPAT_VDSO Ingo Molnar
                       ` (2 more replies)
  2007-03-06  0:57   ` Rusty Russell
  2 siblings, 3 replies; 86+ messages in thread
From: Andi Kleen @ 2007-03-05 14:34 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Roland McGrath,
	Andi Kleen, virtualization

> VDSO is only a problem if (1) the hypervisor wants to reserve the top
> virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is old and

It broke the boot even with native hardware, no hypervisor

>  
> +config RESERVE_TOP
> +	bool
> +	help
> +	  Many hypervisors want to reserve some amount of the top of
> +	  virtual address space.  Unfortunately, old glibc needs the
> +	  vdso page there, so we must disable vdso if COMPAT_VDSO is
> +	  enabled as well as this option.

But this still means I would need to decide between a PARAVIRT
kernel that either supports xen/VMI or cannot boot old user land without
weird options.  I don't think that's the correct solution. The goal
is a single binary that runs everywhere and is still compatible.

What would probably work is to somehow decide at runtime if a hypervisor
is there or not and then set vdso default based on that. I guess that
detection would be hypervisor specific though and probably would
need paravirt ops extensions.
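
[Editor's sketch, not part of the original message. The runtime
decision suggested above could take the form of a callback in the
paravirt ops table that reports whether the backend relocates the top
of the address space. Everything named below (the struct, the
reserves_top hook, the two example backends) is hypothetical and only
illustrates the shape of such an extension.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a paravirt ops table; the hook is invented. */
struct pv_ops_sketch {
    const char *name;
    /* True if this backend reserves the top of virtual address space. */
    bool (*reserves_top)(void);
};

static bool native_reserves_top(void) { return false; }
static bool hv_reserves_top(void)     { return true; }

/* Bare hardware with CONFIG_PARAVIRT=y: nothing moves the fixmap. */
static const struct pv_ops_sketch native_ops = {
    "bare hardware", native_reserves_top
};

/* A hypervisor backend that needs the top of the address space. */
static const struct pv_ops_sketch hv_ops = {
    "top-reserving hypervisor", hv_reserves_top
};

/*
 * Decide the vdso default at boot instead of at build time: turn it
 * off only when the kernel was built with the compat (fixed-address)
 * vdso and the active backend has moved that address.
 */
static unsigned int vdso_default(const struct pv_ops_sketch *ops,
                                 bool compat_vdso)
{
    assert(ops != NULL && ops->reserves_top != NULL);
    if (compat_vdso && ops->reserves_top())
        return 0;
    return 1;
}
```

[With this shape a single binary keeps the vdso on bare hardware and
drops it only under a backend that cannot honour the fixed address.]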

-Andi



* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 13:48     ` Ingo Molnar
@ 2007-03-05 14:58       ` Andi Kleen
  2007-03-05 13:59         ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Andi Kleen @ 2007-03-05 14:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Avi Kivity, Andrew Morton, linux-kernel,
	Roland McGrath, Rusty Russell

On Mon, Mar 05, 2007 at 02:48:54PM +0100, Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> > The problem is not the syscall instruction, but some problem with the 
> > way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older 
> > glibc.
> 
> the problem is that CONFIG_PARAVIRT silently turns off 
> CONFIG_COMPAT_VDSO, which of course breaks 'old' glibc. This too is a 
> must-have fix for v2.6.21.

We seem to have different definitions of must-have.

I think we would need to have a paravirt ops callback to decide this
first. But it doesn't look critical to me anyways.

-Andi


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 14:34   ` Andi Kleen
  2007-03-05 13:46     ` [patch] paravirt: re-enable COMPAT_VDSO Ingo Molnar
  2007-03-05 13:48     ` [patch] paravirt: VDSO page is essential Ingo Molnar
@ 2007-03-05 20:11     ` Zachary Amsden
  2007-03-05 20:16       ` Andi Kleen
  2007-03-05 20:19       ` Ingo Molnar
  2 siblings, 2 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-05 20:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rusty Russell, Ingo Molnar, Andrew Morton, linux-kernel,
	Roland McGrath, virtualization

Andi Kleen wrote:
> But this still means I would need to decide between a PARAVIRT
> kernel that either supports xen/VMI or cannot boot old user land without
> weird options.  I don't think that's the correct solution. The goal
> is a single binary that runs everywhere and is still compatible.
>   

Rusty's patch looks like a step in the right direction to me.  I can't 
comment on Ingo's patch as he failed to -cc me on it, despite the fact 
that we are the only ones affected by it.  We are not concerned so much 
with supporting legacy user land deployments, as we expect the install 
scenario to happen, for the most part, when upgrading to new distros.

> What would probably work is to somehow decide at runtime if a hypervisor
> is there or not and then set vdso default based on that. I guess that
> detection would be hypervisor specific though and probably would
> need paravirt ops extensions.
>   

What we really need to do is to be able to detect an old user land and 
drop VDSO support when that is found.  But since we can't do that, the 
next best thing is to allow the hypervisor to choose whatever workaround 
it wants when it moves the fixmap and compat_vdso was enabled.  In our 
case, the workaround we will want is a boot option to disable VDSO for 
old user land, and a printk warning if you take #GPs and kill the init 
proc, because for us, this is not an expected support scenario.  We 
would much rather support the VDSO by default in paravirt kernels even 
with COMPAT_VDSO turned on.

Zach


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 20:11     ` Zachary Amsden
@ 2007-03-05 20:16       ` Andi Kleen
  2007-03-05 20:33         ` Zachary Amsden
  2007-03-05 20:19       ` Ingo Molnar
  1 sibling, 1 reply; 86+ messages in thread
From: Andi Kleen @ 2007-03-05 20:16 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Rusty Russell, Ingo Molnar, Andrew Morton, linux-kernel,
	Roland McGrath, virtualization

> We are not concerned so much 
> with supporting legacy user land deployments,

I am concerned about that. I won't merge any patches that break compatibility
by default.

> > What would probably work is to somehow decide at runtime if a hypervisor
> > is there or not and then set vdso default based on that. I guess that
> > detection would be hypervisor specific though and probably would
> > need paravirt ops extensions.
> >   
> 
> What we really need to do is to be able to detect an old user land and 
> drop VDSO support when that is found.  

Rusty implemented that, but it was widely considered too ugly
(and it was not 100% reliable e.g. with chroots) 

> But since we can't do that, the  
> next best thing is to allow the hypervisor to choose whatever workaround 
> it wants when it moves the fixmap and compat_vdso was enabled.  In our 
> case, the workaround we will want is a boot option to disable VDSO for 
> old user land,

The boot option is already there, but boot options are not my
idea of user-friendly binary compatibility.

> and a printk warning if you take #GPs and kill the init  
> proc, because for us, this is not an expected support scenario.  We 
> would much rather support the VDSO by default in paravirt kernels even 
> with COMPAT_VDSO turned on.

But you can't have it at the compatible fixed address, right?

-Andi


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 20:11     ` Zachary Amsden
  2007-03-05 20:16       ` Andi Kleen
@ 2007-03-05 20:19       ` Ingo Molnar
  2007-03-05 20:42         ` Zachary Amsden
  1 sibling, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-05 20:19 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Andi Kleen, Rusty Russell, Andrew Morton, linux-kernel,
	Roland McGrath, virtualization


* Zachary Amsden <zach@vmware.com> wrote:

> [...]  I can't comment on Ingo's patch as he failed to -cc me on it,

it's on lkml, i'll bounce it to you separately.

> despite the fact that we are the only ones affected by it. [...]

actually, no, my motivation for it was KVM (a Linux based hypervisor and 
its paravirtual interface). You said that VMI won't hinder Linux 
development and design in any way and can adapt to whatever change the 
upstream kernel does, so i'm taking your word for that.

> [...] We are not concerned so much with supporting legacy user land 
> deployments, as we expect for the most part, the install scenario to 
> happen when upgrading to new distros.

that's not really the right argument. This obviously affects the host 
kernel too if it has CONFIG_PARAVIRT enabled. The other option is the 
removal (or temporary disabling) of the whole CONFIG_VMI and 
CONFIG_PARAVIRT stuff if you cannot get into minimal release shape, it's 
experimental stuff after all.

> What we really need to do is to be able to detect an old user land and 
> drop VDSO support when that is found.  [...]

no. What you need is to keep existing mechanisms and not hack off the 
vdso and CONFIG_COMPAT_VDSO...

> [...] But since we can't do that, the next best thing is to allow the 
> hypervisor to choose whatever workaround it wants when it moves the 
> fixmap and compat_vdso was enabled.  In our case, the workaround we 
> will want is a boot option to disable VDSO for old user land, [...]

there's no need to disable the VDSO for old userspace ...

	Ingo


* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 20:16       ` Andi Kleen
@ 2007-03-05 20:33         ` Zachary Amsden
  0 siblings, 0 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-05 20:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rusty Russell, Ingo Molnar, Andrew Morton, linux-kernel,
	Roland McGrath, virtualization

Andi Kleen wrote:
> The boot option is already there, but boot options are not my
> idea of user-friendly binary compatibility.
>   

Yes, not friendly.  Perhaps we can reverse the boot option?  I.e make 
vdso_enable=force re-activate the vdso even if it gets moved by a 
hypervisor and COMPAT_VDSO was compiled in.

But how many actual users are affected by these old glibc's that we need 
to care so much about them running new technology, especially if that 
technology can be built with the ability to warn them that they probably 
need to boot with vdso=disabled?


>> and a printk warning if you take #GPs and kill the init  
>> proc, because for us, this is not an expected support scenario.  We 
>> would much rather support the VDSO by default in paravirt kernels even 
>> with COMPAT_VDSO turned on.
>>     
>
> But you can't have it at the compatible fixed address, right?
>   

While technically possible, it is not possible to do that anytime soon, 
and the drawbacks might overshadow the performance gain of the VDSO - 
and I don't think any of the other hypervisors (lhype, Xen) will be able 
to do that either.  KVM can do it, only because it relies on hardware virt.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 20:19       ` Ingo Molnar
@ 2007-03-05 20:42         ` Zachary Amsden
  0 siblings, 0 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-05 20:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Rusty Russell, Andrew Morton, linux-kernel,
	Roland McGrath, virtualization

Ingo Molnar wrote:
> there's no need to disable the VDSO for old userspace ...
>   

Well, apart from the obvious question to which nobody actually knows the 
answer, (how many people run old user space that required 
CONFIG_COMPAT_VDSO), what do you think of reversing the boot option?

vdso=enabled (default - turn on VDSO on normal boots)
vdso=disabled (turn off VDSO unconditionally)
[vdso=compat] (default for COMPAT_VDSO - keep VDSO only when mapped at 
compat location.  Note the option is not required to be implemented 
because it is logically implied from vdso=enabled && COMPAT_VDSO and the 
default boot behavior)
vdso=force (keep VDSO even when moved to a new location and COMPAT_VDSO 
is enabled).

In our case, installing VMware tools in the guest would then detect if 
userspace supports VDSO or if it requires COMPAT_VDSO and would then set 
boot parameters for the kernel appropriately.  And the native boot and 
kvm paravirt-ops boot are completely unaffected.
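
A minimal userspace sketch of the option semantics listed above, for
illustration only.  The enum, parse_vdso_mode() and vdso_should_map() are
hypothetical names, not existing kernel symbols, and the two bool
arguments stand in for CONFIG_COMPAT_VDSO and "the hypervisor moved the
fixmap":

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical modes, named after the proposed vdso= values. */
enum vdso_mode { VDSO_DISABLED, VDSO_ENABLED, VDSO_FORCE };

/* Parse the value of a "vdso=" option; unknown strings keep the default. */
static enum vdso_mode parse_vdso_mode(const char *arg)
{
	assert(arg != NULL);
	if (strcmp(arg, "disabled") == 0)
		return VDSO_DISABLED;
	if (strcmp(arg, "force") == 0)
		return VDSO_FORCE;
	return VDSO_ENABLED;
}

/*
 * Decide whether to map the VDSO.
 * compat_vdso:  kernel was built with CONFIG_COMPAT_VDSO
 * fixmap_moved: a hypervisor relocated the fixmap, so the VDSO can no
 *               longer sit at the compat address
 */
static bool vdso_should_map(enum vdso_mode mode, bool compat_vdso,
			    bool fixmap_moved)
{
	if (mode == VDSO_DISABLED)
		return false;
	if (mode == VDSO_FORCE)
		return true;
	/* "enabled" implies the vdso=compat behaviour under COMPAT_VDSO */
	return !(compat_vdso && fixmap_moved);
}
```

With this table, a native or KVM boot (fixmap not moved) keeps the VDSO
in every mode except vdso=disabled, which matches the intent that those
boots stay unaffected.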

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 14:27 ` Andi Kleen
@ 2007-03-05 21:58   ` Roland McGrath
  2007-03-05 22:01     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 86+ messages in thread
From: Roland McGrath @ 2007-03-05 21:58 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Rusty Russell

Does the old userland compatibility you're concerned about really need the
vdso to be at 0xffffe000 in particular, or just need it to be at a fixed
address that matches the phdrs inside the image?  My recollection of the old
glibc's limitation was that it expected the image's phdrs to match its load
address.  The xen kernels used to change this to 0xffffd000 or something,
and AFAIK that was fine.  If that's all that's needed, it is not so hard to
adjust the vDSO contents at boot time (phdrs, shdrs, and symbols; no code
contents use the absolute address).  Under CONFIG_COMPAT_VDSO, it can see
where the paravirt moved the fixmap to, and apply adjustments.
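
A rough userspace sketch of the phdr part of such an adjustment, assuming
a 32-bit ELF image already in memory.  vdso_shift_phdrs() is a
hypothetical helper, not kernel code, and it deliberately leaves out the
section headers and symbols that a complete version would also have to
rewrite:

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * Shift every program header of an in-memory 32-bit ELF image from
 * old_base to new_base.  Only phdrs are handled in this sketch.
 */
static void vdso_shift_phdrs(void *image, uint32_t old_base, uint32_t new_base)
{
	Elf32_Ehdr *ehdr = image;
	uint32_t delta = new_base - old_base;	/* wraps modulo 2^32 */
	unsigned int i;

	/* Refuse anything that does not start with the ELF magic. */
	assert(memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0);

	for (i = 0; i < ehdr->e_phnum; i++) {
		Elf32_Phdr *phdr = (Elf32_Phdr *)((char *)image + ehdr->e_phoff
						  + i * ehdr->e_phentsize);

		phdr->p_vaddr += delta;
		phdr->p_paddr += delta;
	}
}
```

Because the delta is computed with unsigned wraparound, the same routine
covers moving the image down (say 0xffffe000 to 0xffffd000) or back up.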

That said, I don't think this is worth all that much effort since
CONFIG_COMPAT_VDSO is not really desirable for most people.  I think
disabling the vdso under CONFIG_COMPAT_VDSO+CONFIG_PARAVIRT is survivable
(just don't set CONFIG_COMPAT_VDSO for a system you want to be optimal).


Thanks,
Roland

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 21:58   ` Roland McGrath
@ 2007-03-05 22:01     ` Jeremy Fitzhardinge
  2007-03-05 22:58       ` Roland McGrath
  0 siblings, 1 reply; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-05 22:01 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Rusty Russell, Jan Beulich

Roland McGrath wrote:
> Does the old userland compatibility you're concerned about really need the
> vdso to be at 0xffffe000 in particular, or just need it to be at a fixed
> address that matches the phdrs inside the image?  My recollection of the old
> glibc's limitation was that it expected the image's phdrs to match its load
> address.  The xen kernels used to change this to 0xffffd000 or something,
> and AFAIK that was fine.  If that's all that's needed, it is not so hard to
> adjust the vDSO contents at boot time (phdrs, shdrs, and symbols; no code
> contents use the absolute address).  Under CONFIG_COMPAT_VDSO, it can see
> where the paravirt moved the fixmap to, and apply adjustments.
>   

Jan Beulich just posted a patch to do just this - relocate the vdso's
ELF header.  If that's all that's really required to keep COMPAT_VDSO
viable under PARAVIRT, then it seems like the way to go.

    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 22:01     ` Jeremy Fitzhardinge
@ 2007-03-05 22:58       ` Roland McGrath
  2007-03-05 23:03         ` Jeremy Fitzhardinge
  2007-03-06  8:34         ` Ingo Molnar
  0 siblings, 2 replies; 86+ messages in thread
From: Roland McGrath @ 2007-03-05 22:58 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Rusty Russell, Jan Beulich

> Jan Beulich just posted a patch to do just this - relocate the vdso's
> ELF header.  If that's all that's really required to keep COMPAT_VDSO
> viable under PARAVIRT, then it seems like the way to go.

I found http://marc.theaimsgroup.com/?l=xen-devel&m=117309332600075&w=2 and
that must be the one you meant.  The ELF-grokking form of that is exactly
what I had in mind.  The "find relocs with cmp" scheme is pretty silly, but
also works fine.  It trades tweaky ELF knowledge with tweaky fragile build
methods, but it's all about the same to me.  


Thanks,
Roland

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 22:58       ` Roland McGrath
@ 2007-03-05 23:03         ` Jeremy Fitzhardinge
  2007-03-06  8:34         ` Ingo Molnar
  1 sibling, 0 replies; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-05 23:03 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Rusty Russell, Jan Beulich

Roland McGrath wrote:
>> Jan Beulich just posted a patch to do just this - relocate the vdso's
>> ELF header.  If that's all that's really required to keep COMPAT_VDSO
>> viable under PARAVIRT, then it seems like the way to go.
>>     
>
> I found http://marc.theaimsgroup.com/?l=xen-devel&m=117309332600075&w=2 and
> that must be the one you meant.  The ELF-grokking form of that is exactly
> what I had in mind.  The "find relocs with cmp" scheme is pretty silly, but
> also works fine.  It trades tweaky ELF knowledge with tweaky fragile build
> methods, but it's all about the same to me.  
>   

That's the one.  I think the C version is the one to go with.

    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 13:28 ` Rusty Russell
  2007-03-05 13:38   ` Ingo Molnar
  2007-03-05 14:34   ` Andi Kleen
@ 2007-03-06  0:57   ` Rusty Russell
  2007-03-06  1:03     ` Zachary Amsden
  2 siblings, 1 reply; 86+ messages in thread
From: Rusty Russell @ 2007-03-06  0:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Roland McGrath, Andi Kleen, virtualization

On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
> On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
> > Subject: [patch] paravirt: VDSO page is essential
> > From: Ingo Molnar <mingo@elte.hu>
> > 
> > commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
> > CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
> > an essential component of Linux, and this change forces all of them to 
> > use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
> > handle the VDSO properly then it can work things around via the vdso=0 
> > boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
> > case, it is a basic taste issue: we DO NOT #ifdef around core features 
> > like this!)
> 
> I agree with the criticism, dislike the snarly comments, and disagree
> with this patch.

And my patch was pretty crack-induced too.  Sorry.

I shouldn't have been thinking about using CONFIG options at all: we
should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
*actually* reserve top memory.

This still needs some work (doing that now), but do people like the idea?

The current "vdso_disabled" flag merely disabled the ELF note, so it
needs to be made a little stronger, to not set up the vdso at all.

diff -r f75715e64a3b arch/i386/Kconfig
--- a/arch/i386/Kconfig	Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/Kconfig	Tue Mar 06 09:30:36 2007 +1100
@@ -893,9 +893,10 @@ config COMPAT_VDSO
 config COMPAT_VDSO
 	bool "Compat VDSO support"
 	default y
-	depends on !PARAVIRT
-	help
-	  Map the VDSO to the predictable old-style address too.
+	help
+	  Map the VDSO to the predictable old-style address too, or
+	  in the case of a VMI/Xen/lguest virtualized guest, don't create
+	  the VDSO at all.
 	---help---
 	  Say N here if you are running a sufficiently recent glibc
 	  version (2.3.3 or later), to remove the high-mapped
diff -r f75715e64a3b arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c	Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/kernel/sysenter.c	Tue Mar 06 09:25:47 2007 +1100
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 
@@ -51,7 +47,7 @@ void enable_sep_cpu(void)
 	int cpu = get_cpu();
 	struct tss_struct *tss = &per_cpu(init_tss, cpu);
 
-	if (!boot_cpu_has(X86_FEATURE_SEP)) {
+	if (!boot_cpu_has(X86_FEATURE_SEP) || !vdso_enabled) {
 		put_cpu();
 		return;
 	}
@@ -74,7 +70,12 @@ static struct page *syscall_pages[1];
 
 int __init sysenter_setup(void)
 {
-	void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
+	void *syscall_page;
+
+	if (!vdso_enabled)
+		return 0;
+
+	syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
 	syscall_pages[0] = virt_to_page(syscall_page);
 
 #ifdef CONFIG_COMPAT_VDSO
@@ -106,6 +107,9 @@ int arch_setup_additional_pages(struct l
 	struct mm_struct *mm = current->mm;
 	unsigned long addr;
 	int ret;
+
+	if (!vdso_enabled)
+		return 0;
 
 	down_write(&mm->mmap_sem);
 	addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
diff -r f75715e64a3b arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.c	Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/mm/pgtable.c	Tue Mar 06 09:32:51 2007 +1100
@@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un
 }
 
 static int fixmaps;
-#ifndef CONFIG_COMPAT_VDSO
 unsigned long __FIXADDR_TOP = 0xfffff000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
-#endif
 
 void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
 {
@@ -174,11 +172,12 @@ void reserve_top_address(unsigned long r
 	printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
 	       (int)-reserve);
 #ifdef CONFIG_COMPAT_VDSO
-	BUG_ON(reserve != 0);
-#else
+	/* We can't have both reserved space and VDSO at 0xFFFFE000. */
+	if (reserve)
+		vdso_enabled = 0;
+#endif
 	__FIXADDR_TOP = -reserve - PAGE_SIZE;
 	__VMALLOC_RESERVE += reserve;
-#endif
 }
 
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
diff -r f75715e64a3b include/asm-i386/fixmap.h
--- a/include/asm-i386/fixmap.h	Tue Mar 06 00:04:50 2007 +1100
+++ b/include/asm-i386/fixmap.h	Tue Mar 06 09:29:15 2007 +1100
@@ -19,10 +19,8 @@
  * Leave one empty page between vmalloc'ed areas and
  * the start of the fixmap.
  */
-#ifndef CONFIG_COMPAT_VDSO
 extern unsigned long __FIXADDR_TOP;
-#else
-#define __FIXADDR_TOP  0xfffff000
+#ifdef CONFIG_COMPAT_VDSO
 #define FIXADDR_USER_START	__fix_to_virt(FIX_VDSO)
 #define FIXADDR_USER_END	__fix_to_virt(FIX_VDSO - 1)
 #endif
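
For illustration, the address arithmetic in the reserve_top_address()
hunk can be modelled in userspace as below.  This is only a sketch:
struct top_layout and reserve_top() are hypothetical names, uint32_t
stands in for the 32-bit unsigned long, and the compat_vdso flag stands
in for CONFIG_COMPAT_VDSO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

struct top_layout {
	uint32_t fixaddr_top;	/* models __FIXADDR_TOP on 32-bit */
	bool vdso_enabled;	/* models the vdso_enabled flag */
};

/* Model of reserving 'reserve' bytes at the top of the address space. */
static struct top_layout reserve_top(uint32_t reserve, bool compat_vdso)
{
	struct top_layout l = { 0xfffff000u, true };

	assert(reserve % PAGE_SIZE == 0);

	/* Reserved space and a VDSO at 0xffffe000 cannot coexist. */
	if (compat_vdso && reserve != 0)
		l.vdso_enabled = false;

	/* Same expression as the patch: -reserve - PAGE_SIZE, mod 2^32 */
	l.fixaddr_top = (uint32_t)0 - reserve - PAGE_SIZE;
	return l;
}
```

With reserve == 0 the result is the usual 0xfffff000 and the VDSO stays
on.  Reserving 64MB (0x04000000) gives 0xfbfff000, and the VDSO is turned
off only when the compat flag is set.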



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  0:57   ` Rusty Russell
@ 2007-03-06  1:03     ` Zachary Amsden
  2007-03-06  1:11       ` Rusty Russell
  2007-03-06  1:14       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-06  1:03 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Roland McGrath,
	Andi Kleen, virtualization

[-- Attachment #1: Type: text/plain, Size: 1407 bytes --]

Rusty Russell wrote:
> On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
>   
>> On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
>>     
>>> Subject: [patch] paravirt: VDSO page is essential
>>> From: Ingo Molnar <mingo@elte.hu>
>>>
>>> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
>>> CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
>>> an essential component of Linux, and this change forces all of them to 
>>> use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
>>> handle the VDSO properly then it can work things around via the vdso=0 
>>> boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
>>> case, it is a basic taste issue: we DO NOT #ifdef around core features 
>>> like this!)
>>>       
>> I agree with the criticism, dislike the snarly comments, and disagree
>> with this patch.
>>     
>
> And my patch was pretty crack-induced too.  Sorry.
>
> I shouldn't have been thinking about using CONFIG options at all: we
> should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
> *actually* reserve top memory.
>
> This still needs some work (doing that now), but do people like the idea?
>
> The current "vdso_disabled" flag merely disabled the ELF note, so it
> needs to be made a little stronger, to not set up the vdso at all.
>   

I had just sent this out for internal review...



[-- Attachment #2: compat-vdso-broken --]
[-- Type: text/plain, Size: 3617 bytes --]

COMPAT_VDSO is incompatible with PARAVIRT for most implementations, as they
must relocate the fixmap to make room for a hypervisor.  So allow COMPAT_VDSO
kernels to relocate the fixmap as well, just disable the VDSO if they do so.

Signed-off-by: Zachary Amsden <zach@vmware.com>

diff -r fad0910252d2 arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c	Mon Mar 05 15:24:04 2007 -0800
+++ b/arch/i386/kernel/sysenter.c	Mon Mar 05 15:27:31 2007 -0800
@@ -74,7 +74,12 @@ static struct page *syscall_pages[1];
 
 int __init sysenter_setup(void)
 {
-	void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
+	void *syscall_page;
+
+	if (!vdso_enabled)
+		return 0;
+
+	syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
 	syscall_pages[0] = virt_to_page(syscall_page);
 
 #ifdef CONFIG_COMPAT_VDSO
@@ -106,6 +111,11 @@ int arch_setup_additional_pages(struct l
 	struct mm_struct *mm = current->mm;
 	unsigned long addr;
 	int ret;
+
+	if (!vdso_enabled) {
+		current->mm->context.vdso = (void *)~0UL;
+		return 0;
+	}
 
 	down_write(&mm->mmap_sem);
 	addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
diff -r fad0910252d2 arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.c	Mon Mar 05 15:24:04 2007 -0800
+++ b/arch/i386/mm/pgtable.c	Mon Mar 05 16:06:31 2007 -0800
@@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un
 }
 
 static int fixmaps;
-#ifndef CONFIG_COMPAT_VDSO
 unsigned long __FIXADDR_TOP = 0xfffff000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
-#endif
 
 void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
 {
@@ -174,11 +172,13 @@ void reserve_top_address(unsigned long r
 	printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
 	       (int)-reserve);
 #ifdef CONFIG_COMPAT_VDSO
-	BUG_ON(reserve != 0);
-#else
+	if (reserve != 0) {
+		printk(KERN_WARNING "Compat VDSO is incompatible with fixmap relocation - disabling VDSO\n");
+		vdso_enabled = 0;
+	}
+#endif
 	__FIXADDR_TOP = -reserve - PAGE_SIZE;
 	__VMALLOC_RESERVE += reserve;
-#endif
 }
 
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
diff -r fad0910252d2 include/asm-i386/elf.h
--- a/include/asm-i386/elf.h	Mon Mar 05 15:24:04 2007 -0800
+++ b/include/asm-i386/elf.h	Mon Mar 05 15:44:43 2007 -0800
@@ -137,7 +137,7 @@ extern int dump_task_extended_fpu (struc
 
 #ifdef CONFIG_COMPAT_VDSO
 # define VDSO_COMPAT_BASE	VDSO_HIGH_BASE
-# define VDSO_PRELINK		VDSO_HIGH_BASE
+# define VDSO_PRELINK		0xffffe000UL
 #else
 # define VDSO_COMPAT_BASE	VDSO_BASE
 # define VDSO_PRELINK		0
diff -r fad0910252d2 include/asm-i386/fixmap.h
--- a/include/asm-i386/fixmap.h	Mon Mar 05 15:24:04 2007 -0800
+++ b/include/asm-i386/fixmap.h	Mon Mar 05 15:59:30 2007 -0800
@@ -14,19 +14,6 @@
 #define _ASM_FIXMAP_H
 
 
-/* used by vmalloc.c, vsyscall.lds.S.
- *
- * Leave one empty page between vmalloc'ed areas and
- * the start of the fixmap.
- */
-#ifndef CONFIG_COMPAT_VDSO
-extern unsigned long __FIXADDR_TOP;
-#else
-#define __FIXADDR_TOP  0xfffff000
-#define FIXADDR_USER_START	__fix_to_virt(FIX_VDSO)
-#define FIXADDR_USER_END	__fix_to_virt(FIX_VDSO - 1)
-#endif
-
 #ifndef __ASSEMBLY__
 #include <linux/kernel.h>
 #include <asm/acpi.h>
@@ -35,6 +22,15 @@ extern unsigned long __FIXADDR_TOP;
 #ifdef CONFIG_HIGHMEM
 #include <linux/threads.h>
 #include <asm/kmap_types.h>
+#endif
+
+/* used by vmalloc.c, vsyscall.lds.S, elf.h, pgtable.c */
+extern unsigned long __FIXADDR_TOP;
+
+/* used for dumping VDSO to core files */
+#ifdef CONFIG_COMPAT_VDSO
+#define FIXADDR_USER_START     __fix_to_virt(FIX_VDSO)
+#define FIXADDR_USER_END       __fix_to_virt(FIX_VDSO - 1)
 #endif
 
 /*

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  1:03     ` Zachary Amsden
@ 2007-03-06  1:11       ` Rusty Russell
  2007-03-06  1:14       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 86+ messages in thread
From: Rusty Russell @ 2007-03-06  1:11 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Roland McGrath,
	Andi Kleen, virtualization

On Mon, 2007-03-05 at 17:03 -0800, Zachary Amsden wrote:
> Rusty Russell wrote:
> > This still needs some work (doing that now), but do people like the idea?
> >
> I had just sent this out for internal review...

Spooky!

I was just testing with lguest, but I'll do so with your patch instead.

Thanks,
Rusty.



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  1:03     ` Zachary Amsden
  2007-03-06  1:11       ` Rusty Russell
@ 2007-03-06  1:14       ` Jeremy Fitzhardinge
  2007-03-06  1:51         ` Zachary Amsden
  2007-03-06  7:35         ` [patch] paravirt: VDSO page is essential Ingo Molnar
  1 sibling, 2 replies; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06  1:14 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Rusty Russell, virtualization, Andrew Morton, Ingo Molnar,
	Roland McGrath, linux-kernel

Zachary Amsden wrote:
> Rusty Russell wrote:
>> On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
>>  
>>> On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
>>>    
>>>> Subject: [patch] paravirt: VDSO page is essential
>>>> From: Ingo Molnar <mingo@elte.hu>
>>>>
>>>> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO
>>>> for CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the
>>>> VDSO is an essential component of Linux, and this change forces all
>>>> of them to use int $0x80 - including sane ones like KVM. (If a
>>>> hypervisor does not handle the VDSO properly then it can work
>>>> things around via the vdso=0 boot option. Or CONFIG_PARAVIRT should
>>>> not have been merged. But in any case, it is a basic taste issue:
>>>> we DO NOT #ifdef around core features like this!)
>>>>       
>>> I agree with the criticism, dislike the snarly comments, and disagree
>>> with this patch.
>>>     
>>
>> And my patch was pretty crack-induced too.  Sorry.
>>
>> I shouldn't have been thinking about using CONFIG options at all: we
>> should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
>> *actually* reserve top memory.
>>
>> This still needs some work (doing that now), but do people like the idea?
>>
>> The current "vdso_disabled" flag merely disabled the ELF note, so it
>> needs to be made a little stronger, to not set up the vdso at all.
>>   
>
> I had just sent this out for internal review...
>

I think Jan's approach is better if it works (since there's no
compromise), but this is better if you want to get something working in
the near term.

    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  1:14       ` Jeremy Fitzhardinge
@ 2007-03-06  1:51         ` Zachary Amsden
  2007-03-06  1:53           ` Jeremy Fitzhardinge
  2007-03-06  7:35         ` [patch] paravirt: VDSO page is essential Ingo Molnar
  1 sibling, 1 reply; 86+ messages in thread
From: Zachary Amsden @ 2007-03-06  1:51 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Rusty Russell, virtualization, Andrew Morton, Ingo Molnar,
	Roland McGrath, linux-kernel, Jan Beulich

Jeremy Fitzhardinge wrote:
> I think Jan's approach is better if it works (since there's no
> compromise), but this is better if you want to get something working in
> the near term.
>   

Is Jan dynamically relinking the vdso at runtime?  Because that is what 
we will need.  Adding support for COMPAT_VDSO creates a major 
performance problem for us.  We need a way to force the VDSO to be 
enabled with a boot parameter.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  1:51         ` Zachary Amsden
@ 2007-03-06  1:53           ` Jeremy Fitzhardinge
  2007-03-06  8:19             ` Xen & VMI? Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06  1:53 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Rusty Russell, virtualization, Andrew Morton, Ingo Molnar,
	Roland McGrath, linux-kernel, Jan Beulich

Zachary Amsden wrote:
> Jeremy Fitzhardinge wrote:
>> I think Jan's approach is better if it works (since there's no
>> compromise), but this is better if you want to get something working in
>> the near term.
>>   
>
> Is Jan dynamically relinking the vdso at runtime?  
Yes.  His patch adds a runtime relocation for the vdso.
http://marc.theaimsgroup.com/?l=xen-devel&m=117309332600075&w=2

    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  1:14       ` Jeremy Fitzhardinge
  2007-03-06  1:51         ` Zachary Amsden
@ 2007-03-06  7:35         ` Ingo Molnar
  2007-03-06  7:42           ` Zachary Amsden
  1 sibling, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  7:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, Rusty Russell, virtualization, Andrew Morton,
	Roland McGrath, linux-kernel


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> > I had just sent this out for internal review...
> 
> I think Jan's approach is better if it works (since there's no 
> compromise), but this is better if you want to get something working 
> in the near term.

yeah. (plus my patches of course that remove the current papering-over 
hackery and restore COMPAT_VDSO.)

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  7:35         ` [patch] paravirt: VDSO page is essential Ingo Molnar
@ 2007-03-06  7:42           ` Zachary Amsden
  2007-03-06  7:50             ` Ingo Molnar
  2007-03-06 18:48             ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-06  7:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Rusty Russell, virtualization,
	Andrew Morton, Roland McGrath, linux-kernel

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>>> I had just sent this out for internal review...
>>>       
>> I think Jan's approach is better if it works (since there's no 
>> compromise), but this is better if you want to get something working 
>> in the near term.
>>     
>
> yeah. (plus my patches of course that remove the current papering-over 
> hackery and restores COMPAT_VDSO.)
>   

Yes, I don't have a problem with your patch, I just wish I had been cc'd 
on it.  Fixing this is rather tricky, but I believe no strange build 
magic is required, it can be done in kernel init code.  Still building 
my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that requires 
COMPAT_VDSO, yes?

Thanks,

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  7:42           ` Zachary Amsden
@ 2007-03-06  7:50             ` Ingo Molnar
  2007-03-06 18:48             ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  7:50 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Jeremy Fitzhardinge, Rusty Russell, virtualization,
	Andrew Morton, Roland McGrath, linux-kernel


* Zachary Amsden <zach@vmware.com> wrote:

> > yeah. (plus my patches of course that remove the current 
> > papering-over hackery and restores COMPAT_VDSO.)
> 
> Yes, I don't have a problem with your patch, I just wish I had been 
> cc'd on it. [...]

(i Cc:-ed you to the other ones - i simply forgot and bounced it to you 
a few hours down the line - sorry!)

> [...] Fixing this is rather tricky, but I believe no strange build 
> magic is required, it can be done in kernel init code.  Still building 
> my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that requires 
> COMPAT_VDSO, yes?

yeah, and a handful of other ones. It depends on the glibc version: 
early vdso glibcs were buggy and assumed a few things about the vdso, so 
they would segfault on the new-style vdso which is fully relocatable 
(and hence mergeable into the vma space, randomizable, etc.).

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Xen & VMI?
  2007-03-06  1:53           ` Jeremy Fitzhardinge
@ 2007-03-06  8:19             ` Ingo Molnar
  2007-03-06  8:37               ` Gerd Hoffmann
  2007-03-06  9:07               ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  8:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, Rusty Russell, virtualization, Andrew Morton,
	Linus Torvalds, Roland McGrath, Andi Kleen, linux-kernel,
	Jan Beulich


btw., while we have everyone on the phone and talking ;) Technologically 
it would save us a whole lot of trouble in Linux if 'external' 
hypervisors could standardize around a single ABI - such as VMI. Is 
there any deep reason why Xen couldn't use VMI to talk to Linux? I 
suspect a range of VMI vectors could be set aside for Xen's dom0 (and 
other) APIs that have no current VMI equivalent - if there's broad 
agreement on the current 60+ base VMI vectors that center around basic 
x86 CPU capabilities - which make up the largest portion of our 
paravirtualization complexity. Pipe dream?

there are already 5 major hypervisors we are going to support (in 
alphabetical order):

 - KVM
 - lguest
 - Windows
 - VMWare
 - Xen

the QA matrix is gonna be a _mess_. Okay, lguest and KVM are special 
because both the client and the server side are in the same source code, 
so the ABI [if any] is a lot easier to manage. That still leaves another 
three...

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-05 22:58       ` Roland McGrath
  2007-03-05 23:03         ` Jeremy Fitzhardinge
@ 2007-03-06  8:34         ` Ingo Molnar
  2007-03-06  9:13           ` Roland McGrath
  1 sibling, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  8:34 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Jeremy Fitzhardinge, Andi Kleen, Andrew Morton, linux-kernel,
	Rusty Russell, Jan Beulich


* Roland McGrath <roland@redhat.com> wrote:

> > Jan Beulich just posted a patch to do just this - relocate the 
> > vdso's ELF header.  If that's all that's really required to keep 
> > COMPAT_VDSO viable under PARAVIRT, then it seems like the way to go.
> 
> I found 
> http://marc.theaimsgroup.com/?l=xen-devel&m=117309332600075&w=2 and 
> that must be the one you meant.  The ELF-grokking form of that is 
> exactly what I had in mind.  The "find relocs with cmp" scheme is 
> pretty silly, but also works fine.  It trades tweaky ELF knowledge 
> with tweaky fragile build methods, but it's all about the same to me.

this looks good to me too in principle, the #else branch. But the actual 
implementation will have to be redone quite a bit i fear. Some details: 
relocate_vdso() needs some major coding style cleanups. This bit:

-# define VDSO_PRELINK          VDSO_HIGH_BASE
+# ifndef CONFIG_XEN
+#  define VDSO_PRELINK         VDSO_HIGH_BASE
+# else
+#  define VDSO_PRELINK         (0UL - FIX_VDSO * PAGE_SIZE)
+# endif

should be Kconfig driven, not #ifdef driven, due to cleanliness and also 
because lguest wants to have the same thing. Plus:

+#if defined(CONFIG_XEN) && defined(CONFIG_COMPAT_VDSO)

i'd just make this depend on CONFIG_COMPAT_VDSO, always. Same here:

+#if defined(CONFIG_XEN) && defined(CONFIG_COMPAT_VDSO)
+static void __init relocate_vdso

just make this driven in the normal CONFIG_COMPAT_VDSO case too - even 
though we 'prelink' the VDSO to the usual address - we better run 
through the same code all the time and reduce the number of variants as 
much as possible.

furthermore, there should be a paravirt_ops method to choose the 
relocation address, unless i'm missing something. On the native kernel 
that address will default to 0xffffe000. (if CONFIG_COMPAT_VDSO is 
selected)

this way there will only be two main variants to worry about: compat and 
modern (which is the current status quo anyway), instead of 4-5 
variants.
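
a rough userspace sketch of what such a hook could look like, purely as
an illustration. struct vdso_pv_ops, vdso_compat_base() and the example
guest override are hypothetical names and values, not part of the real
paravirt_ops structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VDSO_COMPAT_ADDR 0xffffe000u	/* native compat address */

/* Hypothetical hook table; not the real struct paravirt_ops. */
struct vdso_pv_ops {
	uint32_t (*vdso_compat_base)(void);
};

/* Native default: keep the VDSO at the usual compat address. */
static uint32_t native_vdso_compat_base(void)
{
	return VDSO_COMPAT_ADDR;
}

/* Example override a guest might install (the address is made up). */
static uint32_t example_guest_vdso_compat_base(void)
{
	return 0xffffd000u;
}

static struct vdso_pv_ops vdso_pv_ops = {
	.vdso_compat_base = native_vdso_compat_base,
};

/* Delta to apply when relocating an image prelinked at prelink_addr. */
static uint32_t vdso_reloc_delta(uint32_t prelink_addr)
{
	assert(vdso_pv_ops.vdso_compat_base != NULL);
	return vdso_pv_ops.vdso_compat_base() - prelink_addr;
}
```

on native the delta comes out as zero, so the relocation pass still runs
but changes nothing, which keeps a single code path for both cases.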

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  8:19             ` Xen & VMI? Ingo Molnar
@ 2007-03-06  8:37               ` Gerd Hoffmann
  2007-03-06  8:48                 ` Zachary Amsden
  2007-03-06  8:52                 ` Ingo Molnar
  2007-03-06  9:07               ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 86+ messages in thread
From: Gerd Hoffmann @ 2007-03-06  8:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel

Ingo Molnar wrote:
> btw., while we have everyone on the phone and talking ;) Technologically 
> it would save us a whole lot of trouble in Linux if 'external' 
> hypervisors could standardize around a single ABI - such as VMI. Is 
> there any deep reason why Xen couldnt use VMI to talk to Linux? I 
> suspect a range of VMI vectors could be set aside for Xen's dom0 (and 
> other) APIs that have no current VMI equivalent - if there's broad 
> agreement on the current 60+ base VMI vectors that center around basic 
> x86 CPU capabilities - which make up the largest portion of our 
> paravirtualization complexity. Pipe dream?

IIRC there was some proof-of-concept at least for xen guests.

> there are already 5 major hypervisors we are going to support (in 
> alphabetical order):
> 
>  - KVM
>  - lguest
>  - Windows
>  - VMWare
>  - Xen
> 
> the QA matrix is gonna be a _mess_.

I fail to see how xen-via-vmirom instead of xen-via-paravirt_ops reduces
the QA effort.  You still have 5 Hypervisors you have to test against.

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  8:37               ` Gerd Hoffmann
@ 2007-03-06  8:48                 ` Zachary Amsden
  2007-03-06  8:52                 ` Ingo Molnar
  1 sibling, 0 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-06  8:48 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Ingo Molnar, Jeremy Fitzhardinge, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, Roland McGrath, linux-kernel,
	Rusty Russell

Gerd Hoffmann wrote:
> I fail to see how xen-via-vmirom instead of xen-via-paravirt_ops reduces
> the QA effort.  You still have 5 Hypervisors you have to test against.
>   

You've also got a frozen, multi-vendor binary interface, which was the 
straw which broke our original intentions for VMI.  Try as you can, you 
cannot get around this, and there is just no way that list of players 
are going to remain friendly and happy with each other (embrace, extend, 
conquer, anyone?).  There are already differences with VMI / lhype and 
Xen (shadow vs. direct page tables), and also with KVM vs VMI / Xen / 
lhype (required HVM vs optional HVM).  Like it or not, incompatible 
changes will happen, and with no official standard body to mediate so 
that it can't be dominated by one party, I don't see how it can work.  
Which is why I now prefer the flexible in kernel API of paravirt-ops.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  8:37               ` Gerd Hoffmann
  2007-03-06  8:48                 ` Zachary Amsden
@ 2007-03-06  8:52                 ` Ingo Molnar
  2007-03-06  9:03                   ` Zachary Amsden
                                     ` (2 more replies)
  1 sibling, 3 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  8:52 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel


* Gerd Hoffmann <kraxel@suse.de> wrote:

> Ingo Molnar wrote:
> > btw., while we have everyone on the phone and talking ;) Technologically 
> > it would save us a whole lot of trouble in Linux if 'external' 
> > hypervisors could standardize around a single ABI - such as VMI. Is 
> > there any deep reason why Xen couldn't use VMI to talk to Linux? I 
> > suspect a range of VMI vectors could be set aside for Xen's dom0 (and 
> > other) APIs that have no current VMI equivalent - if there's broad 
> > agreement on the current 60+ base VMI vectors that center around basic 
> > x86 CPU capabilities - which make up the largest portion of our 
> > paravirtualization complexity. Pipe dream?
> 
> IIRC there was some proof-of-concept at least for xen guests.

yes - but de-facto contradicted by the Xen paravirt_ops patches sent to 
lkml ;)

> > there are already 5 major hypervisors we are going to support (in 
> > alphabetical order):
> > 
> >  - KVM
> >  - lguest
> >  - Windows
> >  - VMWare
> >  - Xen
> > 
> > the QA matrix is gonna be a _mess_.
> 
> I fail to see how xen-via-vmirom instead of xen-via-paravirt_ops 
> reduces the QA effort.  You still have 5 Hypervisors you have to test 
> against.

yes, just like we have thousands of separate PC boards to support. But 
as long as the basic ABI is the same, the QA effort on the Linux kernel 
side is a lot more focused. (Distros still have 18446744073709551616 
total combinations to QA, and have to make an educated guess to reduce 
that to a more manageable number.)

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  8:52                 ` Ingo Molnar
@ 2007-03-06  9:03                   ` Zachary Amsden
  2007-03-06  9:10                     ` Ingo Molnar
  2007-03-06  9:15                   ` Gerd Hoffmann
  2007-03-06 19:46                   ` Chris Wright
  2 siblings, 1 reply; 86+ messages in thread
From: Zachary Amsden @ 2007-03-06  9:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerd Hoffmann, Jeremy Fitzhardinge, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, Roland McGrath, linux-kernel

Ingo Molnar wrote:
>>> there are already 5 major hypervisors we are going to support (in 
>>> alphabetical order):
>>>
>>>  - KVM
>>>  - lguest
>>>  - Windows
>>>  - VMWare
>>>  - Xen
>>>
>>> the QA matrix is gonna be a _mess_.
>>>       
>> I fail to see how xen-via-vmirom instead of xen-via-paravirt_ops 
>> reduces the QA effort.  You still have 5 Hypervisors you have to test 
>> against.
>>     
>
> yes, just like we have thousands of separate PC boards to support. But 
> as long as the basic ABI is the same, the QA effort on the Linux kernel 
> side is a lot more focused. (Distros still have 18446744073709551616 
> total combinations to QA, and have to make an educated guess to reduce 
> that to a more manageable number.)
>   

But hardware PC boards don't do anything as remotely complicated as 
changing the semantics required for correctness in your MMU 
implementation.  There might be some weird MTRR and caching things, 
which are a property of the architecture, and which all modern boards 
have in common.  You don't have completely diverse implementation 
properties like shadow vs direct vs native page tables.  Or hardware 
virtualization vs direct CPL raised execution.  You simply can't test 
this diversity by making an educated guess, because in this case, 
something will always be omitted.  The test matrix has to be raised, and 
if that is a problem, the burden of proper testing shifted onto the 
manufacturers, just as you would with some new PC board or new 
architecture that wanted to be Linux friendly but was radically 
different in some way.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  8:19             ` Xen & VMI? Ingo Molnar
  2007-03-06  8:37               ` Gerd Hoffmann
@ 2007-03-06  9:07               ` Jeremy Fitzhardinge
  2007-03-06  9:26                 ` Ingo Molnar
  1 sibling, 1 reply; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06  9:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, virtualization, Andrew Morton,
	Linus Torvalds, Roland McGrath, Andi Kleen, linux-kernel,
	Jan Beulich

Ingo Molnar wrote:
> btw., while we have everyone on the phone and talking ;) Technologically 
> it would save us a whole lot of trouble in Linux if 'external' 
> hypervisors could standardize around a single ABI - such as VMI. Is 
> there any deep reason why Xen couldn't use VMI to talk to Linux? I 
> suspect a range of VMI vectors could be set aside for Xen's dom0 (and 
> other) APIs that have no current VMI equivalent - if there's broad 
> agreement on the current 60+ base VMI vectors that center around basic 
> x86 CPU capabilities - which make up the largest portion of our 
> paravirtualization complexity. Pipe dream?

Well, we went around this about six months ago, and decided the best way
forward is the current paravirt_ops approach. 

The Xen and VMI interfaces are quite different, and have different
design goals.  VMI is fairly low-level, and is approximately a software
implementation of VT/SVM with a couple of extra bits, whereas the Xen
interface is intended to be higher level, with the expectation that the
guest cooperates more with its virtualization.  They have similarities
by necessity, but they have some fairly basic differences.

You could come up with some shim layer which makes the two interfaces
appear similar, and you could spell the name of that shim "VMI".  Or you
could call it "paravirt_ops", which is the name we chose.  And you could
implement the interface to that layer as a binary ABI, or you could make
it a normal source-level Linux kernel interface, which is what we chose
to do.

Either way, it's still one of native hardware, VMI/ESX, Xen or something
else under the interface layer, and you need to test for each case
regardless of what the interface looks like.
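The shim-layer idea above can be sketched as a table of function pointers that the kernel calls through, with one set of implementations per backend. This is only an illustration: every name below (pv_ops_sketch, hv_write_cr3, switch_mm_sketch and so on) is hypothetical and is not the real paravirt_ops layout, and the "hypervisor" is just a counter standing in for a hypercall.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a paravirt_ops-style
 * table: the kernel calls through these pointers and never needs to
 * know which backend sits behind them. */
struct pv_ops_sketch {
	const char *name;
	unsigned long (*read_cr3)(void);
	void (*write_cr3)(unsigned long val);
};

/* Fake page-table base register shared by both toy backends. */
unsigned long fake_cr3;

/* Counts how often the toy hypervisor backend was entered. */
unsigned long hv_hypercalls;

/* "Native" backend: touch the (fake) register directly. */
unsigned long native_read_cr3(void) { return fake_cr3; }
void native_write_cr3(unsigned long val) { fake_cr3 = val; }

/* "Hypervisor" backend: the write is routed through a pretend
 * hypercall, modelled here by bumping a counter. */
unsigned long hv_read_cr3(void) { return fake_cr3; }
void hv_write_cr3(unsigned long val)
{
	hv_hypercalls++;
	fake_cr3 = val;
}

const struct pv_ops_sketch native_ops = {
	"native", native_read_cr3, native_write_cr3
};
const struct pv_ops_sketch hv_ops = {
	"toy-hypervisor", hv_read_cr3, hv_write_cr3
};

/* Selected once, early at boot, by whichever backend is detected. */
const struct pv_ops_sketch *pv = &native_ops;

void pv_select(const struct pv_ops_sketch *ops)
{
	if (ops != NULL)
		pv = ops;
}

/* A core-kernel caller: identical source for every backend. */
void switch_mm_sketch(unsigned long pgd)
{
	pv->write_cr3(pgd);
}
```

Note that the caller is the same in both cases, but each backend still has to be exercised separately, which is the testing point made above.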


    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  9:03                   ` Zachary Amsden
@ 2007-03-06  9:10                     ` Ingo Molnar
  0 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  9:10 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Gerd Hoffmann, Jeremy Fitzhardinge, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, Roland McGrath, linux-kernel


* Zachary Amsden <zach@vmware.com> wrote:

> > > reduces the QA effort.  You still have 5 Hypervisors you have to 
> > > test against.
> >
> > yes, just like we have thousands of separate PC boards to support. 
> > But as long as the basic ABI is the same, the QA effort on the Linux 
> > kernel side is a lot more focused. (Distros still have 
> > 18446744073709551616 total combinations to QA, and have to make an 
> > educated guess to reduce that to a more manageable number.)
> 
> But hardware PC boards don't do anything as remotely complicated as 
> changing the semantics required for correctness in your MMU 
> implementation. [...]

ugh, PC boards are actually far worse and far more diverse than any 
variances between hypervisors, but i digress.

anyway, my point stands: the Linux kernel is significantly more 
maintainable and easier to QA if it has only a single 'external' 
hypervisor ABI to worry about - and that might as well be VMI. This is a 
really obvious point, i expected the discussion to center more around 
the specifics of such a move ;-)

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  8:34         ` Ingo Molnar
@ 2007-03-06  9:13           ` Roland McGrath
  2007-03-06  9:14             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 86+ messages in thread
From: Roland McGrath @ 2007-03-06  9:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Andi Kleen, Andrew Morton, linux-kernel,
	Rusty Russell, Jan Beulich

> -# define VDSO_PRELINK          VDSO_HIGH_BASE
> +# ifndef CONFIG_XEN
> +#  define VDSO_PRELINK         VDSO_HIGH_BASE
> +# else
> +#  define VDSO_PRELINK         (0UL - FIX_VDSO * PAGE_SIZE)
> +# endif
> 
> should be Kconfig driven, not #ifdef driven, due to cleanliness and also 
> because lguest wants to have the same thing. Plus:

In fact, with the relocate_vdso stuff it doesn't matter what VDSO_PRELINK
is at compile time.  It saves the small amount of startup cost if it
matches the runtime address, but that is probably not noticeable.

> furthermore, there should be a paravirt_ops method to choose the 
> relocation address, unless i'm missing something. On the native kernel 
> that address will default to 0xffffe000. (if CONFIG_COMPAT_VDSO is 
> selected)

For everything else to work, it needs to be set by changing __FIXADDR_TOP,
which seems to be done by calling reserve_top_address early enough.
It looks like that needs to be properly tied into paravirt_ops somehow.
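As a rough worked example of the address arithmetic, assuming 4 KiB pages, a 32-bit address space, a VDSO fixmap index of 1, and a reserve step that sets the fixmap top to (0 - reserve - PAGE_SIZE). All of these are assumptions made for illustration, and the names are hypothetical user-space stand-ins, not the kernel's own code:

```c
#include <stdint.h>

#define PAGE_SIZE_SKETCH 0x1000u	/* assumed 4 KiB pages */
#define FIX_VDSO_SKETCH  1u		/* assumed fixmap index of the VDSO */

/* Stand-in for __FIXADDR_TOP; assumed native default. */
uint32_t fixaddr_top_sketch = 0xfffff000u;

/* Model of reserving address space at the top for a hypervisor.
 * The formula is an assumption about what the reserve step does. */
void reserve_top_sketch(uint32_t reserve)
{
	fixaddr_top_sketch = (uint32_t)(0u - reserve - PAGE_SIZE_SKETCH);
}

/* Fixmap slots grow downwards from the top address. */
uint32_t vdso_address_sketch(void)
{
	return fixaddr_top_sketch - FIX_VDSO_SKETCH * PAGE_SIZE_SKETCH;
}
```

With nothing reserved this yields 0xffffe000, the native address quoted above. Reserving 64 MB (0x04000000) would move the VDSO down to 0xfbffe000 under the same assumptions, which is why the runtime address has to follow the top-of-address-space setting rather than a compile-time constant.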


Thanks,
Roland

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  9:13           ` Roland McGrath
@ 2007-03-06  9:14             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06  9:14 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Rusty Russell, Jan Beulich

Roland McGrath wrote:
> For everything else to work, it needs to be set by changing __FIXADDR_TOP,
> which seems to be done by calling reserve_top_address early enough.
> It looks like that needs to be properly tied into paravirt_ops somehow.
>   

The startup code for whatever hypervisor you're running makes a call to
reserve_top to reserve the appropriate amount of address space.  This
happens very early.

    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  8:52                 ` Ingo Molnar
  2007-03-06  9:03                   ` Zachary Amsden
@ 2007-03-06  9:15                   ` Gerd Hoffmann
  2007-03-06  9:34                     ` Ingo Molnar
  2007-03-06  9:55                     ` Avi Kivity
  2007-03-06 19:46                   ` Chris Wright
  2 siblings, 2 replies; 86+ messages in thread
From: Gerd Hoffmann @ 2007-03-06  9:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel

Ingo Molnar wrote:
> * Gerd Hoffmann <kraxel@suse.de> wrote:
> 
>>> [using vmi rom]
>> IIRC there was some proof-of-concept at least for xen guests.
> 
> yes - but de-facto contradicted by the Xen paravirt_ops patches sent to 
> lkml ;)

Yep.  The fact that it is possible to do that doesn't imply that it is
the best solution.

Oh, and btw:  What was the reason why kvm paravirtualization doesn't use
the vmi interface?

>>> the QA matrix is gonna be a _mess_.
>> I fail to see how xen-via-vmirom instead of xen-via-paravirt_ops 
>> reduces the QA effort.  You still have 5 Hypervisors you have to test 
>> against.
> 
> yes, just like we have thousands of separate PC boards to support. But 
> as long as the basic ABI is the same, the QA effort on the Linux kernel 
> side is a lot more focused.

xen and vmware are still two very different hypervisors from the memory
management point of view.  I doubt moving the abstraction line within the
linux kernel from paravirt_ops to vmi makes QA easier.

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  9:07               ` Jeremy Fitzhardinge
@ 2007-03-06  9:26                 ` Ingo Molnar
  2007-03-06 16:42                   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  9:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, Rusty Russell, virtualization, Andrew Morton,
	Linus Torvalds, Roland McGrath, Andi Kleen, linux-kernel,
	Jan Beulich


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> You could come up with some shim layer which makes the two interfaces 
> appear similar, and you could spell the name of that shim "VMI".  Or 
> you could call it "paravirt_ops", which is the name we chose.  And you 
> could implement the interface to that layer as a binary ABI, or you 
> could make it a normal source-level Linux kernel interface, which is 
> what we chose to do.

i think you are missing my point.

paravirt_ops is a Linux-internal abstraction that tries to make our life 
easier but it has no relevance whatsoever to an external hypervisor - be 
that Xen, VMWare/ESX or Windows/Longhorn.

What matters is the /ABI/ that the hypervisor uses to talk to a Linux 
guest. In the VMWare/ESX case that's VMI. In the Xen case that's the 
hypercall page call-table ABI or the legacy int $0x82 ABI.

My suggestion would be for Linux to make only a /single/ external ABI 
promise: VMI. (and we can extend it with higher-level paravirt ops, 
etc.)

paravirt_ops has ZERO relevance here... Anyone who suggests that 
paravirt_ops somehow magically hides the ABIs that are behind it (and 
its effects on Linux) is smoking something real funny ;-)

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  9:15                   ` Gerd Hoffmann
@ 2007-03-06  9:34                     ` Ingo Molnar
  2007-03-06 10:15                       ` Gerd Hoffmann
  2007-03-06  9:55                     ` Avi Kivity
  1 sibling, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06  9:34 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel


* Gerd Hoffmann <kraxel@suse.de> wrote:

> Oh, and btw: What was the reason why kvm paravirtualization doesn't 
> use the vmi interface?

cleanliness and performance: KVM doesn't need any artificial indirection. 
IMO the GPL-ed ROM portion of VMI was a bad idea to begin with. Also, 
lguest and KVM are Linux-internal, so there's a natural match between the 
guest and the host APIs.

> > yes, just like we have thousands of separate PC boards to support. 
> > But as long as the basic ABI is the same, the QA effort on the Linux 
> > kernel side is a lot more focused.
> 
> xen and vmware are still two very different hypervisors from the 
> memory management point of view.  I doubt moving the abstraction line 
> within the linux kernel from paravirt_ops to vmi makes QA easier.

well, the VMI patches got into Linux with the claim that it's also 
useful for Xen. So that claim was ... not actually true?

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  9:15                   ` Gerd Hoffmann
  2007-03-06  9:34                     ` Ingo Molnar
@ 2007-03-06  9:55                     ` Avi Kivity
  2007-03-06 10:23                       ` Gerd Hoffmann
  1 sibling, 1 reply; 86+ messages in thread
From: Avi Kivity @ 2007-03-06  9:55 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Ingo Molnar, Jeremy Fitzhardinge, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, Roland McGrath, linux-kernel

Gerd Hoffmann wrote:
> Oh, and btw:  What was the reason why kvm paravirtualization doesn't use
> the vmi interface?
>
>   

There actually was proof of concept code to do just that (by Anthony 
Liguori).  For Linux, I feel paravirt_ops is superior as we can extend 
it if something is missing.  If VMI is adopted by non-Linux guests, we 
may support it as a quick way to add paravirt support for those guests.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  9:34                     ` Ingo Molnar
@ 2007-03-06 10:15                       ` Gerd Hoffmann
  2007-03-06 10:26                         ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Gerd Hoffmann @ 2007-03-06 10:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel

Ingo Molnar wrote:
> * Gerd Hoffmann <kraxel@suse.de> wrote:
> 
>> Oh, and btw: What was the reason why kvm paravirtualization doesn't 
>> use the vmi interface?
> 
> cleanliness and performance: KVM doesn't need any artificial indirection.

Xen doesn't need it either.

> IMO the GPL-ed ROM portion of VMI was a bad idea to begin with.

So why do you want xen to use vmi then?

> well, the VMI patches got into Linux with the claim that it's also 
> useful for Xen. So that claim was ... not actually true?

As mentioned there was a proof-of-concept VMI ROM done by vmware.  As
far as I know it translated the VMI ROM interface calls into xen hypercalls
somehow, Zach probably has more details.

So in the end you would still have two different hypervisor ABI's, the
VMI ROM just hides that.

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06  9:55                     ` Avi Kivity
@ 2007-03-06 10:23                       ` Gerd Hoffmann
  2007-03-06 10:31                         ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Gerd Hoffmann @ 2007-03-06 10:23 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Jeremy Fitzhardinge, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, Roland McGrath, linux-kernel

Avi Kivity wrote:
> Gerd Hoffmann wrote:
>> Oh, and btw:  What was the reason why kvm paravirtualization doesn't use
>> the vmi interface?
>>
>>   
> 
> There actually was proof of concept code to do just that (by Anthony
> Liguori).  For Linux, I feel paravirt_ops is superior as we can extend
> it if something is missing.

Thanks.  That is actually the point I want to make: although it is
*possible* to do that via VMI ROM, doing that using paravirt_ops is
*better* (no matter whether the hypervisor is xen or kvm).  That's why
we actually have it.  The very same discussion a couple months ago came
to exactly that conclusion.

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 10:15                       ` Gerd Hoffmann
@ 2007-03-06 10:26                         ` Ingo Molnar
  2007-03-06 11:04                           ` Gerd Hoffmann
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 10:26 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel


* Gerd Hoffmann <kraxel@suse.de> wrote:

> > cleanliness and performance: KVM doesn't need any artificial 
> > indirection.
> 
> Xen doesn't need it either.
> 
> > IMO the GPL-ed ROM portion of VMI was a bad idea to begin with.
> 
> So why do you want xen to use vmi then?

due to the other argument i listed:

|| Also, lguest and KVM are Linux-internal, so there's a natural match 
|| between the guest and the host APIs.

It's a basic kernel maintenance issue: lguest/KVM and Linux host and 
guest will co-evolve forward in a natural way as they are in essence 
Linux-internal technologies. They /will/ harmonize. There is no such 
guarantee with Xen/VMWare/etc. (which are distinctly separate 
technologies) - so any ABIs towards them could become (and are already 
becoming) a drag and distraction.

> > well, the VMI patches got into Linux with the claim that it's also 
> > useful for Xen. So that claim was ... not actually true?
> 
> As mentioned there was a proof-of-concept VMI ROM done by vmware.  As 
> far as I know it translated the VMI ROM interface calls into xen 
> hypercalls somehow, Zach probably has more details.
> 
> So in the end you would still have two different hypervisor ABI's, the 
> VMI ROM just hides that.

oh, but that way i have cleverly pushed the problem out of Linux and 
into the VMI-ROM's domain ;) Which is all i care about.

really, one of my jobs as a maintainer is to keep crap out of Linux (we 
are capable of adding enough crap ourselves ;-). Having multiple, 
overlapping, technologically redundant ABIs between /software/, embedded 
into the heart of the kernel (paravirt ops notwithstanding), for 
perpetuity, is one such type of crap. It is in fact the kind of crap i'm 
/most/ worried about, because it sticks forever.

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 10:23                       ` Gerd Hoffmann
@ 2007-03-06 10:31                         ` Ingo Molnar
  0 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 10:31 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Avi Kivity, Jeremy Fitzhardinge, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, Roland McGrath, linux-kernel


* Gerd Hoffmann <kraxel@suse.de> wrote:

> Thanks.  That is actually the point I want to make: although it is 
> *possible* to do that via VMI ROM, doing that using paravirt_ops is 
> *better* (no matter whether the hypervisor is xen or kvm). [...]

but paravirt_ops is not an ABI ... nor will it ever become one. So the 
fact remains: the extra ABIs towards external hypervisors are quite an 
issue.

> [...] The very same discussion a couple months ago came to exactly 
> that conclusion.

uhm, that discussion mostly only involved people interested in external 
hypervisors, not those poor souls who have to fix up the mess in the end 
within Linux ;-) [ I am very sure all foxes would come to the happy 
conclusion that there is no need for any stinkin' lock on the barn ;-) ]

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 10:26                         ` Ingo Molnar
@ 2007-03-06 11:04                           ` Gerd Hoffmann
  2007-03-06 11:59                             ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Gerd Hoffmann @ 2007-03-06 11:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel

Ingo Molnar wrote:
>>> IMO the GPL-ed ROM portion of VMI was a bad idea to begin with.
>> So why do you want xen to use vmi then?
> 
> due to the other argument i listed:
> 
> || Also, lguest and KVM are Linux-internal, so there's a natural match 
> || between the guest and the host APIs.
>
> It's a basic kernel maintenance issue: lguest/KVM and Linux host and 
> guest will co-evolve forward in a natural way as they are in essence 
> Linux-internal technologies. They /will/ harmonize.

Yes for lguest, that is a linux-only ground for play and research and
that will most likely not change in near future.

It is complete bullshit for kvm.  You can run almost anything as guest
in kvm, and I certainly wouldn't be surprised if we see other operating
systems start using the kvm paravirt interface.

> There is no such 
> guarantee with Xen/VMWare/etc. (which are distinctly separate 
> technologies) - so any ABIs towards them could become (and are already 
> becoming) a drag and distraction.

We'll certainly need some stable ABI for kvm too, so you can mix kernel
versions on host and guest.  I really can't see why you think kvm is
special in any way (except that it is your favorite toy at the moment).

>> So in the end you would still have two different hypervisor ABI's, the 
>> VMI ROM just hides that.
> 
> oh, but that way i have cleverly pushed the problem out of Linux and 
> into the VMI-ROM's domain ;) Which is all i care about.

Fine, so let's move kvm paravirtualization into vmi too (proof of
concept code by Anthony Liguori exists) and kill one more item on the
(linux) QA test matrix?  (just following your arguments, not that I'm
confident it would actually help reducing QA effort).

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 11:04                           ` Gerd Hoffmann
@ 2007-03-06 11:59                             ` Ingo Molnar
  2007-03-06 12:34                               ` Gerd Hoffmann
                                                 ` (2 more replies)
  0 siblings, 3 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 11:59 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel


* Gerd Hoffmann <kraxel@suse.de> wrote:

> > || Also, lguest and KVM are Linux-internal, so there's a natural 
> > || match between the guest and the host APIs.
> >
> > It's a basic kernel maintenance issue: lguest/KVM and Linux host 
> > and guest will co-evolve forward in a natural way as they are in 
> > essence Linux-internal technologies. They /will/ harmonize.
> 
> Yes for lguest, that is a linux-only ground for play and research and 
> that will most likely not change in near future.

i actually disagree here: lguest is growing up and will inevitably have 
to spend a small amount of its attention on the world beyond the 
playground too - even if Rusty doesn't want that =B-) That will happen 
de-facto the first time a distro ships an lguest-capable kernel, so it's 
unavoidable.

> It is complete bullshit for kvm.  You can run almost anything as guest 
> in kvm, and I certainly wouldn't be surprised if we see other 
> operating systems start using the kvm paravirt interface.

legacy support has to be ensured, but it does not hugely matter in terms 
of designing our future. What matters is that once we change some 
fundamental aspect of Linux, we can adopt lguest/KVM immediately. With 
'external' hypervisors there is no such compulsory forward motion, and 
my fear is that by giving them ABI interfaces to the innards of the 
Linux guest they will just stick with those ABIs - and worse, drag Linux 
along with them. (because distros will be forced by the legacy 
assumptions to carry those ABIs along.)

> > There is no such guarantee with Xen/VMWare/etc. (which are 
> > distinctly separate technologies) - so any ABIs towards them could 
> > become (and are already becoming) a drag and distraction.
> 
> We'll certainly need some stable ABI for kvm too, so you can mix 
> kernel versions on host and guest.  I really can't see why you 
> think kvm is special in any way [...]

lguest/KVM is fundamentally special because its future evolution is 
naturally aligned with that of Linux. Sure, legacies will have to be 
taken care of (just like Linux supports old system-calls and even old 
driver APIs in some circumstances), but there is no danger of KVM 
staying in legacy land forever. With Xen and VMWare i see no guarantee 
at all that Linux won't be hindered by their legacies (or by any plain 
diverging approaches) forever.

so for example, if we change some fundamental thing that can be 
implemented via the legacy ABI but only slowly, that's not a problem 
because new-lguest/new-KVM will use the new approach, so there's a 
straightforward technology-based migratory path out of the legacy. But 
if Xen or VMWare were to stick with that legacy ABI forever (for 
whatever reason), we couldn't solve that situation on the Linux side at 
all, via technological measures.

> >> So in the end you would still have two different hypervisor ABI's, 
> >> the VMI ROM just hides that.
> > 
> > oh, but that way i have cleverly pushed the problem out of Linux and 
> > into the VMI-ROM's domain ;) Which is all i care about.
> 
> Fine, so let's move kvm paravirtualization into vmi too (proof of 
> concept code by Anthony Liguori exists) and kill one more item on the 
> (linux) QA test matrix?  (just following your arguments, not that I'm 
> confident it would actually help reducing QA effort).

yes - although obviously a KVM Linux guest does not need such an 
interface - but it's a nice proof of concept to integrate other guest 
OSs into KVM.

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 11:59                             ` Ingo Molnar
@ 2007-03-06 12:34                               ` Gerd Hoffmann
  2007-03-06 15:03                               ` Anthony Liguori
  2007-03-06 16:27                               ` Jeremy Fitzhardinge
  2 siblings, 0 replies; 86+ messages in thread
From: Gerd Hoffmann @ 2007-03-06 12:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel

  Hi,

> legacy support has to be ensured, but it does not hugely matter in terms 
> of designing our future. What matters is that once we change some 
> fundamental aspect of Linux, we can adopt lguest/KVM immediately. With 
> 'external' hypervisors there is no such compulsory forward motion, and 
> my fear is that by giving them ABI interfaces to the innards of the 
> Linux guest they will just stick with those ABIs - and worse, drag Linux 
> along with them. (because distros will be forced by the legacy 
> assumptions to carry those ABIs along.)

I don't share your fear when it comes to Xen.

The Xen ABI did evolve too: there are a few hypercalls with old and new
versions, just like different linux syscall versions exist.  Guests then
can choose to either fallback to the slow, old version in case the new
hypercall returns -ENOSYS or raise the minimum required xen version to
the one with the new hypercall and leave out the legacy cruft.

The current xen hypercall ABI isn't set in stone, it can and will evolve
too ...
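A minimal sketch of that fallback pattern, with made-up names: hypercall_new_sketch and hypercall_old_sketch are hypothetical stand-ins rather than real Xen hypercalls, and the hypervisor version is modelled by a plain flag.

```c
#include <errno.h>

/* Models the hypervisor version: non-zero if the newer call exists. */
int hv_has_new_call_sketch;

/* Counters so a caller can see which path was taken. */
int new_calls_sketch;
int old_calls_sketch;

/* Newer, faster variant; an older hypervisor rejects it. */
int hypercall_new_sketch(int arg, int *out)
{
	if (!hv_has_new_call_sketch)
		return -ENOSYS;
	new_calls_sketch++;
	*out = arg * 2;
	return 0;
}

/* Older, slower variant that every hypervisor version provides. */
int hypercall_old_sketch(int arg, int *out)
{
	old_calls_sketch++;
	*out = arg + arg;
	return 0;
}

/* Guest-side wrapper: try the new call first and fall back to the
 * legacy one only if the hypervisor does not implement it. */
int do_op_sketch(int arg, int *out)
{
	int rc = hypercall_new_sketch(arg, out);

	if (rc == -ENOSYS)
		rc = hypercall_old_sketch(arg, out);
	return rc;
}
```

A guest that raises its minimum required hypervisor version instead would simply drop the fallback branch and the old call along with it.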

> but there is no danger of KVM 
> staying in legacy land forever. With Xen and VMWare i see no guarantee 
> at all that Linux won't be hindered by their legacies (or by any plain 
> diverging approaches) forever.

I don't expect that to be a problem with Xen.

> so for example, if we change some fundamental thing that can be 
> implemented via the legacy ABI but only slowly, that's not a problem 
> because new-lguest/new-KVM will use the new approach, so there's a 
> straightforward technology-based migratory path out of the legacy.

See above, that path exists with Xen too.

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>

* Re: Xen & VMI?
  2007-03-06 11:59                             ` Ingo Molnar
  2007-03-06 12:34                               ` Gerd Hoffmann
@ 2007-03-06 15:03                               ` Anthony Liguori
  2007-03-06 17:17                                 ` Nakajima, Jun
  2007-03-06 16:27                               ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 86+ messages in thread
From: Anthony Liguori @ 2007-03-06 15:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerd Hoffmann, Jeremy Fitzhardinge, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, Roland McGrath, linux-kernel

Ingo Molnar wrote:
> * Gerd Hoffmann <kraxel@suse.de> wrote:
> 
>>>> So in the end you would still have two different hypervisor ABI's, 
>>>> the VMI ROM just hides that.
>>> oh, but that way i have cleverly pushed the problem out of Linux and 
>>> into the VMI-ROM's domain ;) Which is all i care about.
>> Fine, so let's move kvm paravirtualization into vmi too (proof of
>> concept code by Anthony Liguori exists) and kill one more item on the 
>> (linux) QA test matrix?  (just following your arguments, not that I'm 
>> confident it would actually help reducing QA effort).
> 
> yes - although obviously a KVM Linux guest does not need such an 
> interface - but it's a nice proof of concept to integrate other guest 
> OSs into KVM.

I disagree that a KVM Linux guest does not benefit from VMI.  Right now, 
your KVM paravirt interface only covers CR3 target caching and apic 
enhancements (neither of which I believe have made it into 2.6.21). 
Inevitably, things like MMU batching will be added.

Using paravirt_ops, this is going to require new kernels for the guests. 
  Every new paravirtualization feature will require a new guest kernel. 
  With VMI, one can add these features to any 2.6.21+ guest by just 
modifying the ROM (assuming a newer host).  Some features will require 
new VMI entry points but quite a lot will fall under the current entry 
points.

Of all the hypervisors, KVM is the easiest to use VMI with.  QEMU 
already supports option ROM loading and Zach just made some changes to 
allow a native ROM to be implemented very easily.

If we're going to use VMI for anything other than VMware, it seems to me
that KVM should be what we use it for.

Regards,

Anthony Liguori

> 	Ingo


* Re: Xen & VMI?
  2007-03-06 11:59                             ` Ingo Molnar
  2007-03-06 12:34                               ` Gerd Hoffmann
  2007-03-06 15:03                               ` Anthony Liguori
@ 2007-03-06 16:27                               ` Jeremy Fitzhardinge
  2007-03-06 17:11                                 ` Ingo Molnar
  2 siblings, 1 reply; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 16:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerd Hoffmann, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel

Ingo Molnar wrote:
> legacy support has to be ensured, but it does not hugely matter in terms 
> of the designing our future. What matters is that once we change some 
> fundamental aspect of Linux, we can adopt lguest/KVM immediately. With 
> 'external' hypervisors there is no such compulsory forward motion, and 
> my fear is that by giving them ABI interfaces to the innards of the 
> Linux guest they will just stick with those ABIs - and worse, drag Linux 
> along with them. (because distros will be forced by the legacy 
> assumptions to carry those ABIs along.)
>   

So what's your argument here?  That if Xen/ESX/whatever are evolved
separately from Linux, then they should be tied together behind one ABI,
so that even if one could support a new Linux feature it can't move
until everyone else implements the ABI too?  How does that help anyone?

The big practical difficulty is that there are no examples of an ABI
with multiple implementations all being sufficiently cross-compatible
that a user of that ABI can actually rely on it working consistently
(and certainly not to the extent that it actually reduces the test
matrix size).  There are masses of subtle details that are hard to get
right, and making sure that multiple implementations get all this right
is apparently impossible (otherwise ACPI would be a dream to work with...).

> lguest/KVM is fundamentally special because its future evolution is 
> naturally aligned with that of Linux.

lguest is special, because it is purely kernel-internal (and because
Rusty wrote it, of course).  kvm is not, and I don't see why you think
it is.  In its simplest form it's a thin layer which exposes hardware
virtualization to usermode, which happens to be qemu at the moment.  It's
evolving some bells and whistles, but at heart it's being developed by a
company with just as much commercial interest and backing as Xen and
VMI, and it has the scope to be just as cross-platform.

>  Sure, legacies will have to be 
> taken care of (just like Linux supports old system-calls and even old 
> driver APIs in some circumstances), but there is no danger of KVM 
> staying in legacy land forever.

Sure there is.  It needs the usermode part to keep sync; that's easy if
it's qemu, but you're stuck if it's a proprietary usermode implementing
that end of the kvm equation.

>  With Xen and VMWare i see no guarantee 
> at all that Linux wont be hindered by their legacies (or by any plain 
> diverging approaches) forever.
>   

Well, if the kernel goes one way and Xen doesn't follow, then that
pretty severely restricts Xen's usefulness - there's a pretty strong
incentive for Xen to keep supporting Linux.  And besides, Xen is all
GPL, so anyone who's sufficiently motivated can do this work.

> so for example, if we change some fundamental thing that can be 
> implemented via the legacy ABI but only slowly, that's not a problem 
> because new-lguest/new-KVM will use the new approach, so there's a 
> straightforward technology-based migratory path out of the legacy. But 
> if Xen or VMWare were to stick with that legacy ABI forever (for 
> whatever reason), we couldnt solve that situation on the Linux side at 
> all, via technological measures.
>   

Well, that's the reason for minimizing Linux's exposure to any ABI. 
Clearly the interface to a hypervisor has to be *some* ABI.  But it
doesn't help anyone if there's an extra uber-ABI which is a superset of
all the underlying hypervisor ABIs; that's just a big rigid, brittle mess.

The whole point of pv_ops is to allow the hypervisors' interfaces to
evolve at their own pace without having to constrain the core kernel's
development.

> yes - although obviously a KVM Linux guest does not need such an 
> interface - but it's a nice proof of concept to integrate other guest 
> OSs into KVM.
>   

Well, if there were any other guests that used VMI that might be
useful.  You'd get further using the Xen ABI.


    J

* Re: Xen & VMI?
  2007-03-06  9:26                 ` Ingo Molnar
@ 2007-03-06 16:42                   ` Jeremy Fitzhardinge
  2007-03-06 17:18                     ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 16:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, virtualization, Andrew Morton,
	Linus Torvalds, Roland McGrath, Andi Kleen, linux-kernel,
	Jan Beulich

Ingo Molnar wrote:
> i think you are missing my point.
>
> paravirt_ops is a Linux-internal abstraction that tries to make our life 
> easier but it has no relevance whatsoever to an external hypervisor - be 
> that Xen, VMWare/ESX or Windows/Longhorn.
>   

Of course it has relevance.  Linux is an important guest system, and not
supporting it would be a major problem.  I already have a list of things
which need to be done to Xen to make it work better with current kernels
- none of them mandatory, but nice to have.

> My suggestion would be for Linux to make only a /single/ external ABI 
> promise: VMI. (and we can extend it with higher-level paravirt ops, 
> etc.)
>   

"VMI" is not a promise, it's just three letters.  It doesn't even mean
the same thing now as it did 12 months ago.  Turning "VMI" from three
letters into anything remotely like a promise is a huge amount of work
which requires:

   1. someone to actually sit down and fully document what all those
      entrypoints are going to do
   2. everyone to implement them
   3. someone to test that all the implementations conform to the
      document (bearing in mind that if anyone is going to go to all
      this effort, they're going to use this with non-Linux guests)
   4. and repeat all that every subsequent update

That's assuming all the parties involved are approaching the process
with goodwill.  If someone wanted to effectively stall the kernel's
development (at least in terms of useful kernels that people can keep
shipping and deploying), then all they have to do is drag their heels on
this "VMI" interface process.  All this heavyweight process is very
enterprisy, but it doesn't help Linux at all.

And given that the idea that multiple implementations of a complex ABI
like this will actually conform to a spec and/or each other is pretty
much pie-in-the-sky, the kernel will still have to deal with all the
quirks of multiple implementations anyway.

    J

* Re: Xen & VMI?
  2007-03-06 16:27                               ` Jeremy Fitzhardinge
@ 2007-03-06 17:11                                 ` Ingo Molnar
  2007-03-07  2:16                                   ` Zachary Amsden
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 17:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Gerd Hoffmann, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, Roland McGrath, linux-kernel


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> The whole point of pv_ops is to allow the hypervisors interfaces to 
> evolve at their own pace without having to constrain the core kernel's 
> development

unfortunately that's a self-serving oxymoron, contradicted by real life 
;) Pretty much the only way to ensure a sane ABI is to do it like we do 
it with the Linux syscall ABI:

	_to have only one_

We do not let OpenOffice or Evolution have its own separate ABI to Linux 
so that they 'can evolve at their own pace'... We want them to cooperate 
and come up with a common ABI (or rather, we try to come up with the 
right syscalls ourselves), because diverging, overlapping ABIs are a huge 
PITA.

We do not unify their pointlessly diverging ABIs to within the kernel 
via say office_ops (while we could) because that's crappy on its face. 
Hypervisors arent in any way different, they just _think_ they are 
special because they are relatively new. But hey, i dont expect you to 
concede this point ;)

	Ingo

* RE: Xen & VMI?
  2007-03-06 15:03                               ` Anthony Liguori
@ 2007-03-06 17:17                                 ` Nakajima, Jun
  2007-03-06 17:32                                   ` Anthony Liguori
  2007-03-06 20:37                                   ` Ingo Molnar
  0 siblings, 2 replies; 86+ messages in thread
From: Nakajima, Jun @ 2007-03-06 17:17 UTC (permalink / raw)
  To: Anthony Liguori, Ingo Molnar
  Cc: virtualization, Roland McGrath, Andrew Morton, Linus Torvalds,
	Jan Beulich, linux-kernel

Anthony Liguori wrote:
> Ingo Molnar wrote:
>> * Gerd Hoffmann <kraxel@suse.de> wrote:
>> 
>>>>> So in the end you would still have two different hypervisor ABI's,
>>>>> the VMI ROM just hides that.
>>>> oh, but that way i have cleverly pushed the problem out of Linux
>>>> and into the VMI-ROM's domain ;) Which is all i care about.
>>> Fine, so let's move kvm paravirtualization into vmi too (proof of
>>> concept code by Anthony Liguori exists) and kill one more item on
>>> the (linux) QA test matrix?  (just following your arguments, not
>>> that I'm confident it would actually help reducing QA effort).
>> 
>> yes - although obviously a KVM Linux guest does not need such an
>> interface - but it's a nice proof of concept to integrate other guest
>> OSs into KVM.
> 
> I disagree that a KVM Linux guest does not benefit from VMI.  Right
> now, your KVM paravirt interface only covers CR3 target caching and
> apic enhancements (neither of which I believe have made it into
> 2.6.21). Inevitably, things like MMU batching will be added.

I think a KVM Linux would benefit more from paravirt ops, rather than
VMI. The higher-level interface such as the one in Xen, especially for
I/O, interrupt controllers, timer, SMP, etc. actually simplifies the
implementation of the VMM, and improves performance of the guest. Even
for MMU, direct page tables, for example, would work better for
hardware-based virtualization because the processor can use the native
page tables. 

> 
> Using paravirt_ops, this is going to require new kernels for the
>   guests. Every new paravirtualization feature will require a new
>   guest kernel. With VMI, one can add these features to any 2.6.21+
> guest by just modifying the ROM (assuming a newer host).  Some
> features will require new VMI entry points but quite a lot will fall
> under the current entry points.
> 
> Of all the hypervisors, KVM is the easiest to use VMI with.  QEMU
> already supports option ROM loading and Zach just made some changes to
> allow a native ROM to be implemented very easily.
> 
> If we're going to use VMI for anything other than VMware, it seems to
> me that KVM should be what we use it for.
> 
> Regards,
> 
> Anthony Liguori
> 
>> 	Ingo
> 

Jun
---
Intel Open Source Technology Center

* Re: Xen & VMI?
  2007-03-06 16:42                   ` Jeremy Fitzhardinge
@ 2007-03-06 17:18                     ` Ingo Molnar
  2007-03-06 18:04                       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 17:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, Rusty Russell, virtualization, Andrew Morton,
	Linus Torvalds, Roland McGrath, Andi Kleen, linux-kernel,
	Jan Beulich


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> > My suggestion would be for Linux to make only a /single/ external 
> > ABI promise: VMI. (and we can extend it with higher-level paravirt 
> > ops, etc.)
> 
> "VMI" is not a promise, it's just three letters.  It doesn't even mean 
> the same thing now as it did 12 months ago.  Turning "VMI" from three 
> letters into anything remotely like a promise is a huge amount of work 
> which requires:
> 
>    1. someone actually sit down and fully document what all those
>       entrypoints are going to do
>    2. everyone to implement them
>    3. someone to test that all the implementations conform to the
>       document (bearing in mind that if anyone is going to go to all
>       this effort, they're going to use this with non-Linux guests)
>    4. and repeat all that every subsequent update

There's no process needed. The only thing needed is to treat the Linux 
implementation as the reference design, documentation and specification. 
Treat it as we treat the Linux system calls. We promise not to change 
them. There's no "process" for that either, other than our promise, our 
taste and our best efforts - plus the backing of all distributions and 
the threat of a few million users who start yelling (or worse) if we 
break it ;)

	Ingo

* Re: Xen & VMI?
  2007-03-06 17:17                                 ` Nakajima, Jun
@ 2007-03-06 17:32                                   ` Anthony Liguori
  2007-03-06 20:37                                   ` Ingo Molnar
  1 sibling, 0 replies; 86+ messages in thread
From: Anthony Liguori @ 2007-03-06 17:32 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: Ingo Molnar, virtualization, Roland McGrath, Andrew Morton,
	Linus Torvalds, Jan Beulich, linux-kernel

Nakajima, Jun wrote:
> Anthony Liguori wrote:
>   
>> Ingo Molnar wrote:
>>     
>>> * Gerd Hoffmann <kraxel@suse.de> wrote:
>>>
>>>       
>>>>>> So in the end you would still have two different hypervisor ABI's,
>>>>>> the VMI ROM just hides that.
>>>>>>             
>>>>> oh, but that way i have cleverly pushed the problem out of Linux
>>>>> and into the VMI-ROM's domain ;) Which is all i care about.
>>>>>           
>>>> Fine, so let's move kvm paravirtualization into vmi too (proof of
>>>> concept code by Anthony Liguori exists) and kill one more item on
>>>> the (linux) QA test matrix?  (just following your arguments, not
>>>> that I'm confident it would actually help reducing QA effort).
>>>>         
>>> yes - although obviously a KVM Linux guest does not need such an
>>> interface - but it's a nice proof of concept to integrate other guest
>>> OSs into KVM.
>>>       
>> I disagree that a KVM Linux guest does not benefit from VMI.  Right
>> now, your KVM paravirt interface only covers CR3 target caching and
>> apic enhancements (neither of which I believe have made it into
>> 2.6.21). Inevitably, things like MMU batching will be added.
>>     
>
> I think a KVM Linux would benefit more from paravirt ops, rather than
> VMI. 

Functionally speaking, the only difference between using VMI and 
paravirt_ops is that with VMI you redirect the paravirt_ops to a ROM 
area.  This has the following effects:

1) you cannot call back into Linux from the op implementation
2) you can change the implementation of the op w/o rebuilding the kernel

1 & 2 are trade-offs.  For everything that KVM can do wrt 
paravirtualization, there really isn't a need for #1 at the moment.  Xen 
is much more challenging to do with VMI as there are a lot of instances 
where #1 is quite useful.  I think you pretty much have to target 
paravirt_ops for Xen.
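
To make the trade-off above concrete, here is a minimal userspace sketch
of an ops table whose entries can be redirected at runtime.  The struct
and every function name are hypothetical and heavily reduced; the real
paravirt_ops structure has many more entries and different signatures,
and the "ROM" here is just ordinary code standing in for an
implementation that lives outside the kernel image.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced ops table (not the real paravirt_ops). */
struct pv_ops_sketch {
	unsigned long (*read_cr3)(void);
	void (*write_cr3)(unsigned long val);
};

static unsigned long fake_cr3;		/* simulated register */
static unsigned long rom_writes;	/* counts calls into the "ROM" */

/* In-kernel backend: built with the kernel, free to call kernel code. */
static unsigned long native_read_cr3(void) { return fake_cr3; }
static void native_write_cr3(unsigned long val) { fake_cr3 = val; }

/* "ROM" backend: replaceable without rebuilding the kernel, but unable
 * to call back into kernel internals. */
static unsigned long rom_read_cr3(void) { return fake_cr3; }
static void rom_write_cr3(unsigned long val) { fake_cr3 = val; rom_writes++; }

static struct pv_ops_sketch pv_ops = { native_read_cr3, native_write_cr3 };

/* Redirect the ops to the ROM implementation. */
static void install_rom_ops(void)
{
	pv_ops.read_cr3 = rom_read_cr3;
	pv_ops.write_cr3 = rom_write_cr3;
}

/* A caller in the core kernel never needs to know which backend is live. */
static void switch_address_space(unsigned long cr3)
{
	assert(pv_ops.write_cr3 != NULL);
	pv_ops.write_cr3(cr3);
}
```

The caller is unchanged in both cases; what differs is where the
implementation lives and what it is allowed to call.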

> The higher-level interface such as the one in Xen, especially for
> I/O, interrupt controllers, timer, SMP, etc. actually simplifies the
> implementation of the VMM,

Right, but those higher-level interfaces can certainly be implemented 
within the context of a VMI rom.  For instance, VMI already defines a 
paravirtual timer.  In the case of interrupt control, it just provides 
hooks for APIC reads/writes with the assumption (presumably) that the 
ROM will implement APIC emulation and bridge to whatever the hypervisor 
abstraction is.

>  and improve performance of the guest. Even
> for MMU, direct page tables, for example, would work better for
> hardware-based virtualization because the processor can use the native
> page tables. 
>   

Direct paging is a whole other can of worms.  Fortunately, EPT and NPT 
will eliminate the need to worry about this in the future for things 
like KVM/HVM :-)

Regards,

Anthony Liguori

>> Using paravirt_ops, this is going to require new kernels for the
>>   guests. Every new paravirtualization feature will require a new
>>   guest kernel. With VMI, one can add these features to any 2.6.21+
>> guest by just modifying the ROM (assuming a newer host).  Some
>> features will require new VMI entry points but quite a lot will fall
>> under the current entry points.
>>
>> Of all the hypervisors, KVM is the easiest to use VMI with.  QEMU
>> already supports option ROM loading and Zach just made some changes to
>> allow a native ROM to be implemented very easily.
>>
>> If we're going to use VMI for anything other than VMware, it seems to
>> me that KVM should be what we use it for.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>     
>>> 	Ingo
>>>       
>
> Jun
> ---
> Intel Open Source Technology Center
>
>   


* Re: Xen & VMI?
  2007-03-06 17:18                     ` Ingo Molnar
@ 2007-03-06 18:04                       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 18:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, virtualization, Andrew Morton,
	Linus Torvalds, Roland McGrath, Andi Kleen, linux-kernel,
	Jan Beulich

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>>> My suggestion would be for Linux to make only a /single/ external 
>>> ABI promise: VMI. (and we can extend it with higher-level paravirt 
>>> ops, etc.)
>>>       
>> "VMI" is not a promise, it's just three letters.  It doesn't even mean 
>> the same thing now as it did 12 months ago.  Turning "VMI" from three 
>> letters into anything remotely like a promise is a huge amount of work 
>> which requires:
>>
>>    1. someone actually sit down and fully document what all those
>>       entrypoints are going to do
>>    2. everyone to implement them
>>    3. someone to test that all the implementations conform to the
>>       document (bearing in mind that if anyone is going to go to all
>>       this effort, they're going to use this with non-Linux guests)
>>    4. and repeat all that every subsequent update
>>     
>
> There's no process needed. The only thing needed is to treat the Linux 
> implementation as the reference design, documentation and specification. 
>   
What Linux implementation?  Linux as a client of this interface?  So
everyone implements as much of the ABI as needed to get Linux to boot?

We have an excellent example of how well that model works.  Thousands of
BIOS implementations all implementing just enough of the various BIOS
interfaces to get Windows to boot.  And then all fall over as soon as
you try to be non-Windows, or even a later version of Windows.  It's the
path to being absolutely inundated with legacy crap.

You're arguing that we should have a single hypervisor ABI in order to,
among other things, reduce the test matrix, and yet the ABI is entirely
defined by testing to see how well a given implementation runs some
random version of Linux.  And if Linux wants to use that interface in a
different way, everyone is supposed to magically keep up.

And all this is supposed to be managed by multiple disparate independent
out-of-tree implementations?

Yep, I want a pony too.

> Treat it as we treat the Linux system calls. We promise not to change 
> them.

OK.  How does that help with "not hindering Linux's development"?

    J

* Re: [patch] paravirt: VDSO page is essential
  2007-03-06  7:42           ` Zachary Amsden
  2007-03-06  7:50             ` Ingo Molnar
@ 2007-03-06 18:48             ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 18:48 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Ingo Molnar, Rusty Russell, virtualization, Andrew Morton,
	Roland McGrath, linux-kernel

Zachary Amsden wrote:
> Yes, I don't have a problem with your patch, I just wish I had been
> cc'd on it.  Fixing this is rather tricky, but I believe no strange
> build magic is required, it can be done in kernel init code.  Still
> building my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that
> requires COMPAT_VDSO, yes?

Where do we stand on VDSO reloc?  Are you turning this into a proper
patch, or shall I have a look at it?

    J

* Re: Xen & VMI?
  2007-03-06  8:52                 ` Ingo Molnar
  2007-03-06  9:03                   ` Zachary Amsden
  2007-03-06  9:15                   ` Gerd Hoffmann
@ 2007-03-06 19:46                   ` Chris Wright
  2007-03-06 20:30                     ` Ingo Molnar
  2 siblings, 1 reply; 86+ messages in thread
From: Chris Wright @ 2007-03-06 19:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerd Hoffmann, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, linux-kernel, Roland McGrath

* Ingo Molnar (mingo@elte.hu) wrote:
> yes - but de-facto contradicted by the Xen paravirt_ops patches sent to 
> lkml ;)

There's no intrinsic value to the Xen on VMI approach that's superior
to Xen on pv_ops (not to mention the complications that it causes).

What are you driving at?  You seem to be arguing that abstractions
are bad unless done via ABI's.  ACPI....

thanks,
-chris

* Re: Xen & VMI?
  2007-03-06 19:46                   ` Chris Wright
@ 2007-03-06 20:30                     ` Ingo Molnar
  2007-03-06 20:53                       ` Chris Wright
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 20:30 UTC (permalink / raw)
  To: Chris Wright
  Cc: Gerd Hoffmann, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, linux-kernel, Roland McGrath


* Chris Wright <chrisw@sous-sol.org> wrote:

> What are you driving at?  You seem to be arguing that abstractions are 
> bad unless done via ABI's. [...]

i'm still arguing the same: that doing the same thing via overlapping, 
conflicting, redundant ABIs is crazy and contrary to the basic interests 
of Linux. It's like having 5 different, parallel variants of sys_open(), 
interfaced via a convoluted open_ops.

having data ABI coupling is one thing (filesystems, network formats, 
etc.). But having a 5-way function ABI coupling between system software 
running on the /same piece of hardware/, doing the same thing in essence 
is just madness in my book.

	Ingo

* Re: Xen & VMI?
  2007-03-06 17:17                                 ` Nakajima, Jun
  2007-03-06 17:32                                   ` Anthony Liguori
@ 2007-03-06 20:37                                   ` Ingo Molnar
  2007-03-06 21:02                                     ` Jeremy Fitzhardinge
                                                       ` (2 more replies)
  1 sibling, 3 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 20:37 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: Anthony Liguori, virtualization, Roland McGrath, Andrew Morton,
	Linus Torvalds, Jan Beulich, linux-kernel


* Nakajima, Jun <jun.nakajima@intel.com> wrote:

> I think a KVM Linux would benefit more from paravirt ops, rather than 
> VMI. The higher-level interface such as the one in Xen, especially for 
> I/O, interrupt controllers, timer, SMP, etc. actually simplifies the 
> implementation of the VMM, and improves performance of the guest. Even 
> for MMU, direct page tables, for example, would work better for 
> hardware-based virtualization because the processor can use the native 
> page tables.

maybe we are talking past each other because i dont really disagree with 
that: i mentioned it right at the beginning that higher-level APIs would 
have to be added to VMI. What i'd like to avoid is the ABI duplication 
for the lowlevel stuff /and/ for the highlevel stuff. Since VMI is 
mostly about lowlevel stuff right now it's obvious that it would have to 
grow more highlevel ops. Doing an IO driver via IO emulation is 
obviously pretty ... low-tech.

maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if 
it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can 
map their own internals to that /one/ ABI.

	Ingo

* Re: Xen & VMI?
  2007-03-06 20:30                     ` Ingo Molnar
@ 2007-03-06 20:53                       ` Chris Wright
  2007-03-06 21:03                         ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Chris Wright @ 2007-03-06 20:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Chris Wright, Gerd Hoffmann, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, linux-kernel, Roland McGrath

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Chris Wright <chrisw@sous-sol.org> wrote:
> 
> > What are you driving at?  You seem to be arguing that abstractions are 
> > bad unless done via ABI's. [...]
> 
> i'm still arguing the same: that doing the same thing via overlapping, 
> conflicting, redundant ABIs is crazy and contrary to the basic interests 
> of Linux. It's like having 5 different, parallel variants of sys_open(), 
> interfaced via a convoluted open_ops.

I would've said 5 parallel implementations of inode->i_op simply given
the nature of the operations, which is entirely sane.

> having data ABI coupling is one thing (filesystems, network formats, 
> etc.). But having a 5-way function ABI coupling between system software 
> running on the /same piece of hardware/, doing the same thing in essence 
> is just madness in my book.

This is where I'm not understanding your argument.  The hardware is
somewhat irrelevant since the OS is running on a platform presented by the
hypervisor.  And the point is to allow multiple implementations of the OS
operations that interact with the platform.  And in essence all network
stacks and file systems are doing the same thing with the same hardware.
Here's the reality.  None of these hypervisors will be ABI compliant in
the way syscalls are (namely trap insn and hypercall number).  So there's
a bunch of glue wherever you design the system.  Your argument is that in
some (arguably random) instances it makes sense to push the glue into
the ROM.  This idea that lguest and KVM come from the same source tree
is irrelevant when the issues you cite are about support matrices (which
means that guest and host may not have come from the same source after all).
I still don't understand your issue.

thanks,
-chris

* Re: Xen & VMI?
  2007-03-06 20:37                                   ` Ingo Molnar
@ 2007-03-06 21:02                                     ` Jeremy Fitzhardinge
  2007-03-06 21:11                                       ` Ingo Molnar
  2007-03-06 21:35                                     ` Nakajima, Jun
  2007-03-07  0:44                                     ` Rusty Russell
  2 siblings, 1 reply; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 21:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich

Ingo Molnar wrote:
> maybe we are talking past each other because i dont really disagree with 
> that: i mentioned it right at the beginning that higher-level APIs would 
> have to be added to VMI. What i'd like to avoid is the ABI duplication 
> for the lowlevel stuff /and/ for the highlevel stuff. Since VMI is 
> mostly about lowlevel stuff right now it's obvious that it would have to 
> grow more highlevel ops. Doing an IO driver via IO emulation is 
> obviously pretty ... low-tech.
>
> maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if 
> it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can 
> map their own internals to that /one/ ABI.

Well, that's the basic force shaping paravirt_ops; many of the calls are
generally used by all backends, and some are more specific.  The
entrypoints in paravirt_ops would be an approximate model for this
hypothetical ABI you're talking about.

But the key point you're missing is that this isn't a one-way
interface.  The hypervisor backend code makes calls into the kernel's
interfaces as well.  We use memory allocation, the interrupt
infrastructure, timers, per-cpu and as many other existing interfaces as
possible, so that we don't have to bloat paravirt_ops with duplicates of
all those other interfaces.

If you're seriously talking about an ABI, then you'd also have to
present stable ABIs for all subsystems the hypervisor backends want to
call into, either by actually freezing the internal linux interfaces
into ABIs, or by effectively duplicating them across the paravirt ABI
(or whatever).

    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 20:53                       ` Chris Wright
@ 2007-03-06 21:03                         ` Ingo Molnar
  2007-03-06 21:28                           ` Chris Wright
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 21:03 UTC (permalink / raw)
  To: Chris Wright
  Cc: Gerd Hoffmann, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, linux-kernel, Roland McGrath


* Chris Wright <chrisw@sous-sol.org> wrote:

> > i'm still arguing the same: that doing the same thing via 
> > overlapping, conflicting, redundant ABIs is crazy and contrary to 
> > the basic interests of Linux. It's like having 5 different, parallel 
> > variants of sys_open(), interfaced via a convoluted open_ops.
> 
> I would've said 5 parallel implementations of inode->i_op simply given 
> the nature of the operations, which is entirely sane.

with the big freaking difference that the 5 parallel implementations of 
inode->i_op are:

	_internal to Linux_

Doh. There's only a data ABI underneath them.

every time someone tried to impose a functional/behavioral ABI on core 
bits of Linux we said: 'no way dude!'. Remember STREAMS? Remember the 
module KABI? Remember ACPI? [doh, i guess we messed up on the latter 
one. We regret that day ever since.]

(network file systems are a bit of an exception to the rule, but those 
are pretty isolated themselves and in no way as wide and central as the 
direction paravirt_ops appears to grow.)

> > having data ABI coupling is one thing (filesystems, network formats, 
> > etc.). But having a 5-way function ABI coupling between system 
> > software running on the /same piece of hardware/, doing the same 
> > thing in essence is just madness in my book.
> 
> This is where I'm not understanding your argument.  The hardware is 
> somewhat irrelevant since the OS is running on a platform presented by 
> the hypervisor.  And the point is to allow multiple implementations of 
> the OS operations that interact with the platform.  And in essence 
> all network stacks and file systems are doing the same thing with the 
> same hardware. [...]

again, those are /DATA/ ABIs. Not function ABIs. Not behavioral ABIs. 
The coupling is /FAR/ saner and far more plannable and far more 
isolated. And even data ABIs are very non-trivial ...

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 21:02                                     ` Jeremy Fitzhardinge
@ 2007-03-06 21:11                                       ` Ingo Molnar
  2007-03-06 21:13                                         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 21:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> If you're seriously talking about an ABI, [...]

HELLO, this isnt a hypothetical!! The moment there's a xen_paravirt_ops, 
Linux has DE FACTO committed itself to the Xen ABI: whatever 
functionality the hypercall_page call table plus the int $0x82 interface 
offers.

THE MOMENT any of that goes upstream and ships in a distro it's going to 
be there forever! Try to change paravirt_ops or any core bit of Linux so 
that this ABI cannot be sanely supported: 'fix it, you broke Xen!'. It 
wont matter that paravirt_ops is 'internal' to Linux.

so trying to argue as if there was no ABI imposed on Linux by hiding the 
Xen ABI behind paravirt_ops, and whistling into the air as if nothing 
happened is misguided at best.

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 21:11                                       ` Ingo Molnar
@ 2007-03-06 21:13                                         ` Jeremy Fitzhardinge
  2007-03-06 21:20                                           ` Ingo Molnar
  0 siblings, 1 reply; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 21:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich

Ingo Molnar wrote:
> so trying to argue as if there was no ABI imposed on Linux by hiding the 
> Xen ABI behind paravirt_ops, and whistling into the air as if nothing 
> happened is misguided at best.

How is the situation even slightly different with a unified hypervisor ABI?

    J


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 21:13                                         ` Jeremy Fitzhardinge
@ 2007-03-06 21:20                                           ` Ingo Molnar
  2007-03-06 21:46                                             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2007-03-06 21:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> > so trying to argue as if there was no ABI imposed on Linux by hiding 
> > the Xen ABI behind paravirt_ops, and whistling into the air as if 
> > nothing happened is misguided at best.
> 
> How is the situation even slightly different with a unified hypervisor 
> ABI?

1 sane ABI instead of 4 parallel ones? It's the same difference as the 
difference between 300 system calls and 1200 system calls. A lot more 
focus, a lot more integration, a lot less pain. Every time i change a 
detail in Linux i have to update (and think about) 1 virtualization 
aspect - not 4 (or more).

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 21:03                         ` Ingo Molnar
@ 2007-03-06 21:28                           ` Chris Wright
  2007-03-07  2:35                             ` Zachary Amsden
  0 siblings, 1 reply; 86+ messages in thread
From: Chris Wright @ 2007-03-06 21:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Chris Wright, Gerd Hoffmann, virtualization, Jan Beulich,
	Andrew Morton, Linus Torvalds, linux-kernel, Roland McGrath

* Ingo Molnar (mingo@elte.hu) wrote:
> with the big freaking difference that the 5 parallel implementations of 
> inode->i_op are:
> 
> 	_internal to Linux_

Granted.  Until it's modprobe xen.ko, vmware.ko, viridian.ko...that's
not going to go away, even with the VMI.

> Doh. There's only a data ABI underneath them.

Perhaps from the syscall perspective.  But there's also things like
internal interactions with the page cache, VMA and pt updates, etc
which are much more functional/behavioural than purely moving data.
And when we're talking about page tables or similar for hv, I think the
data vs. functional is an oversimplification.

> every time someone tried to impose a functional/behavioral ABI on core 
> bits of Linux we said: 'no way dude!'. Remember STREAMS? Remember the 
> module KABI? Remember ACPI? [doh, i guess we messed up on the latter 
> one. We regret that day ever since.]

Heheh, OK, now I know I'm lost when we've both used ACPI as a negative
example to argue our point.  Honestly, all of the above suggest no VMI
to me (in fact, I used those type of arguments against VMI about 1 year
ago ;-).

> (network file systems are a bit of an exception to the rule, but those 
> are pretty isolated themselves and in no way as wide and central as the 
> direction paravirt_ops appears to grow.)
> 
> > > having data ABI coupling is one thing (filesystems, network formats, 
> > > etc.). But having a 5-way function ABI coupling between system 
> > > software running on the /same piece of hardware/, doing the same 
> > > thing in essence is just madness in my book.
> > 
> > This is where I'm not understanding your argument.  The hardware is 
> > somewhat irrelevant since the OS is running on a platform presented by 
> > the hypervisor.  And the point is to allow multiple implementations of 
> > the OS operations that interact with the platform.  And in essence 
> > all network stacks and file systems are doing the same thing with the 
> > same hardware. [...]
> 
> again, those are /DATA/ ABIs. Not function ABIs. Not behavioral ABIs. 
> The coupling is /FAR/ saner and far more plannable and far more 
> isolated. And even data ABIs are very non-trivial ...

I agree that changing the interface to the low-level platform is tricky
and less isolated.  But how does the VMI protect you from those changes?
It simply doesn't, the changes are still necessary.  And the inflexibility
means the tough corner cases swept under the VMI rug are more difficult
to debug, get right, etc...

thanks,
-chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* RE: Xen & VMI?
  2007-03-06 20:37                                   ` Ingo Molnar
  2007-03-06 21:02                                     ` Jeremy Fitzhardinge
@ 2007-03-06 21:35                                     ` Nakajima, Jun
  2007-03-07  0:44                                     ` Rusty Russell
  2 siblings, 0 replies; 86+ messages in thread
From: Nakajima, Jun @ 2007-03-06 21:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, virtualization, Roland McGrath, Andrew Morton,
	Linus Torvalds, Jan Beulich, linux-kernel

Ingo Molnar wrote:
> * Nakajima, Jun <jun.nakajima@intel.com> wrote:
> 
>> I think a KVM Linux would benefit more from paravirt ops, rather than
>> VMI. The higher-level interface such as the one in Xen, especially
>> for I/O, interrupt controllers, timer, SMP, etc. actually simplifies
>> the implementation of the VMM, and improves performance of the guest.
>> Even for MMU, direct page tables, for example, would work better for
>> hardware-based virtualization because the processor can use the
>> native page tables.
> 
> maybe we are talking past each other because i dont really disagree
> with that: i mentioned it right at beginning that higher-level APIs
> would have to be added to VMI. What i'd like to avoid is the ABI
> duplication for the lowlevel stuff /and/ for the highlevel stuff.
> Since VMI is mostly about lowlevel stuff right now it's obvious that
> it would have to grow more highlevel ops. Doing an IO driver via IO
> emulation is obviously pretty ... low-tech.
>

I agree with you.
 
> maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if
> it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can
> map their own internals to that /one/ ABI.

To me it should be handled as 'paravirt devices', 'paravirt chipset',
etc. If we use the standard H/W detection mechanism (such as CPUID, I/O
port, etc.) used by the native kernel, we should be able to extend the
kernel cleanly (or just add device drivers). And the key here is to
define the behavior of such pseudo (or fake) H/W as we do for actual
H/W. Then 'the paravirt ABI' is the set of (high-level) operations for
such fake H/W devices.

> 
> 	Ingo

Jun
---
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 21:20                                           ` Ingo Molnar
@ 2007-03-06 21:46                                             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 86+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-06 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich

Ingo Molnar wrote:
> 1 sane ABI instead of 4 parallel ones? It's the same difference as the 
> difference between 300 system calls and 1200 system calls. A lot more 
> focus, a lot more integration, a lot less pain. Every time i change a 
> detail in Linux i have to update (and think about) 1 virtualization 
> aspect - not 4 (or more).
>   

But you're just pushing the problem off; it doesn't go away, and it
doesn't really stop being your problem.  If you change the way you use
the abi and something breaks in one of its implementations, you're still
going to get "wahh, you broke Xen/ESX/whatever".  And then you either
have to wait for the ABI implementation to get fixed, or work around it
on the Linux side.

I don't see why you think an ABI is easier to reason about than the
pv_ops api?  If you're going to make a kernel change that has an effect
on the API, it's also going to have an effect on the ABI, but with the
ABI you're stuck waiting for the ABI to catch up before you can do
something.  If your kernel change relies on pv_ops but won't need to
change it, then reasoning about the pv_ops API is enough to make things
work.

pvops has, what, about 90 calls in it, which break down into a few broad
classes:

   1. various setup things, which more or less correspond to subsystems
      (interrupts, time, memory, etc)
   2. calls which are direct analogues of hardware operations
   3. pagetable/tlb operations
   4. random corner cases (like the apic stuff to plug into ESX's apic
      emulation)

The (functionally) large missing component is SMP support, though I
expect that will only come down to a handful of extra operations.

All of these operations are pretty easily understood in their own terms
at the pv_ops interface level.  Knowing how the pv_ops backends map
these onto a particular hypercall interface is not really necessary to
understand the pv_ops interface, though it's readily visible by reading
the source.  But if you need to look and change it, you can; hiding it
under a layer of magic ABI dust isn't going to fix that.
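
A minimal sketch of that separation, assuming a heavily reduced ops
table with one entry per class above. Every name here (pv_ops_sketch,
fake_hv_ops, kernel_set_mapping, and so on) is hypothetical and is not
the real paravirt_ops layout; the "hypercall" is only a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for paravirt_ops: one hook per class. */
struct pv_ops_sketch {
	const char *name;
	void (*init_time)(void);                        /* setup hook */
	void (*irq_disable)(void);                      /* hardware-op analogue */
	void (*set_pte)(uint64_t *ptep, uint64_t val);  /* pagetable operation */
};

static int irqs_enabled = 1;
static unsigned int fake_hypercalls;    /* counts pretend hypercalls */

/* "Native" backend: does the operation directly. */
static void native_init_time(void) { }
static void native_irq_disable(void) { irqs_enabled = 0; }
static void native_set_pte(uint64_t *ptep, uint64_t val) { *ptep = val; }

/* Pretend hypervisor backend: same interface, different mechanism. */
static void hv_init_time(void) { fake_hypercalls++; }
static void hv_irq_disable(void) { irqs_enabled = 0; }
static void hv_set_pte(uint64_t *ptep, uint64_t val)
{
	fake_hypercalls++;      /* stands in for a real hypercall */
	*ptep = val;
}

static const struct pv_ops_sketch native_ops = {
	.name = "native",
	.init_time = native_init_time,
	.irq_disable = native_irq_disable,
	.set_pte = native_set_pte,
};

static const struct pv_ops_sketch fake_hv_ops = {
	.name = "fake-hv",
	.init_time = hv_init_time,
	.irq_disable = hv_irq_disable,
	.set_pte = hv_set_pte,
};

static const struct pv_ops_sketch *pv = &native_ops;

/* Chosen once at boot in this sketch. */
static void pv_select(const struct pv_ops_sketch *ops)
{
	assert(ops != NULL);
	pv = ops;
}

/* Core code is written against the ops table only. */
static void kernel_set_mapping(uint64_t *ptep, uint64_t val)
{
	pv->set_pte(ptep, val);
}
```

The caller, kernel_set_mapping(), reads the same whichever backend
pv_select() installed; only the backend knows whether a hypercall is
involved.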

The other big part that's missing from paravirt_ops is all the hooks for
calling back into the kernel - because they're not necessary.  You
skipped that point in my earlier mail, but it is important.  A basic
element in Xen's design is that it interacts with the guest kernel at a
fairly high level, and makes use of existing kernel mechanisms wherever
possible.  How do you see that working in your proposal?

    J

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 20:37                                   ` Ingo Molnar
  2007-03-06 21:02                                     ` Jeremy Fitzhardinge
  2007-03-06 21:35                                     ` Nakajima, Jun
@ 2007-03-07  0:44                                     ` Rusty Russell
  2007-03-07  0:54                                       ` Anthony Liguori
                                                         ` (2 more replies)
  2 siblings, 3 replies; 86+ messages in thread
From: Rusty Russell @ 2007-03-07  0:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich

On Tue, 2007-03-06 at 21:37 +0100, Ingo Molnar wrote:
> maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if 
> it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can 
> map their own internals to that /one/ ABI.

I think it's an excellent aim, but it's *HARD*.  I rejected this
approach earlier because I'm just not smart enough.  (Yet?)

The Linux side is fairly stable.  The hardware side is changing, and the
hypervisor side is changing.  This means the ABI will churn fairly fast.
The hypervisors are very different, which means the ABI will be very
wide.

We could start with VMI and try to support Xen, KVM and lguest.  It
would at least give us a better idea of the scope of the problem.  But
IMHO it's a *huge* job.

Rusty.



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-07  0:44                                     ` Rusty Russell
@ 2007-03-07  0:54                                       ` Anthony Liguori
  2007-03-07  3:06                                       ` Zachary Amsden
  2007-03-07  8:15                                       ` Ingo Molnar
  2 siblings, 0 replies; 86+ messages in thread
From: Anthony Liguori @ 2007-03-07  0:54 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Nakajima, Jun, virtualization, Roland McGrath,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich

Rusty Russell wrote:
> On Tue, 2007-03-06 at 21:37 +0100, Ingo Molnar wrote:
>   
>> maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if 
>> it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can 
>> map their own internals to that /one/ ABI.
>>     
>
> I think it's an excellent aim, but it's *HARD*.  I rejected this
> approach earlier because I'm just not smart enough.  (Yet?)
>
> The Linux side is fairly stable.  The hardware side is changing, and the
> hypervisor side is changing.  This means the ABI will churn fairly fast.
> The hypervisors are very different, which means the ABI will be very
> wide.
>
> We could start with VMI and try to support Xen, KVM and lguest.

There is one more here.  We also have Xen HVM which will soon want to be 
paravirtualized too.  We don't want the current xen paravirt_ops for 
that as they have a lot of things that HVM does not need.

Since KVM and Xen HVM have the fewest requirements in terms of guest 
modifications, they are probably the obvious places to start.

Regards,

Anthony Liguori

>   It
> would at least give us a better idea of the scope of the problem.  But
> IMHO it's a *huge* job.
>
> Rusty.
>
>
>
>   


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 17:11                                 ` Ingo Molnar
@ 2007-03-07  2:16                                   ` Zachary Amsden
  0 siblings, 0 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-07  2:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, virtualization, Jan Beulich, Linus Torvalds,
	Andrew Morton, Roland McGrath, linux-kernel

Ingo Molnar wrote:

> We do not let OpenOffice or Evolution have its own separate ABI to Linux 
> so that they 'can evolve at their own pace'... We want them to cooperate 
> and come up with a common ABI (or rather, we try to come up with the 
> right syscalls ourselves), because diverging, overlapping ABIs are a huge 
> PITA.
>   

OpenOffice or Evolution are the completely wrong example.  They disprove 
your point more than they prove it.  Consider any significantly large 
cross-platform software like OpenOffice, Evolution, Firefox.  You don't 
let or restrict what these pieces of software do at all.  They evolve at 
their own pace, and they all build their very complicated and divergent 
cross platform compatibility layers, with huge, overlapping APIs, 
converging in places, diverging in others.

> We do not unify their pointlessly diverging ABIs to within the kernel 
> via say office_ops (while we could) because that's crappy on its face. 
> Hypervisors arent in any way different, they just _think_ they are 
> special because they are relatively new. But hey, i dont expect you to 
> concede this point ;)

No, you don't.  The developers of Office and Evolution and Firefox do 
that for you.  And it's not crappy on its face because it provides real 
value to them - the ability to run heterogeneously in multiple different 
environments and across many different platforms and operating systems.

Where your analogy is wrong is that in this case, Linux is very much 
like one of those large software systems.  It has complicated features 
that require special plugins to work efficiently in different hypervisor 
environments.  And paravirt-ops is providing that functionality to 
Linux, just as the platform layer of any large software system does and 
very much should do.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-06 21:28                           ` Chris Wright
@ 2007-03-07  2:35                             ` Zachary Amsden
  0 siblings, 0 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-07  2:35 UTC (permalink / raw)
  To: Chris Wright, Ingo Molnar
  Cc: Gerd Hoffmann, virtualization, Jan Beulich, Andrew Morton,
	Linus Torvalds, linux-kernel, Roland McGrath

Chris Wright wrote:
> I agree that changing the interface to the low-level platform is tricky
> and less isolated.  But how does the VMI protect you from those changes?
> It simply doesn't, the changes are still necessary.  And the inflexibility
> means the tough corner cases swept under the VMI rug are more difficult
> to debug, get right, etc...
>   

I actually disagree here.  Yes, it will change over time.  VMI was 
designed to be extensible and flexible - you can omit implementation for 
any calls you don't require, and with consensus, you can add new flags, 
fields, and calls where you need them.  But VMI as it stands today is 
simply not sufficient to support the hypervisors which are here now.  
There are gaps, particularly with SMP support, which require significant 
changes to either the hypervisors, the kernel, or the VMI itself.  There 
are many reasons these gaps still exist, but most prominently, the big 
reason is that nobody wanted to use a single ABI to interface to the 
hypervisor a year ago when we first proposed the VMI interface as a 
virtualization solution for Linux.  In the end, I see no reason the 
technical issues can't be solved, but the larger questions about the 
future evolution of the interface and also some largely non-technical 
points, valid or not, have stalled the growth which we originally desired.

At this point, the question of whether to pursue a common ABI is no 
longer a technical issue, it's no longer a management or evolutionary 
issue at all.  It's a pragmatic issue about getting code that works into 
Linux today.  It's about working together using what we have as a base, 
which is paravirt-ops, to get working code to users.  We can always 
evolve the code in tree if we find a workable cross-vendor ABI that 
solves everyone's problems.  But that is neither here nor there, because 
it isn't here today.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-07  0:44                                     ` Rusty Russell
  2007-03-07  0:54                                       ` Anthony Liguori
@ 2007-03-07  3:06                                       ` Zachary Amsden
  2007-03-07  8:15                                       ` Ingo Molnar
  2 siblings, 0 replies; 86+ messages in thread
From: Zachary Amsden @ 2007-03-07  3:06 UTC (permalink / raw)
  To: Rusty Russell, Ingo Molnar
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich

Rusty Russell wrote:
> On Tue, 2007-03-06 at 21:37 +0100, Ingo Molnar wrote:
>   
>> maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if 
>> it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can 
>> map their own internals to that /one/ ABI.
>>     
>
> I think it's an excellent aim, but it's *HARD*.  I rejected this
> approach earlier because I'm just not smart enough.  (Yet?)
>   

With VMI, I think we came within 90% of getting a cross vendor 
paravirt-ABI that satisfied everyone's needs.  Nobody is smart enough to 
figure out the last 10% - it needs cooperation, trial, error, and 
experience dealing with each other's hypervisors.

> The Linux side is fairly stable.  The hardware side is changing, and the
> hypervisor side is changing.  This means the ABI will churn fairly fast.
> The hypervisors are very different, which means the ABI will be very
> wide.
>
> We could start with VMI and try to support Xen, KVM and lguest.  It
> would at least give us a better idea of the scope of the problem.  But
> IMHO it's a *huge* job.
>   

Surely, given time, the technical issues can be worked out.  In the 
meantime, the hardware has evolved, and many of the points that are now 
important have changed - and new issues have come into play that we 
can't anticipate yet.  At some point, we will hopefully converge, but we 
might not, and it is a huge job.  UDI had similarly lofty goals.  It was 
started in 1998.  Where is it today?

But this isn't the problem.  The problem is that nobody wants a single 
ABI.  Just like no hardware vendors want a fixed ABI for their 
hardware.  They need to innovate independently, and time to market and 
features are more important than being binary compatible with a bunch of 
competing vendors.  They want to differentiate, and break away from an 
ABI, and as history repeats, again and again, this happens eventually 
with every ABI.

So once the ivory tower is built, and you let all the kids in to play, 
they are going to have a party and you are going to start noticing chips 
and eventually cracks, and eventually the tower will go into disrepair 
and fall because somebody else has built a new and better one further 
down the road.  Why go through that exercise if nobody sees any tangible 
benefit from it today?

Paravirt-ops avoids this because it is an API, and because it is 
flexible, and because it can change with the kernel, and because it 
doesn't lock you into a legacy way of doing things, it allows you to 
fork and adapt and push legacy and future compatibility issues into the 
vendor backend modules, like VMI, where they should belong.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-07  0:44                                     ` Rusty Russell
  2007-03-07  0:54                                       ` Anthony Liguori
  2007-03-07  3:06                                       ` Zachary Amsden
@ 2007-03-07  8:15                                       ` Ingo Molnar
  2007-03-07  9:17                                         ` Zachary Amsden
  2007-03-07 19:14                                         ` Dan Hecht
  2 siblings, 2 replies; 86+ messages in thread
From: Ingo Molnar @ 2007-03-07  8:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Nakajima, Jun, virtualization, Roland McGrath, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Jan Beulich,
	linux-kernel


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Tue, 2007-03-06 at 21:37 +0100, Ingo Molnar wrote:
> > maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if 
> > it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can 
> > map their own internals to that /one/ ABI.
> 
> I think it's an excellent aim, but it's *HARD*.  I rejected this 
> approach earlier because I'm just not smart enough.  (Yet?)
> 
> The Linux side is fairly stable.  The hardware side is changing, and 
> the hypervisor side is changing.  This means the ABI will churn fairly 
> fast. The hypervisors are very different, which means the ABI will be 
> very wide.

the 'hardware is changing fast so we cannot do a sane API' argument 
sounds good at first but in this context it is still fundamentally 
wrong. Hardware has changed /dramatically/ since we started Linux, still 
we didnt have to do dramatic changes to the system call API/ABI. Why? 
Because hardware too is fundamentally controlled by the rules of this 
world. So if you know the laws of physics, math and computer science, 
you /CAN/ do a sane API that lives for quite some time. We had these 
kinds of discussions when Linux was just a few years old - many people 
were worried about 'the hardware changes too fast' - but if the 
fundamentals are strong, it _doesnt really matter_, as long as our 
interfaces are sane and we quickly adapt our internals.

On the other hand, Linux's internal details, semantics, approaches are a 
lot more ad-hoc and a lot more affected by changes in the hardware 
environment - that's why i'd not like to see some external ABI 
constraint limit aspects of those internals.

For example, VMI_CALL_SetAlarm takes a 'cycles' argument. Cycles is a 
quite bad unit for an API, it should be absolute time, nanosec or 
picosec based instead. We could easily see CPUs that have /no concept of 
cycles/, at all! Even today's CPUs have hardly any fixed concept of 
cycles, due to cpufreq. It's as if 15 years ago we had based sys_mmap() 
around the concept of '16-bit segments'. We could certainly make it work 
on current hardware but it would look pretty awkward today.
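
A small sketch of the cpufreq point, assuming a hypothetical helper
cycles_to_ns() that is not part of VMI: the same 'cycles' argument
corresponds to a different wall-clock delay as soon as the frequency
changes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a VMI call: how long 'cycles' lasts, in
 * nanoseconds, on a clock running at freq_hz.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	uint64_t secs, rem;

	assert(freq_hz != 0);

	/*
	 * Split into whole seconds and a remainder so the intermediate
	 * product stays within 64 bits for frequencies below ~18 GHz.
	 */
	secs = cycles / freq_hz;
	rem = cycles % freq_hz;
	return secs * 1000000000ULL + rem * 1000000000ULL / freq_hz;
}
```

With this helper, 2,000,000 cycles is 1 ms at 2 GHz but 2 ms at 1 GHz,
so an alarm expressed in cycles only has a meaning together with the
frequency in effect at that moment.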

in fact hardware changes a lot more by just going from one Linux arch to 
another - still the system call API is essentially the same. (with 
small, non-fundamental variations)

furthermore, most of the details in VMI or in Xen's lowlevel APIs (where 
most of the overlap is currently - VMI doesnt have all that many 
highlevel APIs) are cast into stone. The i386 arch is not going to 
change, ever. Most details of the x86_64 arch are not going to change, 
ever. It's unclear whether there will ever be the need for any x86_128 
arch (for humans). So it should be quite possible to come up with 
something sane for these lowlevel details, and cast it into stone, for 
everyone. Just like the chip makers cast it into silicon.

the more nontrivial (and thus more harmful, because more 
design-limiting) bits are the highlevel APIs.

> We could start with VMI and try to support Xen, KVM and lguest.  It 
> would at least give us a better idea of the scope of the problem.  But 
> IMHO it's a *huge* job.

yeah, it's a nontrivial job - like writing a sane OS. But it's doable 
and we are in fact out here trying to do exactly that, right? ;-)

	Ingo

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-07  8:15                                       ` Ingo Molnar
@ 2007-03-07  9:17                                         ` Zachary Amsden
  2007-03-07 11:15                                           ` Thomas Gleixner
  2007-03-07 19:14                                         ` Dan Hecht
  1 sibling, 1 reply; 86+ messages in thread
From: Zachary Amsden @ 2007-03-07  9:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Nakajima, Jun, virtualization, Roland McGrath,
	Anthony Liguori, Andrew Morton, Linus Torvalds, linux-kernel,
	Jan Beulich, Daniel Hecht

Ingo Molnar wrote:
> For example, VMI_CALL_SetAlarm takes a 'cycles' argument. Cycles is a 
> quite bad unit for an API, it should be absolute time, nanosec or 
> picosec based instead. We could easily see CPUs that have /no concept of 
>   

Actually, putting the unit in terms of cycles is more portable and 
flexible.  Rather than perform a conversion from cycles to 
nano/femto/pico seconds, the raw cycle count is exposed, along with the 
current clock frequency.  This allows the timer infrastructure to merely 
do one conversion, from cycles to real time, rather than converting to 
an arbitrary time unit which may change with operating systems and time 
and thus break the ABI.

Zach

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Xen & VMI?
  2007-03-07  9:17                                         ` Zachary Amsden
@ 2007-03-07 11:15                                           ` Thomas Gleixner
  0 siblings, 0 replies; 86+ messages in thread
From: Thomas Gleixner @ 2007-03-07 11:15 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Ingo Molnar, Rusty Russell, Nakajima, Jun, virtualization,
	Roland McGrath, Anthony Liguori, Andrew Morton, Linus Torvalds,
	linux-kernel, Jan Beulich, Daniel Hecht

On Wed, 2007-03-07 at 01:17 -0800, Zachary Amsden wrote:
> Ingo Molnar wrote:
> > For example, VMI_CALL_SetAlarm takes a 'cycles' argument. Cycles is a 
> > quite bad unit for an API, it should be absolute time, nanosec or 
> > picosec based instead. We could easily see CPUs that have /no concept of 
> >   
> 
> Actually, putting the unit in terms of cycles is more portable and 
> flexible.  Rather than perform a conversion from cycles to 
> nano/femto/pico seconds, the raw cycle count is exposed, along with the 
> current clock frequency.  This allows the timer infrastructure to merely 
> do one conversion, from cycles to real time, rather than converting to 
> an arbitrary time unit which may change with operating systems and time 
> and thus break the ABI.

Putting the unit in terms of cycles is just ugly. Virtual hardware
should provide the easiest interface and in case of time this _IS_
nanoseconds. 

Nanoseconds is neither an arbitrary time unit nor will it change anytime
soon to femtoseconds. So your argument that the ABI might break is just
a strawman.

The cycles conversion gets ugly as hell, as you want to have absolute
time for your clock event reprogramming. This requires 128 bit math in
the reprogramming path for nothing and I'm not going to put it there.
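
One possible reading of the 128 bit remark, sketched under assumptions: an absolute cycle count multiplied by 10^9 no longer fits in 64 bits once the counter has run for more than a few seconds at GHz rates. The function names below are hypothetical, and unsigned __int128 is a GCC/Clang extension on 64-bit targets, not standard C.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical: naive 64-bit conversion of an absolute cycle count.
 * The multiplication wraps modulo 2^64 for large counts, so the
 * result is wrong once cycles * NSEC_PER_SEC exceeds 64 bits.
 */
static uint64_t cycles_to_ns_naive(uint64_t cycles, uint64_t freq_hz)
{
	return (cycles * NSEC_PER_SEC) / freq_hz;
}

/*
 * Hypothetical: the same conversion with a 128-bit intermediate.
 * Correct as long as the final quotient fits in 64 bits.
 */
static uint64_t cycles_to_ns_wide(uint64_t cycles, uint64_t freq_hz)
{
	unsigned __int128 prod = (unsigned __int128)cycles * NSEC_PER_SEC;

	return (uint64_t)(prod / freq_hz);
}
```

For one day of uptime at 3 GHz (259200000000000 cycles) the wide version yields 86400000000000 ns, while the naive version wraps and returns something else.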

Even worse on a Linux host we would convert ktime_t to some virtual
hardware clock on the guest side, feed it through paravirt to the host
and convert it back to ktime_t as the host uses an hrtimer to schedule
the next guest event.

The whole rush of paravirt ops leads to an arbitrary number of virtual
clock source and clock event devices instead of having one virtual
silicon with a sane design and a per hypervisor backend. Paravirtualized
kernels should provide _sane_ silicon emulations rather than dumping
more crap on the kernel developers by competing with the real silicon
vendors for the BDHA (Brain Damaged Hardware Award).

	tglx




* Re: Xen & VMI?
  2007-03-07  8:15                                       ` Ingo Molnar
  2007-03-07  9:17                                         ` Zachary Amsden
@ 2007-03-07 19:14                                         ` Dan Hecht
  1 sibling, 0 replies; 86+ messages in thread
From: Dan Hecht @ 2007-03-07 19:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, virtualization, Jan Beulich, Anthony Liguori,
	Andrew Morton, Linus Torvalds, linux-kernel, Roland McGrath,
	Dan Hecht

On 03/07/2007 12:15 AM, Ingo Molnar wrote:
> On the other hand, Linux's internal details, semantics, approaches are a 
> lot more ad-hoc and a lot more affected by changes in the hardware 
> environment - that's why i'd not like to see some external ABI 
> constraint limit aspects of those internals.
> 
> For example, VMI_CALL_SetAlarm takes a 'cycles' argument. Cycles is a 
> quite bad unit for an API, it should be absolute time, nanosec or 
> picosec based instead. We could easily see CPUs that have /no concept of 
> cycles/, at all! Even today's CPUs have hardly any fixed concept of 
> cycles, due to cpufreq. It's as if 15 years ago we had based sys_mmap() 
> around the concept of '16-bit segments'. We could certainly make it work 
> on current hardware but it would look pretty awkward today.
> 
>

Ingo,

In the VMI definition, "cycles" does not mean "cpu cycles".  It is used 
in the normal way to mean "an interval of time during which a sequence 
of a recurring succession of events or phenomena is completed" 
[Merriam-Webster].  In this case, the recurring event is the increment 
of a counter.  The routine VMI_CALL_GetCycleFrequency defines how many 
of these events occur per second.  The rate is not variable, so it is not 
subject to cpu phenomena such as cpufreq.  And it does not need to be 
tied in any way to cpu cycle frequency.  How your cpu is implemented is 
not relevant.

If a hypervisor wishes to expose its time counters in units of 
nanoseconds, then it simply returns 1000000000 from GetCycleFrequency.
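
A minimal sketch of what that means for a guest, assuming a hypothetical helper counter_to_ns and a freq_hz value standing in for the number returned by GetCycleFrequency: when the frequency is 1000000000, the conversion leaves the counter value unchanged.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: convert a counter value to nanoseconds by
 * splitting it into whole seconds and a sub-second remainder.
 *
 * The remainder is always below freq_hz, so rem * NSEC_PER_SEC fits
 * in 64 bits as long as freq_hz stays below about 1.8e10 Hz.
 * The result is rounded down.
 */
static uint64_t counter_to_ns(uint64_t count, uint64_t freq_hz)
{
	uint64_t sec = count / freq_hz;
	uint64_t rem = count % freq_hz;

	return sec * NSEC_PER_SEC + (rem * NSEC_PER_SEC) / freq_hz;
}
```

With freq_hz equal to 1000000000 the function returns its input unchanged, i.e. the counter already counts nanoseconds.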

Dan


end of thread, other threads:[~2007-03-07 19:19 UTC | newest]

Thread overview: 86+ messages
2007-03-05 12:06 [patch] paravirt: VDSO page is essential Ingo Molnar
2007-03-05 12:36 ` Avi Kivity
2007-03-05 12:40   ` Ingo Molnar
2007-03-05 13:00     ` Avi Kivity
2007-03-05 13:32       ` Rusty Russell
2007-03-05 14:28   ` Andi Kleen
2007-03-05 13:48     ` Ingo Molnar
2007-03-05 14:58       ` Andi Kleen
2007-03-05 13:59         ` Ingo Molnar
2007-03-05 14:10           ` Avi Kivity
2007-03-05 14:10             ` Ingo Molnar
2007-03-05 13:28 ` Rusty Russell
2007-03-05 13:38   ` Ingo Molnar
2007-03-05 14:34   ` Andi Kleen
2007-03-05 13:46     ` [patch] paravirt: re-enable COMPAT_VDSO Ingo Molnar
2007-03-05 13:48     ` [patch] paravirt: VDSO page is essential Ingo Molnar
2007-03-05 20:11     ` Zachary Amsden
2007-03-05 20:16       ` Andi Kleen
2007-03-05 20:33         ` Zachary Amsden
2007-03-05 20:19       ` Ingo Molnar
2007-03-05 20:42         ` Zachary Amsden
2007-03-06  0:57   ` Rusty Russell
2007-03-06  1:03     ` Zachary Amsden
2007-03-06  1:11       ` Rusty Russell
2007-03-06  1:14       ` Jeremy Fitzhardinge
2007-03-06  1:51         ` Zachary Amsden
2007-03-06  1:53           ` Jeremy Fitzhardinge
2007-03-06  8:19             ` Xen & VMI? Ingo Molnar
2007-03-06  8:37               ` Gerd Hoffmann
2007-03-06  8:48                 ` Zachary Amsden
2007-03-06  8:52                 ` Ingo Molnar
2007-03-06  9:03                   ` Zachary Amsden
2007-03-06  9:10                     ` Ingo Molnar
2007-03-06  9:15                   ` Gerd Hoffmann
2007-03-06  9:34                     ` Ingo Molnar
2007-03-06 10:15                       ` Gerd Hoffmann
2007-03-06 10:26                         ` Ingo Molnar
2007-03-06 11:04                           ` Gerd Hoffmann
2007-03-06 11:59                             ` Ingo Molnar
2007-03-06 12:34                               ` Gerd Hoffmann
2007-03-06 15:03                               ` Anthony Liguori
2007-03-06 17:17                                 ` Nakajima, Jun
2007-03-06 17:32                                   ` Anthony Liguori
2007-03-06 20:37                                   ` Ingo Molnar
2007-03-06 21:02                                     ` Jeremy Fitzhardinge
2007-03-06 21:11                                       ` Ingo Molnar
2007-03-06 21:13                                         ` Jeremy Fitzhardinge
2007-03-06 21:20                                           ` Ingo Molnar
2007-03-06 21:46                                             ` Jeremy Fitzhardinge
2007-03-06 21:35                                     ` Nakajima, Jun
2007-03-07  0:44                                     ` Rusty Russell
2007-03-07  0:54                                       ` Anthony Liguori
2007-03-07  3:06                                       ` Zachary Amsden
2007-03-07  8:15                                       ` Ingo Molnar
2007-03-07  9:17                                         ` Zachary Amsden
2007-03-07 11:15                                           ` Thomas Gleixner
2007-03-07 19:14                                         ` Dan Hecht
2007-03-06 16:27                               ` Jeremy Fitzhardinge
2007-03-06 17:11                                 ` Ingo Molnar
2007-03-07  2:16                                   ` Zachary Amsden
2007-03-06  9:55                     ` Avi Kivity
2007-03-06 10:23                       ` Gerd Hoffmann
2007-03-06 10:31                         ` Ingo Molnar
2007-03-06 19:46                   ` Chris Wright
2007-03-06 20:30                     ` Ingo Molnar
2007-03-06 20:53                       ` Chris Wright
2007-03-06 21:03                         ` Ingo Molnar
2007-03-06 21:28                           ` Chris Wright
2007-03-07  2:35                             ` Zachary Amsden
2007-03-06  9:07               ` Jeremy Fitzhardinge
2007-03-06  9:26                 ` Ingo Molnar
2007-03-06 16:42                   ` Jeremy Fitzhardinge
2007-03-06 17:18                     ` Ingo Molnar
2007-03-06 18:04                       ` Jeremy Fitzhardinge
2007-03-06  7:35         ` [patch] paravirt: VDSO page is essential Ingo Molnar
2007-03-06  7:42           ` Zachary Amsden
2007-03-06  7:50             ` Ingo Molnar
2007-03-06 18:48             ` Jeremy Fitzhardinge
2007-03-05 14:27 ` Andi Kleen
2007-03-05 21:58   ` Roland McGrath
2007-03-05 22:01     ` Jeremy Fitzhardinge
2007-03-05 22:58       ` Roland McGrath
2007-03-05 23:03         ` Jeremy Fitzhardinge
2007-03-06  8:34         ` Ingo Molnar
2007-03-06  9:13           ` Roland McGrath
2007-03-06  9:14             ` Jeremy Fitzhardinge
