LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/3] Kvm clocksource, new spin
@ 2007-11-08 22:39 Glauber de Oliveira Costa
  2007-11-08 22:39 ` [PATCH 1/3] include files for kvmclock Glauber de Oliveira Costa
  0 siblings, 1 reply; 16+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-08 22:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: jeremy, avi, aliguori, kvm-devel, hollisb

Hi folks,

Here's a new spin of the clocksource implementation.
In this new version:
* followed avi's suggestion of:
  - letting the cpu itself register its memory area.
  - using a gfn instead of a phys addr as a parameter, to be sure we can 
    cover the whole memory area
  - write guest time at exits.

Also, I 'm not using an anonymous struct in the kvm_hv_clock union, so the
vcpu struct can grab just what it needs, and not the whole padding the guest needs

This is it.

Have fun



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] include files for kvmclock
  2007-11-08 22:39 [PATCH 0/3] Kvm clocksource, new spin Glauber de Oliveira Costa
@ 2007-11-08 22:39 ` Glauber de Oliveira Costa
  2007-11-08 22:39   ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
  2007-11-09  8:37   ` [PATCH 1/3] include files for kvmclock Gerd Hoffmann
  0 siblings, 2 replies; 16+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-08 22:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: jeremy, avi, aliguori, kvm-devel, hollisb, Glauber de Oliveira Costa

This patch introduces the include files for kvm clock.
They'll be needed for both guest and host part.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 include/asm-x86/kvm_para.h |   25 +++++++++++++++++++++++++
 include/linux/kvm.h        |    1 +
 include/linux/kvm_para.h   |    2 ++
 3 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..0f6b813 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -10,15 +10,40 @@
  * paravirtualization, the appropriate feature bit should be checked.
  */
 #define KVM_CPUID_FEATURES	0x40000001
+#define KVM_FEATURE_CLOCKSOURCE 0
 
 #ifdef __KERNEL__
 #include <asm/processor.h>
+extern void kvmclock_init(void);
+
+/*
+ * Guest has page alignment and padding requirements. At the host, it will
+ * only lead to wasted space at the vcpu struct. For this reason, the struct
+ * is not anonymous
+ */
+union kvm_hv_clock {
+	struct kvm_hv_clock_s {
+		u64 tsc_mult;
+		u64 now_ns;
+		/* That's the wall clock, not the water closet */
+		u64 wc_sec;
+		u64 last_tsc;
+		/* At first, we could use the tsc value as a marker, but Jeremy
+		 * well noted that it will cause us locking problems in 32-bit
+		 * sys, so we have a special version field */
+		u32 version;
+	} fields;
+	char page_align[PAGE_SIZE];
+};
+
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
  */
 #define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
 
+#define KVM_HCALL_REGISTER_CLOCK	1
+
 /* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
  * instruction.  The hypervisor may replace it with something else but only the
  * instructions are guaranteed to be supported.
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..9862241 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -359,6 +359,7 @@ struct kvm_signal_mask {
 #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
 #define KVM_CAP_USER_MEMORY 3
 #define KVM_CAP_SET_TSS_ADDR 4
+#define KVM_CAP_CLOCKSOURCE	  5
 
 /*
  * ioctls for VM fds
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index e4db25f..094efc7 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -11,6 +11,8 @@
 
 /* Return values for hypercalls */
 #define KVM_ENOSYS		1000
+#define KVM_EINVAL		1019
+#define KVM_ENODEV		1022
 
 #ifdef __KERNEL__
 /*
-- 
1.5.0.6


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/3] kvmclock - the host part.
  2007-11-08 22:39 ` [PATCH 1/3] include files for kvmclock Glauber de Oliveira Costa
@ 2007-11-08 22:39   ` Glauber de Oliveira Costa
  2007-11-08 22:39     ` [PATCH 3/3] kvmclock implementation, the guest part Glauber de Oliveira Costa
                       ` (2 more replies)
  2007-11-09  8:37   ` [PATCH 1/3] include files for kvmclock Gerd Hoffmann
  1 sibling, 3 replies; 16+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-08 22:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: jeremy, avi, aliguori, kvm-devel, hollisb, Glauber de Oliveira Costa

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 drivers/kvm/kvm_main.c |    1 +
 drivers/kvm/x86.c      |   32 ++++++++++++++++++++++++++++++++
 drivers/kvm/x86.h      |    4 ++++
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index d095002..c2c79b8 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1243,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
 		case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
 		case KVM_CAP_USER_MEMORY:
 		case KVM_CAP_SET_TSS_ADDR:
+		case KVM_CAP_CLOCKSOURCE:
 			r = 1;
 			break;
 		default:
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index e905d46..ef31fed 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -19,6 +19,7 @@
 #include "segment_descriptor.h"
 #include "irq.h"
 
+#include <linux/clocksource.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
 #include <linux/vmalloc.h>
@@ -1628,6 +1629,28 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
+static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
+{
+	struct timespec ts;
+	int r;
+
+	if (!vcpu->clock_gpa)
+		return;
+
+	/* Updates version to the next odd number, indicating we're writing */
+	vcpu->hv_clock.version++;
+	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE);
+
+	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.last_tsc);
+
+	ktime_get_ts(&ts);
+	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
+	vcpu->hv_clock.wc_sec = get_seconds();
+	vcpu->hv_clock.version++;
+	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE);
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -1648,7 +1671,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}
 
+	ret = 0;
 	switch (nr) {
+	case  KVM_HCALL_REGISTER_CLOCK:
+
+		vcpu->clock_gpa = a0 << PAGE_SHIFT;
+		vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22);
+
+		break;
+
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -1924,6 +1955,7 @@ out:
 		goto preempted;
 	}
 
+	kvm_write_guest_time(vcpu);
 	post_kvm_run_save(vcpu, kvm_run);
 
 	return r;
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 663b822..fd77b66 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -83,6 +83,10 @@ struct kvm_vcpu {
 	/* emulate context */
 
 	struct x86_emulate_ctxt emulate_ctxt;
+
+	struct kvm_hv_clock_s hv_clock;
+	gpa_t clock_gpa; /* guest frame number, physical addr */
+
 };
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code);
-- 
1.5.0.6


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/3] kvmclock implementation, the guest part.
  2007-11-08 22:39   ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
@ 2007-11-08 22:39     ` Glauber de Oliveira Costa
  2007-11-11 10:17     ` [PATCH 2/3] kvmclock - the host part Avi Kivity
  2007-11-13  5:00     ` [kvm-devel] " Dong, Eddie
  2 siblings, 0 replies; 16+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-08 22:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: jeremy, avi, aliguori, kvm-devel, hollisb, Glauber de Oliveira Costa

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 arch/x86/Kconfig.i386       |   10 +++
 arch/x86/kernel/Makefile_32 |    1 +
 arch/x86/kernel/kvmclock.c  |  171 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup_32.c  |    5 +
 4 files changed, 187 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/kvmclock.c

diff --git a/arch/x86/Kconfig.i386 b/arch/x86/Kconfig.i386
index 7331efe..5fe4025 100644
--- a/arch/x86/Kconfig.i386
+++ b/arch/x86/Kconfig.i386
@@ -257,6 +257,16 @@ config VMI
 	  at the moment), by linking the kernel to a GPL-ed ROM module
 	  provided by the hypervisor.
 
+config KVM_CLOCK
+	bool "KVM paravirtualized clock"
+	select PARAVIRT
+	help
+	  Turning on this option will allow you to run a paravirtualized clock
+	  when running over the KVM hypervisor. Instead of relying on a PIT
+	  (or probably other) emulation by the underlying device model, the host
+	  provides the guest with timing infrastructure, as time of day, and
+	  timer expiration.
+
 source "arch/x86/lguest/Kconfig"
 
 endif
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index b9d6798..df76d8c 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -43,6 +43,7 @@ obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
 
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_CLOCK)		+= kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o
 obj-y				+= pcspeaker.o
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
new file mode 100644
index 0000000..df14613
--- /dev/null
+++ b/arch/x86/kernel/kvmclock.c
@@ -0,0 +1,171 @@
+/*  KVM paravirtual clock driver. A clocksource implementation
+    Copyright (C) 2007 Glauber de Oliveira Costa, Red Hat Inc.
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+*/
+
+#include <linux/clocksource.h>
+#include <linux/clockchips.h>
+#include <linux/interrupt.h>
+#include <linux/kvm_para.h>
+#include <linux/ktime.h>
+#include <asm/arch_hooks.h>
+#include <asm/i8253.h>
+
+#include <mach_ipi.h>
+#include <irq_vectors.h>
+
+#define KVM_SCALE 22
+
+#define get_clock(cpu, field) hv_clock[cpu].fields.field
+
+static int kvmclock = 1;
+
+static int parse_no_kvmclock(char *arg)
+{
+	kvmclock = 0;
+	return 0;
+}
+early_param("no-kvmclock", parse_no_kvmclock);
+
+/* The hypervisor will put information about time periodically here */
+union kvm_hv_clock hv_clock[NR_CPUS] __attribute__((__aligned__(PAGE_SIZE)));
+
+static inline u64 kvm_get_delta(u64 last_tsc)
+{
+	int cpu = smp_processor_id();
+	u64 delta = native_read_tsc() - last_tsc;
+	return (delta * get_clock(cpu, tsc_mult)) >> KVM_SCALE;
+}
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that. Even if the tsc is not accurate, it gives us a more accurate timing
+ * than not adjusting at all
+ */
+unsigned long kvm_get_wallclock(void)
+{
+	u64 wc_sec, delta, last_tsc;
+	struct timespec ts;
+	int version, nsec, cpu = smp_processor_id();
+
+	do {
+		version = get_clock(cpu, version);
+		rmb();
+		last_tsc = get_clock(cpu, last_tsc);
+		rmb();
+		wc_sec = get_clock(cpu, wc_sec);
+		rmb();
+	} while ((get_clock(cpu, version) != version) && !(version & 1));
+
+	delta = kvm_get_delta(last_tsc);
+	nsec = do_div(delta, NSEC_PER_SEC);
+	set_normalized_timespec(&ts, wc_sec + delta, nsec);
+
+	/*
+	 * Of all mechanisms of time adjustment I've tested, this one
+	 * was the champion!
+	 */
+	return ts.tv_sec + 1;
+}
+
+int kvm_set_wallclock(unsigned long now)
+{
+	return 0;
+}
+
+/*
+ * This is our read_clock function. The host puts an tsc timestamp each time
+ * it updates a new time, and then we can use it to derive a slightly more
+ * precise notion of elapsed time, converted to nanoseconds.
+ */
+static cycle_t kvm_clock_read(void)
+{
+
+	u64 last_tsc, now;
+	u32 version;
+	int cpu = smp_processor_id();
+
+	do {
+		version = get_clock(cpu, version);
+		rmb();
+		last_tsc = get_clock(cpu, last_tsc);
+		rmb();
+		now = get_clock(cpu, now_ns);
+		rmb();
+	} while ((get_clock(cpu, version) != version) && !(version & 1));
+
+	return now + kvm_get_delta(last_tsc);
+}
+
+static struct clocksource kvm_clock = {
+	.name = "kvm-clock",
+	.read = kvm_clock_read,
+	.rating = 400,
+	.mask = CLOCKSOURCE_MASK(64),
+	.mult = 1 << KVM_SCALE,
+	.shift = KVM_SCALE,
+	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+unsigned long long kvm_sched_clock(void)
+{
+	return kvm_clock_read();
+}
+
+static int kvm_register_clock(unsigned int cpu)
+{
+	unsigned long kvm_clock_info = __pa((unsigned long)&hv_clock[cpu]);
+	kvm_clock_info >>= PAGE_SHIFT; /* page frame number */
+	return kvm_hypercall1(KVM_HCALL_REGISTER_CLOCK, kvm_clock_info);
+}
+
+void kvm_setup_secondary_clock(void)
+{
+	/*
+	 * Now that the first cpu already had this clocksource initialized,
+	 * we shouldn't fail.
+	 */
+	int cpu = smp_processor_id();
+	WARN_ON(kvm_register_clock(cpu));
+	/* ok, done with our trickery, call native */
+	setup_secondary_APIC_clock();
+}
+
+void __init kvmclock_init(void)
+{
+	int cpu = smp_processor_id();
+	int r;
+
+	/*
+	 * If we can't use the paravirt clock, just go with
+	 * the usual timekeeping
+	 */
+	if (!kvm_para_available())
+		return;
+
+	r = kvm_register_clock(cpu);
+	if (r)
+		return;
+
+	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
+		pv_time_ops.get_wallclock = kvm_get_wallclock;
+		pv_time_ops.set_wallclock = kvm_set_wallclock;
+		pv_time_ops.sched_clock = kvm_sched_clock;
+		pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
+		clocksource_register(&kvm_clock);
+	}
+}
diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index e1e18c3..9f13ff6 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -44,6 +44,7 @@
 #include <linux/crash_dump.h>
 #include <linux/dmi.h>
 #include <linux/pfn.h>
+#include <linux/kvm_para.h>
 
 #include <video/edid.h>
 
@@ -614,6 +615,10 @@ void __init setup_arch(char **cmdline_p)
 
 	max_low_pfn = setup_memory();
 
+#ifdef CONFIG_KVM_CLOCK
+	kvmclock_init();
+#endif
+
 #ifdef CONFIG_VMI
 	/*
 	 * Must be after max_low_pfn is determined, and before kernel
-- 
1.5.0.6


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] include files for kvmclock
  2007-11-08 22:39 ` [PATCH 1/3] include files for kvmclock Glauber de Oliveira Costa
  2007-11-08 22:39   ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
@ 2007-11-09  8:37   ` Gerd Hoffmann
  2007-11-11  9:15     ` Avi Kivity
  1 sibling, 1 reply; 16+ messages in thread
From: Gerd Hoffmann @ 2007-11-09  8:37 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: linux-kernel, jeremy, avi, aliguori, kvm-devel, hollisb

> +/*
> + * Guest has page alignment and padding requirements. At the host, it will
> + * only lead to wasted space at the vcpu struct. For this reason, the struct
> + * is not anonymous
> + */
> +union kvm_hv_clock {
> +	struct kvm_hv_clock_s {
> +		u64 tsc_mult;
> +		u64 now_ns;
> +		/* That's the wall clock, not the water closet */
> +		u64 wc_sec;
> +		u64 last_tsc;
> +		/* At first, we could use the tsc value as a marker, but Jeremy
> +		 * well noted that it will cause us locking problems in 32-bit
> +		 * sys, so we have a special version field */
> +		u32 version;
> +	} fields;
> +	char page_align[PAGE_SIZE];
> +};

What is the point in using a whole page per vcpu?  You probably don't
want struct kvm_hv_clock_s cross a page border.  Is that the only reason
or are there other constrains too?

As the kvm clock looks quite simliar to what xen does, how about making
the structs binary-compatible or simply reuse the xen version (struct
vcpu_time_info in xen/interface/xen.h)?

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] include files for kvmclock
  2007-11-09  8:37   ` [PATCH 1/3] include files for kvmclock Gerd Hoffmann
@ 2007-11-11  9:15     ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2007-11-11  9:15 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Glauber de Oliveira Costa, linux-kernel, jeremy, aliguori,
	kvm-devel, hollisb

Gerd Hoffmann wrote:
>> +/*
>> + * Guest has page alignment and padding requirements. At the host, it will
>> + * only lead to wasted space at the vcpu struct. For this reason, the struct
>> + * is not anonymous
>> + */
>> +union kvm_hv_clock {
>> +	struct kvm_hv_clock_s {
>> +		u64 tsc_mult;
>> +		u64 now_ns;
>> +		/* That's the wall clock, not the water closet */
>> +		u64 wc_sec;
>> +		u64 last_tsc;
>> +		/* At first, we could use the tsc value as a marker, but Jeremy
>> +		 * well noted that it will cause us locking problems in 32-bit
>> +		 * sys, so we have a special version field */
>> +		u32 version;
>> +	} fields;
>> +	char page_align[PAGE_SIZE];
>> +};
>>     
>
> What is the point in using a whole page per vcpu?  You probably don't
> want struct kvm_hv_clock_s cross a page border.  Is that the only reason
> or are there other constrains too?
>   

We don't even have the cross-page-boundary constraint.

The real issue is that on i386 physical addresses are 64-bit while 
hypercall arguments are 32-bit.  A page frame number is 32-bit and so 
poses no issues.

I see two workarounds for this:
- make the first hypercall argument a u64 rather than a long (using two 
registers on i386)
- use an msr for registering the clock address

The first is more general, while the second has the nice property of 
taking care of live migration automatically.

> As the kvm clock looks quite simliar to what xen does, how about making
> the structs binary-compatible or simply reuse the xen version (struct
> vcpu_time_info in xen/interface/xen.h)?
>   

If there are no technical issues, I have no objections.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
  2007-11-08 22:39   ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
  2007-11-08 22:39     ` [PATCH 3/3] kvmclock implementation, the guest part Glauber de Oliveira Costa
@ 2007-11-11 10:17     ` Avi Kivity
  2007-11-13 11:28       ` Glauber de Oliveira Costa
  2007-11-13  5:00     ` [kvm-devel] " Dong, Eddie
  2 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2007-11-11 10:17 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: linux-kernel, jeremy, aliguori, kvm-devel, hollisb

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writting to it periodically.
>
>   

Missing live migration support  (a way for userspace to read and write 
the guest clock address).  Should probably be in a separate patch.

> @@ -1924,6 +1955,7 @@ out:
>  		goto preempted;
>  	}
>  
> +	kvm_write_guest_time(vcpu);
>  	post_kvm_run_save(vcpu, kvm_run);
>   

Why here?  Seems like we're leaving the guest for a while at this place.

Suggest putting it on top of __vcpu_run(), guarded by a flag, and 
setting the flag every time we put the vcpu.

 



-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [kvm-devel] [PATCH 2/3] kvmclock - the host part.
  2007-11-08 22:39   ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
  2007-11-08 22:39     ` [PATCH 3/3] kvmclock implementation, the guest part Glauber de Oliveira Costa
  2007-11-11 10:17     ` [PATCH 2/3] kvmclock - the host part Avi Kivity
@ 2007-11-13  5:00     ` Dong, Eddie
  2007-11-13 11:54       ` Glauber de Oliveira Costa
  2 siblings, 1 reply; 16+ messages in thread
From: Dong, Eddie @ 2007-11-13  5:00 UTC (permalink / raw)
  To: Glauber de Oliveira Costa, linux-kernel; +Cc: jeremy, hollisb, kvm-devel, avi


> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
> +	struct timespec ts;
> +	int r;
> +
> +	if (!vcpu->clock_gpa)
> +		return;
> +
> +	/* Updates version to the next odd number, indicating
> we're writing */
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
> &vcpu->hv_clock, PAGE_SIZE);
> +
> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.last_tsc);
> +
> +	ktime_get_ts(&ts);

Do we need to disable preemption here?

> +	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
> (u64)ts.tv_sec); +	vcpu->hv_clock.wc_sec = get_seconds();
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
> &vcpu->hv_clock, PAGE_SIZE);
> +}
> +
thx,eddie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
  2007-11-11 10:17     ` [PATCH 2/3] kvmclock - the host part Avi Kivity
@ 2007-11-13 11:28       ` Glauber de Oliveira Costa
  2007-11-13 14:44         ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-13 11:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel, jeremy, aliguori, kvm-devel, hollisb

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Avi Kivity escreveu:
> Glauber de Oliveira Costa wrote:
>> This is the host part of kvm clocksource implementation. As it does
>> not include clockevents, it is a fairly simple implementation. We
>> only have to register a per-vcpu area, and start writting to it
>> periodically.
>>
>>   
> 
> Missing live migration support  (a way for userspace to read and write
> the guest clock address).  Should probably be in a separate patch.

I think it's a matter of issuing a hypercall for reading the clock
address. It's fair simple, and can be done in a later version of this patch.
As for writting, the register hypercall itself can be used. It has no
special side-effects we should care about.

>> @@ -1924,6 +1955,7 @@ out:
>>          goto preempted;
>>      }
>>  
>> +    kvm_write_guest_time(vcpu);
>>      post_kvm_run_save(vcpu, kvm_run);
>>   
> 
> Why here?  Seems like we're leaving the guest for a while at this place.
> 
> Suggest putting it on top of __vcpu_run(), guarded by a flag, and
> setting the flag every time we put the vcpu.

No special preference. It just sounded exity enough to me. I can move to
where you suggest.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHOYpojYI8LaFUWXMRApf8AJ4jQ/ZTBlub1IwFkJrYZyart+f7bwCfT+9m
l1Rblsmw96ZatCf60g2dNYY=
=DBpn
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.
  2007-11-13  5:00     ` [kvm-devel] " Dong, Eddie
@ 2007-11-13 11:54       ` Glauber de Oliveira Costa
  2007-11-13 12:08         ` Izik Eidus
  2007-11-13 14:47         ` Avi Kivity
  0 siblings, 2 replies; 16+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-13 11:54 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: linux-kernel, jeremy, hollisb, kvm-devel, avi

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dong, Eddie escreveu:
>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>> +	struct timespec ts;
>> +	int r;
>> +
>> +	if (!vcpu->clock_gpa)
>> +		return;
>> +
>> +	/* Updates version to the next odd number, indicating
>> we're writing */
>> +	vcpu->hv_clock.version++;
>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>> &vcpu->hv_clock, PAGE_SIZE);
>> +
>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>> +			  &vcpu->hv_clock.last_tsc);
>> +
>> +	ktime_get_ts(&ts);
> 
> Do we need to disable preemption here?
After thinking for a little while, you are theoretically right.
In the current state, we could even be preempted between all operations
;-) Maybe after avi's suggestion of moving the call to it it will end up
in a preempt safe region, but anyway, it's safer to add the preempt
markers here.
I'll put it in next version, thanks

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHOZBrjYI8LaFUWXMRAo81AKCfbkzhLl7F6BUjzUHVyErCFeHxFACg1teB
eqsOnJiAqB3JiYf+2YdMZ4o=
=ENKU
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.
  2007-11-13 11:54       ` Glauber de Oliveira Costa
@ 2007-11-13 12:08         ` Izik Eidus
  2007-11-13 14:47         ` Avi Kivity
  1 sibling, 0 replies; 16+ messages in thread
From: Izik Eidus @ 2007-11-13 12:08 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: Dong, Eddie, kvm-devel, jeremy, linux-kernel, hollisb, avi

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dong, Eddie escreveu:
>   
>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>> +	struct timespec ts;
>>> +	int r;
>>> +
>>> +	if (!vcpu->clock_gpa)
>>> +		return;
>>> +
>>> +	/* Updates version to the next odd number, indicating
>>> we're writing */
>>> +	vcpu->hv_clock.version++;
>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>> &vcpu->hv_clock, PAGE_SIZE);
>>> +
>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>> +			  &vcpu->hv_clock.last_tsc);
>>> +
>>> +	ktime_get_ts(&ts);
>>>       
>> Do we need to disable preemption here?
>>     
> After thinking for a little while, you are theoretically right.
> In the current state, we could even be preempted between all operations
> ;-) Maybe after avi's suggestion of moving the call to it it will end up
> in a preempt safe region, but anyway, it's safer to add the preempt
> markers here.
> I'll put it in next version, thanks
>   
note that you cant call to kvm_write_guest with preemption disabled 
(witch you do few lines below)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
  2007-11-13 11:28       ` Glauber de Oliveira Costa
@ 2007-11-13 14:44         ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2007-11-13 14:44 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: linux-kernel, jeremy, aliguori, kvm-devel, hollisb

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Avi Kivity escreveu:
>   
>> Glauber de Oliveira Costa wrote:
>>     
>>> This is the host part of kvm clocksource implementation. As it does
>>> not include clockevents, it is a fairly simple implementation. We
>>> only have to register a per-vcpu area, and start writting to it
>>> periodically.
>>>
>>>   
>>>       
>> Missing live migration support  (a way for userspace to read and write
>> the guest clock address).  Should probably be in a separate patch.
>>     
>
> I think it's a matter of issuing a hypercall for reading the clock
> address. It's fair simple, and can be done in a later version of this patch.
> As for writting, the register hypercall itself can be used. It has no
> special side-effects we should care about.
>   

kvm live migration is done (at least thus far) without guest 
involvement.  So you need the host to be able to transfer this state.


>   
>>> @@ -1924,6 +1955,7 @@ out:
>>>          goto preempted;
>>>      }
>>>  
>>> +    kvm_write_guest_time(vcpu);
>>>      post_kvm_run_save(vcpu, kvm_run);
>>>   
>>>       
>> Why here?  Seems like we're leaving the guest for a while at this place.
>>
>> Suggest putting it on top of __vcpu_run(), guarded by a flag, and
>> setting the flag every time we put the vcpu.
>>     
>
> No special preference. It just sounded exity enough to me. I can move to
> where you suggest.
>
>   

It's more than a place, it's a set of rules:

- if the vcpu is migrated, we need a new timebase
- ditto if we're descheduled (well that's the same thing)
- if "some time passes"


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.
  2007-11-13 11:54       ` Glauber de Oliveira Costa
  2007-11-13 12:08         ` Izik Eidus
@ 2007-11-13 14:47         ` Avi Kivity
  2007-11-13 15:23           ` Dong, Eddie
  1 sibling, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2007-11-13 14:47 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: Dong, Eddie, linux-kernel, jeremy, hollisb, kvm-devel

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dong, Eddie escreveu:
>   
>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>> +	struct timespec ts;
>>> +	int r;
>>> +
>>> +	if (!vcpu->clock_gpa)
>>> +		return;
>>> +
>>> +	/* Updates version to the next odd number, indicating
>>> we're writing */
>>> +	vcpu->hv_clock.version++;
>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>> &vcpu->hv_clock, PAGE_SIZE);
>>> +
>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>> +			  &vcpu->hv_clock.last_tsc);
>>> +
>>> +	ktime_get_ts(&ts);
>>>       
>> Do we need to disable preemption here?
>>     
> After thinking for a little while, you are theoretically right.
> In the current state, we could even be preempted between all operations
> ;-) Maybe after avi's suggestion of moving the call to it it will end up
> in a preempt safe region, but anyway, it's safer to add the preempt
> markers here.
> I'll put it in next version, thanks
>
>   

Well, you can't kvm_write_guest() with preemption enabled.

preempt notifiers to the rescue!  We have a callout during preemption, 
so you can just zero out a flag there, and when we're scheduled again 
retry the whole thing.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [kvm-devel] [PATCH 2/3] kvmclock - the host part.
  2007-11-13 14:47         ` Avi Kivity
@ 2007-11-13 15:23           ` Dong, Eddie
  2007-11-13 16:12             ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Dong, Eddie @ 2007-11-13 15:23 UTC (permalink / raw)
  To: Avi Kivity, Glauber de Oliveira Costa
  Cc: linux-kernel, jeremy, hollisb, kvm-devel

Avi Kivity wrote:
> Glauber de Oliveira Costa wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> Dong, Eddie escreveu:
>> 
>>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>>> +	struct timespec ts; +	int r;
>>>> +
>>>> +	if (!vcpu->clock_gpa)
>>>> +		return;
>>>> +
>>>> +	/* Updates version to the next odd number, indicating we're
>>>> writing */ +	vcpu->hv_clock.version++;
>>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>>> &vcpu->hv_clock, PAGE_SIZE);
>>>> +
>>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>>> +			  &vcpu->hv_clock.last_tsc);
>>>> +
>>>> +	ktime_get_ts(&ts);
>>>> 
>>> Do we need to disable preemption here?
>>> 
>> After thinking for a little while, you are theoretically right.
>> In the current state, we could even be preempted between all
>> operations ;-) Maybe after avi's suggestion of moving the call to it
>> it will end up in a preempt safe region, but anyway, it's safer to
>> add the preempt markers here. I'll put it in next version, thanks
>> 
>> 
> 
> Well, you can't kvm_write_guest() with preemption enabled.
> 
> preempt notifiers to the rescue!  We have a callout during preemption,
> so you can just zero out a flag there, and when we're scheduled again
> retry the whole thing. 
> 

The preemption issue is within following code which need to be done in a
short enough period.

+	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.last_tsc);
+
+	ktime_get_ts(&ts);
+	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
(u64)ts.tv_sec);
+	vcpu->hv_clock.wc_sec = get_seconds();

I am even thinking we have to disable interrupt between these lines,
otherwise
guest wall clock may see backward time source when calculating the
delta TSC since last vcpu->hv_clock.now_ns update.

Also why we use both ktime_get_ts(&ts) & get_seconds() together? they
are not atomic
and may cause issues?

thx,eddie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.
  2007-11-13 15:23           ` Dong, Eddie
@ 2007-11-13 16:12             ` Avi Kivity
  2007-11-14  0:41               ` Dong, Eddie
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2007-11-13 16:12 UTC (permalink / raw)
  To: Dong, Eddie
  Cc: Glauber de Oliveira Costa, linux-kernel, jeremy, hollisb, kvm-devel

Dong, Eddie wrote:
>>>
>>> After thinking for a little while, you are theoretically right.
>>> In the current state, we could even be preempted between all
>>> operations ;-) Maybe after avi's suggestion of moving the call to it
>>> it will end up in a preempt safe region, but anyway, it's safer to
>>> add the preempt markers here. I'll put it in next version, thanks
>>>
>>>
>>>       
>> Well, you can't kvm_write_guest() with preemption enabled.
>>
>> preempt notifiers to the rescue!  We have a callout during preemption,
>> so you can just zero out a flag there, and when we're scheduled again
>> retry the whole thing. 
>>
>>     
>
> The preemption issue is within following code which need to be done in a
> short enough period.
>
> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.last_tsc);
> +
> +	ktime_get_ts(&ts);
> +	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
> (u64)ts.tv_sec);
> +	vcpu->hv_clock.wc_sec = get_seconds();
>
> I am even thinking we have to disable interrupt between these lines,
> otherwise
> guest wall clock may see backward time source when calculating the
> delta TSC since last vcpu->hv_clock.now_ns update.
>   

That's true.  While we do need to handle vcpu migration and 
descheduling, the code sequence you note needs to be as atomic as possible.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [kvm-devel] [PATCH 2/3] kvmclock - the host part.
  2007-11-13 16:12             ` Avi Kivity
@ 2007-11-14  0:41               ` Dong, Eddie
  0 siblings, 0 replies; 16+ messages in thread
From: Dong, Eddie @ 2007-11-14  0:41 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Glauber de Oliveira Costa, linux-kernel, jeremy, hollisb, kvm-devel

>> 
>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>> +			  &vcpu->hv_clock.last_tsc);
>> +
>> +	ktime_get_ts(&ts);
>> +	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
>> (u64)ts.tv_sec); +	vcpu->hv_clock.wc_sec = get_seconds();
>> 
>> I am even thinking we have to disable interrupt between these lines,
>> otherwise guest wall clock may see backward time source when
>> calculating the delta TSC since last vcpu->hv_clock.now_ns update.
>> 
> 
> That's true.  While we do need to handle vcpu migration and
> descheduling, the code sequence you note needs to be as atomic as
> possible. 
> 

Yes VCPU migration is another issue. Since the physical platform TSC is 
totally different across migration, so we can't expose host TSC to
guest.
For KVM, it should be simple to just use guest_TSC (=HOST_TSC + offset).

thanks, eddie

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-11-14  0:42 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-08 22:39 [PATCH 0/3] Kvm clocksource, new spin Glauber de Oliveira Costa
2007-11-08 22:39 ` [PATCH 1/3] include files for kvmclock Glauber de Oliveira Costa
2007-11-08 22:39   ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
2007-11-08 22:39     ` [PATCH 3/3] kvmclock implementation, the guest part Glauber de Oliveira Costa
2007-11-11 10:17     ` [PATCH 2/3] kvmclock - the host part Avi Kivity
2007-11-13 11:28       ` Glauber de Oliveira Costa
2007-11-13 14:44         ` Avi Kivity
2007-11-13  5:00     ` [kvm-devel] " Dong, Eddie
2007-11-13 11:54       ` Glauber de Oliveira Costa
2007-11-13 12:08         ` Izik Eidus
2007-11-13 14:47         ` Avi Kivity
2007-11-13 15:23           ` Dong, Eddie
2007-11-13 16:12             ` Avi Kivity
2007-11-14  0:41               ` Dong, Eddie
2007-11-09  8:37   ` [PATCH 1/3] include files for kvmclock Gerd Hoffmann
2007-11-11  9:15     ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).