* [RFC] Stable kvm userspace interface
@ 2007-01-09 13:37 Avi Kivity
  2007-01-09 13:47 ` Jeff Garzik
  2007-01-11  7:26 ` Arnd Bergmann
  0 siblings, 2 replies; 12+ messages in thread
From: Avi Kivity @ 2007-01-09 13:37 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, Andrew Morton

I had originally hoped to get this in for 2.6.20.  It now looks like .20 
will have a shorter cycle than usual, and the mmu took a bit longer than 
expected, so it's more realistic to aim for 2.6.21.

The current kvm userspace interface has several deficiencies:

- open("/dev/kvm") returns a different object (a new vm) per invocation; 
this is "unusual" by Linux standards
- all vcpus share the same inode and struct file, which can cause 
scalability problems on very large smps.  This isn't a problem for 
current hardware, which has moderate core counts and huge vmexit 
latencies, not to mention a limit of one vcpu per vm, but I'd like to 
future-proof the interface.
- the KVM_VCPU_RUN ioctl() copies a needless chunk of data back and forth
- the PIO handlers communicate by means of registers (for single I/O) or 
virtual addresses (for string I/O).  Instead the values should be 
explicit fields in some structure, and physical addresses should be used 
to remove the need to translate addresses in userspace.
- the interrupt code still needs work to properly support the local apic 
with Windows guests.
- userspace must rely on delivered signals, which are slow, and cannot 
use queued signals (a la pselect()/ppoll()).

I propose the following as the new, stable, kvm api:

// open a handle to the kvm interface.  does not create a vm.
int kvm_fd = open("/dev/kvm", O_RDWR);

// the kvm interface supports just three ioctls:
ioctl(kvm_fd, KVM_GET_API_VERSION, 0);
ioctl(kvm_fd, KVM_GET_MSR_LIST, &msr_list);
int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

// vm ioctls:
ioctl(vm_fd, KVM_VM_CREATE_MEMORY_REGION, &slot);
ioctl(vm_fd, KVM_VM_GET_DIRTY_LOG, &dirty_log);
int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, vcpu_slot_number);

// each vcpu is a separate fd/inode.  this ensures no cacheline bouncing
// when the kernel refcounts the inodes on syscalls.

// kvm_vcpu_area contains the exit reasons and associated data, and
// results returned by userspace to resolve the exit reasons.
struct kvm_vcpu_area *vcpu_area = mmap(NULL, PAGE_SIZE, ..., vcpu_fd, 0);

struct kvm_vcpu_area {
    u32 vcpu_area_size;
    u32 exit_reason;

    sigset_t sigmask;  // for use during vcpu execution

    union {
	struct kvm_pio pio;
	struct kvm_mmio mmio;
	struct kvm_cpuid cpuid;
	// etc.
	char padding[...];
    };

    struct kvm_irq irq; // acks from vm; injection from userspace
};


// vcpu ioctls

ioctl(vcpu_fd, KVM_VCPU_RUN, 0); // all comms through mmap()ed  vcpu_area
ioctl(vcpu_fd, KVM_VCPU_GET_REGS, &regs);
ioctl(vcpu_fd, KVM_VCPU_SET_REGS, &regs);
ioctl(vcpu_fd, KVM_VCPU_GET_SREGS, &sregs);
ioctl(vcpu_fd, KVM_VCPU_SET_SREGS, &sregs);
ioctl(vcpu_fd, KVM_VCPU_GET_MSRS, &msrs);
ioctl(vcpu_fd, KVM_VCPU_SET_MSRS, &msrs);
ioctl(vcpu_fd, KVM_VCPU_DEBUG_GUEST, &debug);


/* for KVM_VM_CREATE_MEMORY_REGION */
struct kvm_memory_region {
	__u32 slot;
	__u32 flags;
	__u64 guest_phys_addr;
	__u64 memory_size; /* bytes */
};

/* for kvm_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES  1UL


#define KVM_EXIT_TYPE_FAIL_ENTRY 1
#define KVM_EXIT_TYPE_VM_EXIT    2

enum kvm_exit_reason {
	KVM_EXIT_UNKNOWN          = 0,
	KVM_EXIT_EXCEPTION        = 1,
	KVM_EXIT_IO               = 2,
	KVM_EXIT_CPUID            = 3,
	KVM_EXIT_DEBUG            = 4,
	KVM_EXIT_HLT              = 5,
	KVM_EXIT_MMIO             = 6,
	KVM_EXIT_IRQ_WINDOW_OPEN  = 7,
	KVM_EXIT_HYPERCALL        = 8,
};


/* for KVM_VCPU_GET_REGS and KVM_VCPU_SET_REGS */
struct kvm_regs {
        // note: no vcpu!

	/* out (KVM_VCPU_GET_REGS) / in (KVM_VCPU_SET_REGS) */
	__u64 rax, rbx, rcx, rdx;
	__u64 rsi, rdi, rsp, rbp;
	__u64 r8,  r9,  r10, r11;
	__u64 r12, r13, r14, r15;
	__u64 rip, rflags;
};

struct kvm_segment {
	__u64 base;
	__u32 limit;
	__u16 selector;
	__u8  type;
	__u8  present, dpl, db, s, l, g, avl;
	__u8  unusable;
	__u8  padding;
};

struct kvm_dtable {
	__u64 base;
	__u16 limit;
	__u16 padding[3];
};

/* for KVM_VCPU_GET_SREGS and KVM_VCPU_SET_SREGS */
struct kvm_sregs {
	/* out (KVM_VCPU_GET_SREGS) / in (KVM_VCPU_SET_SREGS) */
	struct kvm_segment cs, ds, es, fs, gs, ss;
	struct kvm_segment tr, ldt;
	struct kvm_dtable gdt, idt;
	__u64 cr0, cr2, cr3, cr4, cr8;
};

struct kvm_msr_entry {
	__u32 index;
	__u32 reserved;
	__u64 data;
};

/* for KVM_VCPU_GET_MSRS and KVM_VCPU_SET_MSRS */
struct kvm_msrs {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 padding;

	struct kvm_msr_entry entries[0];
};

/* for KVM_GET_MSR_INDEX_LIST */
struct kvm_msr_list {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 indices[0];
};

struct kvm_breakpoint {
	__u32 enabled;
	__u32 padding;
	__u64 address;
};

/* for KVM_VCPU_DEBUG_GUEST */
struct kvm_debug_guest {
	__u32 enabled;
	__u32 singlestep;
	struct kvm_breakpoint breakpoints[4];
};

/* for KVM_VM_GET_DIRTY_LOG */
struct kvm_dirty_log {
	__u32 slot;
	__u32 padding1;
	union {
		void __user *dirty_bitmap; /* one bit per page */
		__u64 padding2;
	};
};
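
As a rough illustration of the "one bit per page" convention for
dirty_bitmap, userspace could size and query the bitmap along these
lines.  This is only a sketch: the 4 KiB page size, the rounding to
64-bit words, the bit order within a byte, and the helper names
dirty_bitmap_bytes() and page_is_dirty() are all assumptions, not part
of the proposal.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB guest pages */

/*
 * Bytes needed for a bitmap with one bit per page of a memory slot,
 * rounded up to a whole number of 64-bit words (assumed rounding).
 */
static size_t dirty_bitmap_bytes(uint64_t memory_size)
{
	uint64_t pages = (memory_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	uint64_t words = (pages + 63) / 64;

	return (size_t)(words * 8);
}

/*
 * Nonzero if page number `page` is marked dirty.  Bit 0 of byte 0 is
 * taken to be page 0 (assumed layout).
 */
static int page_is_dirty(const uint8_t *bitmap, uint64_t page)
{
	return (bitmap[page / 8] >> (page % 8)) & 1;
}
```

With these assumptions a 128 MB slot has 32768 pages and needs a
4096-byte bitmap.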


Comments and questions are welcome.


Thanks to Arnd Bergmann for his contributions and advice on this issue.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] Stable kvm userspace interface
  2007-01-09 13:37 [RFC] Stable kvm userspace interface Avi Kivity
@ 2007-01-09 13:47 ` Jeff Garzik
  2007-01-09 14:02   ` [kvm-devel] " James Morris
                     ` (2 more replies)
  2007-01-11  7:26 ` Arnd Bergmann
  1 sibling, 3 replies; 12+ messages in thread
From: Jeff Garzik @ 2007-01-09 13:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, linux-kernel, Andrew Morton

Can we please avoid adding a ton of new ioctls?  ioctls inevitably 
require 64-bit compat code for certain architectures, whereas 
sysfs/procfs does not.

	Jeff





* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-09 13:47 ` Jeff Garzik
@ 2007-01-09 14:02   ` James Morris
  2007-01-09 14:11   ` Avi Kivity
  2007-01-11  7:34   ` [kvm-devel] " Arnd Bergmann
  2 siblings, 0 replies; 12+ messages in thread
From: James Morris @ 2007-01-09 14:02 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Avi Kivity, kvm-devel, Andrew Morton, linux-kernel

On Tue, 9 Jan 2007, Jeff Garzik wrote:

> Can we please avoid adding a ton of new ioctls?  ioctls inevitably 
> require 64-bit compat code for certain architectures, whereas 
> sysfs/procfs does not.

I guess ioctl is not as important now if the API is now always talking to 
one VM.


- James
-- 
James Morris
<jmorris@namei.org>


* Re: [RFC] Stable kvm userspace interface
  2007-01-09 13:47 ` Jeff Garzik
  2007-01-09 14:02   ` [kvm-devel] " James Morris
@ 2007-01-09 14:11   ` Avi Kivity
  2007-01-11  7:34   ` [kvm-devel] " Arnd Bergmann
  2 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2007-01-09 14:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: kvm-devel, linux-kernel, Andrew Morton

Jeff Garzik wrote:
> Can we please avoid adding a ton of new ioctls?  ioctls inevitably 
> require 64-bit compat code for certain architectures, whereas 
> sysfs/procfs does not.
>

I don't see how the procfs or sysfs models fit kvm.  wrt compat code, 
the current kvm abi (also ioctl based) is 32/64 bit safe without compat 
code, and I certainly don't intend to break it.


-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-09 13:37 [RFC] Stable kvm userspace interface Avi Kivity
  2007-01-09 13:47 ` Jeff Garzik
@ 2007-01-11  7:26 ` Arnd Bergmann
  2007-01-11  8:02   ` Avi Kivity
  1 sibling, 1 reply; 12+ messages in thread
From: Arnd Bergmann @ 2007-01-11  7:26 UTC (permalink / raw)
  To: kvm-devel; +Cc: Avi Kivity, Andrew Morton, linux-kernel, Jeff Garzik

On Tuesday 09 January 2007 14:37, Avi Kivity wrote:
> struct kvm_vcpu_area {
>     u32 vcpu_area_size;
>     u32 exit_reason;
> 
>     sigset_t sigmask;  // for use during vcpu execution

Since Jeff brought up the point of 32 bit compatibility:
When this structure is shared between 64 bit kernel and
32 bit user space, your sigmask should be a __u64 in order
to guarantee compatibility.
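
To illustrate the point: sizeof(sigset_t) depends on the C library and
the ABI, whereas a fixed-width field has the same size and offset for
32 bit and 64 bit callers.  A minimal sketch, using <stdint.h> types in
place of __u32/__u64 and a hypothetical, cut-down struct name
(vcpu_area_hdr) rather than the full kvm_vcpu_area:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header-only subset of the proposed kvm_vcpu_area. */
struct vcpu_area_hdr {
	uint32_t vcpu_area_size;
	uint32_t exit_reason;
	uint64_t sigmask;	/* fixed width instead of sigset_t */
};

/*
 * Two 32-bit fields precede sigmask, so it sits at offset 8 with no
 * implicit padding, whether the compiler aligns 64-bit integers to
 * 4 or to 8 bytes.
 */
_Static_assert(offsetof(struct vcpu_area_hdr, sigmask) == 8,
	       "sigmask must sit at offset 8");
_Static_assert(sizeof(struct vcpu_area_hdr) == 16,
	       "header must be 16 bytes");
```

The static assertions make the compiler reject any build where the
layout differs, which is the property a shared mmap()ed area needs.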

	Arnd <><


* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-09 13:47 ` Jeff Garzik
  2007-01-09 14:02   ` [kvm-devel] " James Morris
  2007-01-09 14:11   ` Avi Kivity
@ 2007-01-11  7:34   ` Arnd Bergmann
  2007-01-11  8:03     ` Avi Kivity
                       ` (2 more replies)
  2 siblings, 3 replies; 12+ messages in thread
From: Arnd Bergmann @ 2007-01-11  7:34 UTC (permalink / raw)
  To: kvm-devel; +Cc: Jeff Garzik, Avi Kivity, Andrew Morton, linux-kernel

On Tuesday 09 January 2007 14:47, Jeff Garzik wrote:
> Can we please avoid adding a ton of new ioctls?  ioctls inevitably 
> require 64-bit compat code for certain architectures, whereas 
> sysfs/procfs does not.

For performance reasons, an ascii string based interface is not
desirable here, some of these calls should be optimized to
the point of counting cycles.

Sysfs also does not fit the use case at all, and procfs only
makes sense if you really want to keep all information about the
guest as part of the process directory it belongs to.

I still think that in the long term, we should migrate to
new system calls and a special file system for kvm, which
might be non-mountable. Those will of course have the same
32 bit compat problems as the ioctl approach, but so far,
Avi has kept a good watch on avoiding these problems.

As long as we think the interface is likely to change (which it
certainly is right now), I believe that ioctl is the right
interface. We can think about retiring it when the interface has
stabilized enough to be converted to syscalls.

	Arnd <><


* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-11  7:26 ` Arnd Bergmann
@ 2007-01-11  8:02   ` Avi Kivity
  0 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2007-01-11  8:02 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: kvm-devel, Andrew Morton, linux-kernel, Jeff Garzik

Arnd Bergmann wrote:
> On Tuesday 09 January 2007 14:37, Avi Kivity wrote:
>   
>> struct kvm_vcpu_area {
>>     u32 vcpu_area_size;
>>     u32 exit_reason;
>>
>>     sigset_t sigmask;  // for use during vcpu execution
>>     
>
> Since Jeff brought up the point of 32 bit compatibility:
> When this structure is shared between 64 bit kernel and
> 32 bit user space, your sigmask should be a __u64 in order
> to guarantee compatibility.
>   

Right.  Thanks.

-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-11  7:34   ` [kvm-devel] " Arnd Bergmann
@ 2007-01-11  8:03     ` Avi Kivity
  2007-01-11  8:26     ` Jeff Garzik
  2007-01-11 17:40     ` David Lang
  2 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2007-01-11  8:03 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: kvm-devel, Jeff Garzik, Andrew Morton, linux-kernel

Arnd Bergmann wrote:
> I still think that in the long term, we should migrate to
> new system calls and a special file system for kvm, which
> might be non-mountable. 

The inode-per-vm and inode-per-vcpu approach sort-of-implies a 
nonmountable special filesystem, so with the proposed change, we'll be 
halfway there.



-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-11  7:34   ` [kvm-devel] " Arnd Bergmann
  2007-01-11  8:03     ` Avi Kivity
@ 2007-01-11  8:26     ` Jeff Garzik
  2007-01-11  8:32       ` Avi Kivity
  2007-01-12 11:19       ` Pavel Machek
  2007-01-11 17:40     ` David Lang
  2 siblings, 2 replies; 12+ messages in thread
From: Jeff Garzik @ 2007-01-11  8:26 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: kvm-devel, Avi Kivity, Andrew Morton, linux-kernel

Arnd Bergmann wrote:
> On Tuesday 09 January 2007 14:47, Jeff Garzik wrote:
>> Can we please avoid adding a ton of new ioctls?  ioctls inevitably 
>> require 64-bit compat code for certain architectures, whereas 
>> sysfs/procfs does not.
> 
> For performance reasons, an ascii string based interface is not
> desirable here, some of these calls should be optimized to
> the point of counting cycles.

sysfs does not require ASCII...

	Jeff





* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-11  8:26     ` Jeff Garzik
@ 2007-01-11  8:32       ` Avi Kivity
  2007-01-12 11:19       ` Pavel Machek
  1 sibling, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2007-01-11  8:32 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Arnd Bergmann, kvm-devel, Andrew Morton, linux-kernel

Jeff Garzik wrote:
> Arnd Bergmann wrote:
>> On Tuesday 09 January 2007 14:47, Jeff Garzik wrote:
>>> Can we please avoid adding a ton of new ioctls?  ioctls inevitably 
>>> require 64-bit compat code for certain architectures, whereas 
>>> sysfs/procfs does not.
>>
>> For performance reasons, an ascii string based interface is not
>> desirable here, some of these calls should be optimized to
>> the point of counting cycles.
>
> sysfs does not require ASCII...
>

The main kvm ioctl switches the execution mode to guest mode.  Just like 
a syscall enters kernel mode, ioctl(vcpu_fd, KVM_VCPU_RUN) enters the 
guest address space and begins executing guest code.

I don't see how to model that with sysfs.
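
For illustration, the control flow described above (enter the guest,
come back on an exit, resolve it in userspace, enter again) could be
sketched as below.  This is not the real interface: struct
vcpu_area_sketch is a cut-down stand-in for the mmap()ed vcpu area,
and scripted_run() is a hypothetical replacement for
ioctl(vcpu_fd, KVM_VCPU_RUN, 0) that replays a fixed list of exits so
the loop can be exercised without /dev/kvm.

```c
#include <stdint.h>

/* Exit reasons, numbered as in the RFC (subset). */
enum kvm_exit_reason {
	KVM_EXIT_UNKNOWN = 0,
	KVM_EXIT_IO      = 2,
	KVM_EXIT_HLT     = 5,
	KVM_EXIT_MMIO    = 6,
};

/* Hypothetical stand-in for the shared vcpu area. */
struct vcpu_area_sketch {
	uint32_t exit_reason;
};

typedef int (*run_fn)(struct vcpu_area_sketch *area);

/* Hypothetical "guest": reports one scripted exit per call. */
static const uint32_t script[] = { KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_HLT };
static unsigned int script_pos;

static int scripted_run(struct vcpu_area_sketch *area)
{
	if (script_pos >= sizeof(script) / sizeof(script[0]))
		return -1;
	area->exit_reason = script[script_pos++];
	return 0;
}

/*
 * Run until the guest halts.  Returns the number of I/O exits that
 * were resolved, or -1 on a failed run or an unhandled exit reason.
 */
static int vcpu_loop(run_fn run, struct vcpu_area_sketch *area)
{
	int handled = 0;

	for (;;) {
		if (run(area) < 0)
			return -1;

		switch (area->exit_reason) {
		case KVM_EXIT_IO:
		case KVM_EXIT_MMIO:
			/* device emulation would fill in the result here */
			handled++;
			break;
		case KVM_EXIT_HLT:
			return handled;
		default:
			return -1;
		}
	}
}
```

The blocking "run" call in the middle of the loop is the part that has
no obvious counterpart as a file read or write.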

There are other objections as well. sysfs is a public interface, whereas 
kvm is a process private attribute.  These objections don't apply to 
/proc though.


-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-11  7:34   ` [kvm-devel] " Arnd Bergmann
  2007-01-11  8:03     ` Avi Kivity
  2007-01-11  8:26     ` Jeff Garzik
@ 2007-01-11 17:40     ` David Lang
  2 siblings, 0 replies; 12+ messages in thread
From: David Lang @ 2007-01-11 17:40 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: kvm-devel, Jeff Garzik, Avi Kivity, Andrew Morton, linux-kernel


On Thu, 11 Jan 2007, Arnd Bergmann wrote:

> On Tuesday 09 January 2007 14:47, Jeff Garzik wrote:
>> Can we please avoid adding a ton of new ioctls?  ioctls inevitably
>> require 64-bit compat code for certain architectures, whereas
>> sysfs/procfs does not.
>
> For performance reasons, an ascii string based interface is not
> desirable here, some of these calls should be optimized to
> the point of counting cycles.

why is this? most of the API that is being discussed is run once when the VM is 
being set up.

there may be some calls that are performance sensitive, but for things like 
separating the page tables, the cost of doing the work will swamp any ASCII 
conversion costs.

David Lang

> Sysfs also does not fit the use case at all, and procfs only
> makes sense if you really want to keep all information about the
> guest as part of the process directory it belongs to.
>
> I still think that in the long term, we should migrate to
> new system calls and a special file system for kvm, which
> might be non-mountable. Those will of course have the same
> 32 bit compat problems as the ioctl approach, but so far,
> Avi has kept a good watch on avoiding these problems.
>
> As long as we think the interface is likely to change (which it
> certainly is right now), I believe that ioctl is the right
> interface. We can think about retiring it when the interface has
> stabilized enough to be converted to syscalls.
>
> 	Arnd <><


* Re: [kvm-devel] [RFC] Stable kvm userspace interface
  2007-01-11  8:26     ` Jeff Garzik
  2007-01-11  8:32       ` Avi Kivity
@ 2007-01-12 11:19       ` Pavel Machek
  1 sibling, 0 replies; 12+ messages in thread
From: Pavel Machek @ 2007-01-12 11:19 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Arnd Bergmann, kvm-devel, Avi Kivity, Andrew Morton, linux-kernel


Hi!

> >>Can we please avoid adding a ton of new ioctls?  
> >>ioctls inevitably require 64-bit compat code for 
> >>certain architectures, whereas sysfs/procfs does not.
> >
> >For performance reasons, an ascii string based 
> >interface is not
> >desirable here, some of these calls should be 
> >optimized to
> >the point of counting cycles.
> 
> sysfs does not require ASCII...

Yep, but at that point you have 32 vs. 64bit nightmare back... and
stronger, because sysfs does not have compat handling.

-- 
Thanks for all the (sleeping) penguins.

