LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)
@ 2015-02-12 18:41 Radim Krčmář
  2015-02-12 18:41 ` [PATCH v2 1/4] KVM: x86: use MDA for interrupt matching Radim Krčmář
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-02-12 18:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

Each patch has a diff from v1, here is only a prologue on the mythical
mixed xAPIC and x2APIC mode:

There is one interesting alias in xAPIC and x2APIC ICR destination, the
0xff000000, which is a broadcast in xAPIC and either a physical message
to high x2APIC ID or a message to an empty set in x2APIC cluster.

This corner case in mixed mode can be triggered by a weird message
 1) when we have no x2APIC ID 0xff000000, but send x2APIC message there

or after something that isn't forbidden in SDM most likely because they
didn't think that anyone would ever consider it
 2) we have x2APIC ID 0xff000000 and reset other x2APICs into xAPIC mode

Current KVM doesn't need to consider (2), so there only is a slim chance
that some hobbyist OS pushes the specification to its limits.

The problem is that SDM for xAPIC doesn't believably specify[1] if all
messages beginning with 0xff are considered as broadcasts, 10.6.2.1:
  A broadcast IPI (bits 28-31 of the MDA are 1's)

No volunteer came to clear this up, so I hacked Linux to have one x2APIC
core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
to CPU0, like most other destinations, Logical IPI to 0xff000000 got
dropped and only 0xffffffff worked as a broadcast in both modes.

I think it is because Intel never considered mixed mode to be valid, and
seen delivery might be an emergent feature of QPI routing.
Luckily, broadcast from xAPIC isn't delivered to x2APIC.

Real exploration would require greater modifications to Linux (ideally
writing a custom kernel), so this series implements something that makes
some sense and isn't too far from reality.


Radim Krčmář (4):
  KVM: x86: use MDA for interrupt matching
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: simplify kvm_apic_map

 arch/x86/include/asm/kvm_host.h |   8 ++-
 arch/x86/kvm/lapic.c            | 138 +++++++++++++++++++++++-----------------
 arch/x86/kvm/lapic.h            |  15 -----
 3 files changed, 85 insertions(+), 76 deletions(-)

-- 
2.3.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/4] KVM: x86: use MDA for interrupt matching
  2015-02-12 18:41 [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Radim Krčmář
@ 2015-02-12 18:41 ` Radim Krčmář
  2015-02-12 18:41 ` [PATCH v2 2/4] KVM: x86: fix mixed APIC mode broadcast Radim Krčmář
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-02-12 18:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

In mixed modes, we musn't deliver xAPIC IPIs like x2APIC and vice versa.
Instead of preserving the information in apic_send_ipi(), we regain it
by converting all destinations into correct MDA in the slow path.
This allows easier reasoning about subsequent matching.

Our kvm_apic_broadcast() had an interesting design decision: it didn't
consider IOxAPIC 0xff as broadcast in x2APIC mode ...
everything worked because IOxAPIC can't set that in physical mode and
logical mode considered it as a message for first 8 VCPUs.
This patch interprets IOxAPIC 0xff as x2APIC broadcast.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2
 - accept two 'struct lapic *' in kvm_apic_mda() for nicer code [Paolo]
 - removed XXX comment about checking if x2APIC broadcast is accepted
   (it is, and the code also accepts any x2APIC message starting with
    0xff, like v1, for simplicity.)


 arch/x86/kvm/lapic.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 86609c15726f..4812bde5090c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -585,15 +585,23 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
 	apic_update_ppr(apic);
 }
 
-static bool kvm_apic_broadcast(struct kvm_lapic *apic, u32 dest)
+static bool kvm_apic_broadcast(struct kvm_lapic *apic, u32 mda)
 {
-	return dest == (apic_x2apic_mode(apic) ?
-			X2APIC_BROADCAST : APIC_BROADCAST);
+	if (apic_x2apic_mode(apic))
+		return mda == X2APIC_BROADCAST;
+
+	return GET_APIC_DEST_FIELD(mda) == APIC_BROADCAST;
 }
 
-static bool kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)
+static bool kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 mda)
 {
-	return kvm_apic_id(apic) == dest || kvm_apic_broadcast(apic, dest);
+	if (kvm_apic_broadcast(apic, mda))
+		return true;
+
+	if (apic_x2apic_mode(apic))
+		return mda == kvm_apic_id(apic);
+
+	return mda == SET_APIC_DEST_FIELD(kvm_apic_id(apic));
 }
 
 static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
@@ -610,6 +618,7 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
 		       && (logical_id & mda & 0xffff) != 0;
 
 	logical_id = GET_APIC_LOGICAL_ID(logical_id);
+	mda = GET_APIC_DEST_FIELD(mda);
 
 	switch (kvm_apic_get_reg(apic, APIC_DFR)) {
 	case APIC_DFR_FLAT:
@@ -624,10 +633,27 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
 	}
 }
 
+/* KVM APIC implementation has two quirks
+ *  - dest always begins at 0 while xAPIC MDA has offset 24,
+ *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
+ */
+static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
+                                              struct kvm_lapic *target)
+{
+	bool ipi = source != NULL;
+	bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
+
+	if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
+		return X2APIC_BROADCAST;
+
+	return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
+}
+
 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 			   int short_hand, unsigned int dest, int dest_mode)
 {
 	struct kvm_lapic *target = vcpu->arch.apic;
+	u32 mda = kvm_apic_mda(dest, source, target);
 
 	apic_debug("target %p, source %p, dest 0x%x, "
 		   "dest_mode 0x%x, short_hand 0x%x\n",
@@ -637,9 +663,9 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 	switch (short_hand) {
 	case APIC_DEST_NOSHORT:
 		if (dest_mode == APIC_DEST_PHYSICAL)
-			return kvm_apic_match_physical_addr(target, dest);
+			return kvm_apic_match_physical_addr(target, mda);
 		else
-			return kvm_apic_match_logical_addr(target, dest);
+			return kvm_apic_match_logical_addr(target, mda);
 	case APIC_DEST_SELF:
 		return target == source;
 	case APIC_DEST_ALLINC:
-- 
2.3.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 2/4] KVM: x86: fix mixed APIC mode broadcast
  2015-02-12 18:41 [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Radim Krčmář
  2015-02-12 18:41 ` [PATCH v2 1/4] KVM: x86: use MDA for interrupt matching Radim Krčmář
@ 2015-02-12 18:41 ` Radim Krčmář
  2015-02-12 18:41 ` [PATCH v2 3/4] KVM: x86: avoid logical_map when it is invalid Radim Krčmář
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-02-12 18:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

Broadcast allowed only one global APIC mode, but mixed modes are
theoretically possible.  x2APIC IPI doesn't mean 0xff as broadcast,
the rest does.

x2APIC broadcasts are accepted by xAPIC.  If we take SDM to be logical,
even addreses beginning with 0xff should be accepted, but real hardware
disagrees.  This patch aims for simple code by considering most of real
behavior as undefined.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2
 - don't deliver all phys messages beginning with 0xff to xAPIC

 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/lapic.c            | 9 ++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a236e39cc385..ea673e18b20f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -554,7 +554,7 @@ struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 ldr_bits;
 	/* fields bellow are used to decode ldr values in different modes */
-	u32 cid_shift, cid_mask, lid_mask, broadcast;
+	u32 cid_shift, cid_mask, lid_mask;
 	struct kvm_lapic *phys_map[256];
 	/* first index is cluster id second is cpu id in a cluster */
 	struct kvm_lapic *logical_map[16][16];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4812bde5090c..729bd7714790 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -151,7 +151,6 @@ static void recalculate_apic_map(struct kvm *kvm)
 	new->cid_shift = 8;
 	new->cid_mask = 0;
 	new->lid_mask = 0xff;
-	new->broadcast = APIC_BROADCAST;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
@@ -163,7 +162,6 @@ static void recalculate_apic_map(struct kvm *kvm)
 			new->ldr_bits = 32;
 			new->cid_shift = 16;
 			new->cid_mask = new->lid_mask = 0xffff;
-			new->broadcast = X2APIC_BROADCAST;
 		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
 							APIC_DFR_CLUSTER) {
@@ -687,6 +685,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	struct kvm_lapic **dst;
 	int i;
 	bool ret = false;
+	bool x2apic_ipi = src && apic_x2apic_mode(src);
 
 	*r = -1;
 
@@ -698,15 +697,15 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	if (irq->shorthand)
 		return false;
 
+	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
+		return false;
+
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
 	if (!map)
 		goto out;
 
-	if (irq->dest_id == map->broadcast)
-		goto out;
-
 	ret = true;
 
 	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-- 
2.3.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 3/4] KVM: x86: avoid logical_map when it is invalid
  2015-02-12 18:41 [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Radim Krčmář
  2015-02-12 18:41 ` [PATCH v2 1/4] KVM: x86: use MDA for interrupt matching Radim Krčmář
  2015-02-12 18:41 ` [PATCH v2 2/4] KVM: x86: fix mixed APIC mode broadcast Radim Krčmář
@ 2015-02-12 18:41 ` Radim Krčmář
  2015-02-12 18:41 ` [PATCH v2 4/4] KVM: x86: simplify kvm_apic_map Radim Krčmář
  2015-02-26  1:55 ` [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Marcelo Tosatti
  4 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-02-12 18:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

We want to support mixed modes and the easiest solution is to avoid
optimizing those weird and unlikely scenarios.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2
 - optimize and name a check for valid map [Paolo]
 - don't use cluster id with invalid map [Paolo]
 - define KVM_APIC_MODE_* closer to kvm_apic_map
   (personal preference)

 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/lapic.c            | 24 +++++++++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea673e18b20f..63a4e03fbdcf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -550,8 +550,13 @@ struct kvm_arch_memory_slot {
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
+#define KVM_APIC_MODE_XAPIC_CLUSTER          4
+#define KVM_APIC_MODE_XAPIC_FLAT             8
+#define KVM_APIC_MODE_X2APIC                16
+
 struct kvm_apic_map {
 	struct rcu_head rcu;
+	u8 mode;
 	u8 ldr_bits;
 	/* fields bellow are used to decode ldr values in different modes */
 	u32 cid_shift, cid_mask, lid_mask;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 729bd7714790..88e2bf3be235 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -133,6 +133,14 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
 	return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
+/* The logical map is definitely wrong if we have multiple
+ * modes at the same time.  (Physical map is always right.)
+ */
+static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
+{
+	return !(map->mode & (map->mode - 1));
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
 	struct kvm_apic_map *new, *old = NULL;
@@ -162,16 +170,19 @@ static void recalculate_apic_map(struct kvm *kvm)
 			new->ldr_bits = 32;
 			new->cid_shift = 16;
 			new->cid_mask = new->lid_mask = 0xffff;
+			new->mode |= KVM_APIC_MODE_X2APIC;
 		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
 			if (kvm_apic_get_reg(apic, APIC_DFR) ==
 							APIC_DFR_CLUSTER) {
 				new->cid_shift = 4;
 				new->cid_mask = 0xf;
 				new->lid_mask = 0xf;
+				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
 			} else {
 				new->cid_shift = 8;
 				new->cid_mask = 0;
 				new->lid_mask = 0xff;
+				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
 			}
 		}
 
@@ -198,6 +209,10 @@ static void recalculate_apic_map(struct kvm *kvm)
 
 		if (aid < ARRAY_SIZE(new->phys_map))
 			new->phys_map[aid] = apic;
+
+		if (!kvm_apic_logical_map_valid(new));
+			continue;
+
 		if (lid && cid < ARRAY_SIZE(new->logical_map))
 			new->logical_map[cid][ffs(lid) - 1] = apic;
 	}
@@ -715,7 +730,14 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		dst = &map->phys_map[irq->dest_id];
 	} else {
 		u32 mda = irq->dest_id << (32 - map->ldr_bits);
-		u16 cid = apic_cluster_id(map, mda);
+		u16 cid;
+
+		if (!kvm_apic_logical_map_valid(map)) {
+			ret = false;
+			goto out;
+		}
+
+		cid = apic_cluster_id(map, mda);
 
 		if (cid >= ARRAY_SIZE(map->logical_map))
 			goto out;
-- 
2.3.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 4/4] KVM: x86: simplify kvm_apic_map
  2015-02-12 18:41 [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Radim Krčmář
                   ` (2 preceding siblings ...)
  2015-02-12 18:41 ` [PATCH v2 3/4] KVM: x86: avoid logical_map when it is invalid Radim Krčmář
@ 2015-02-12 18:41 ` Radim Krčmář
  2015-04-07 12:38   ` Paolo Bonzini
  2015-02-26  1:55 ` [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Marcelo Tosatti
  4 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2015-02-12 18:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

recalculate_apic_map() uses two passes over all VCPUs.  This is a relic
from time when we selected a global mode in the first pass and set up
the optimized table in the second pass (to have a consistent mode).

Recent changes made mixed mode unoptimized and we can do it in one pass.
Format of logical MDA is a function of the mode, so we encode it in
apic_logical_id() and drop obsoleted variables from the struct.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2
 - use different optimization to shorten apic_logical_id [Paolo]
 - move apic_logical_id to lapic.c [Paolo]
 - keep apic map validity check separate from apic_logical_id
   (It can be done now as it's optimized in a different way.)
 - rename apic_logical_id parameter from ldr to dest_id
   (To avoid hinting that it has anything to do with real APIC.)

 arch/x86/include/asm/kvm_host.h |  3 --
 arch/x86/kvm/lapic.c            | 75 ++++++++++++++---------------------------
 arch/x86/kvm/lapic.h            | 15 ---------
 3 files changed, 25 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 63a4e03fbdcf..c55dcaa5aead 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -557,9 +557,6 @@ struct kvm_arch_memory_slot {
 struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 mode;
-	u8 ldr_bits;
-	/* fields bellow are used to decode ldr values in different modes */
-	u32 cid_shift, cid_mask, lid_mask;
 	struct kvm_lapic *phys_map[256];
 	/* first index is cluster id second is cpu id in a cluster */
 	struct kvm_lapic *logical_map[16][16];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 88e2bf3be235..3661f79642c9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -141,6 +141,17 @@ static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
 	return !(map->mode & (map->mode - 1));
 }
 
+static inline void
+apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
+{
+	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_CLUSTER !=  4);
+	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_FLAT    !=  8);
+	BUILD_BUG_ON(KVM_APIC_MODE_X2APIC        != 16);
+
+	*cid = dest_id >> map->mode;
+	*lid = dest_id & ((1 << map->mode) - 1);
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
 	struct kvm_apic_map *new, *old = NULL;
@@ -154,49 +165,6 @@ static void recalculate_apic_map(struct kvm *kvm)
 	if (!new)
 		goto out;
 
-	new->ldr_bits = 8;
-	/* flat mode is default */
-	new->cid_shift = 8;
-	new->cid_mask = 0;
-	new->lid_mask = 0xff;
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		struct kvm_lapic *apic = vcpu->arch.apic;
-
-		if (!kvm_apic_present(vcpu))
-			continue;
-
-		if (apic_x2apic_mode(apic)) {
-			new->ldr_bits = 32;
-			new->cid_shift = 16;
-			new->cid_mask = new->lid_mask = 0xffff;
-			new->mode |= KVM_APIC_MODE_X2APIC;
-		} else if (kvm_apic_get_reg(apic, APIC_LDR)) {
-			if (kvm_apic_get_reg(apic, APIC_DFR) ==
-							APIC_DFR_CLUSTER) {
-				new->cid_shift = 4;
-				new->cid_mask = 0xf;
-				new->lid_mask = 0xf;
-				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
-			} else {
-				new->cid_shift = 8;
-				new->cid_mask = 0;
-				new->lid_mask = 0xff;
-				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
-			}
-		}
-
-		/*
-		 * All APICs have to be configured in the same mode by an OS.
-		 * We take advatage of this while building logical id loockup
-		 * table. After reset APICs are in software disabled mode, so if
-		 * we find apic with different setting we assume this is the mode
-		 * OS wants all apics to be in; build lookup table accordingly.
-		 */
-		if (kvm_apic_sw_enabled(apic))
-			break;
-	}
-
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
 		u16 cid, lid;
@@ -204,15 +172,25 @@ static void recalculate_apic_map(struct kvm *kvm)
 
 		aid = kvm_apic_id(apic);
 		ldr = kvm_apic_get_reg(apic, APIC_LDR);
-		cid = apic_cluster_id(new, ldr);
-		lid = apic_logical_id(new, ldr);
 
 		if (aid < ARRAY_SIZE(new->phys_map))
 			new->phys_map[aid] = apic;
 
-		if (!kvm_apic_logical_map_valid(new));
+		if (apic_x2apic_mode(apic)) {
+			new->mode |= KVM_APIC_MODE_X2APIC;
+		} else if (ldr) {
+			ldr = GET_APIC_LOGICAL_ID(ldr);
+			if (kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
+				new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
+			else
+				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
+		}
+
+		if (!kvm_apic_logical_map_valid(new))
 			continue;
 
+		apic_logical_id(new, ldr, &cid, &lid);
+
 		if (lid && cid < ARRAY_SIZE(new->logical_map))
 			new->logical_map[cid][ffs(lid) - 1] = apic;
 	}
@@ -729,7 +707,6 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 
 		dst = &map->phys_map[irq->dest_id];
 	} else {
-		u32 mda = irq->dest_id << (32 - map->ldr_bits);
 		u16 cid;
 
 		if (!kvm_apic_logical_map_valid(map)) {
@@ -737,15 +714,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 			goto out;
 		}
 
-		cid = apic_cluster_id(map, mda);
+		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
 
 		if (cid >= ARRAY_SIZE(map->logical_map))
 			goto out;
 
 		dst = map->logical_map[cid];
 
-		bitmap = apic_logical_id(map, mda);
-
 		if (irq->delivery_mode == APIC_DM_LOWEST) {
 			int l = -1;
 			for_each_set_bit(i, &bitmap, 16) {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0bc6c656625b..8ba09b338682 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -148,21 +148,6 @@ static inline bool kvm_apic_vid_enabled(struct kvm *kvm)
 	return kvm_x86_ops->vm_has_apicv(kvm);
 }
 
-static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
-{
-	u16 cid;
-	ldr >>= 32 - map->ldr_bits;
-	cid = (ldr >> map->cid_shift) & map->cid_mask;
-
-	return cid;
-}
-
-static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
-{
-	ldr >>= (32 - map->ldr_bits);
-	return ldr & map->lid_mask;
-}
-
 static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.apic->pending_events;
-- 
2.3.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)
  2015-02-12 18:41 [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Radim Krčmář
                   ` (3 preceding siblings ...)
  2015-02-12 18:41 ` [PATCH v2 4/4] KVM: x86: simplify kvm_apic_map Radim Krčmář
@ 2015-02-26  1:55 ` Marcelo Tosatti
  2015-02-26  8:49   ` Nadav Amit
  2015-02-27 15:12   ` Radim Krčmář
  4 siblings, 2 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2015-02-26  1:55 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

Radim,

On Thu, Feb 12, 2015 at 07:41:30PM +0100, Radim Krčmář wrote:
> Each patch has a diff from v1, here is only a prologue on the mythical
> mixed xAPIC and x2APIC mode:
> 
> There is one interesting alias in xAPIC and x2APIC ICR destination, the
> 0xff000000, which is a broadcast in xAPIC and either a physical message
> to high x2APIC ID or a message to an empty set in x2APIC cluster.
> 
> This corner case in mixed mode can be triggered by a weird message
>  1) when we have no x2APIC ID 0xff000000, but send x2APIC message there

10.7 SYSTEM AND APIC BUS ARBITRATION

Note that except for the SIPI IPI (see Section 10.6.1, “Interrupt
Command Register (ICR)”), all bus messages that fail to be delivered
to their specified destination or destinations are automatically
retried. Software should avoid situations in which IPIs are sent to
disabled or nonexistent local APICs, causing the messages to be resent
repeatedly.

> or after something that isn't forbidden in SDM most likely because they
> didn't think that anyone would ever consider it
>  2) we have x2APIC ID 0xff000000 and reset other x2APICs into xAPIC mode

Or just x2APIC initialization in non lockstep:


	vcpu0				vcpu1
T0) 	xapic mode			xapic mode
T1)	x2apic enable
T2)					broadcast IPI

> Current KVM doesn't need to consider (2), so there only is a slim chance
> that some hobbyist OS pushes the specification to its limits.
> 
> The problem is that SDM for xAPIC doesn't believably specify[1] if all
> messages beginning with 0xff are considered as broadcasts, 10.6.2.1:
>   A broadcast IPI (bits 28-31 of the MDA are 1's)
> 
> No volunteer came to clear this up, so I hacked Linux to have one x2APIC
> core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
> to CPU0, like most other destinations, Logical IPI to 0xff000000 got
> dropped and only 0xffffffff worked as a broadcast in both modes.
> 
> I think it is because Intel never considered mixed mode to be valid, and
> seen delivery might be an emergent feature of QPI routing.

> Luckily, broadcast from xAPIC isn't delivered to x2APIC.

In real hardware?

> Real exploration would require greater modifications to Linux (ideally
> writing a custom kernel), so this series implements something that makes
> some sense and isn't too far from reality.

Ok, so the problem is that broadcast (ICR.destination == 0xff000000)
from xAPIC CPU is not delivered to x2APIC CPUs ?

> Radim Krčmář (4):
>   KVM: x86: use MDA for interrupt matching
>   KVM: x86: fix mixed APIC mode broadcast
>   KVM: x86: avoid logical_map when it is invalid
>   KVM: x86: simplify kvm_apic_map

I can't find any restriction against (cpu0==x2APIC, cpu1==xAPIC) in 
the documentation.

Anyway, emulation should match physical hardware. From your message above,
it is not clear what is the behaviour there:

"No volunteer came to clear this up, so I hacked Linux to have one x2APIC
core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
to CPU0, like most other destinations, Logical IPI to 0xff000000 got
dropped and only 0xffffffff worked as a broadcast in both modes.

Luckily, broadcast from xAPIC isn't delivered to x2APIC."

>From the x2APIC CPU or the xAPIC ones?

It should be easy to write kvm-unit-test testcases that match physical
hardware behaviour (in general, i am having a hard time figure out
in what way "mixed mode" is supposed to behave, please describe it
clearly).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)
  2015-02-26  1:55 ` [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Marcelo Tosatti
@ 2015-02-26  8:49   ` Nadav Amit
  2015-02-26 22:31     ` Marcelo Tosatti
  2015-02-27 15:12   ` Radim Krčmář
  1 sibling, 1 reply; 11+ messages in thread
From: Nadav Amit @ 2015-02-26  8:49 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Radim Krčmář,
	linux-kernel, kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

Marcello, Radim,

As you know - I can run some tests on the patches and whether they comply
with real hardware. Please let me know which version to test and I’ll try to
do next week.

Regards,
Nadav

Marcelo Tosatti <mtosatti@redhat.com> wrote:

> Radim,
> 
> On Thu, Feb 12, 2015 at 07:41:30PM +0100, Radim Krčmář wrote:
>> Each patch has a diff from v1, here is only a prologue on the mythical
>> mixed xAPIC and x2APIC mode:
>> 
>> There is one interesting alias in xAPIC and x2APIC ICR destination, the
>> 0xff000000, which is a broadcast in xAPIC and either a physical message
>> to high x2APIC ID or a message to an empty set in x2APIC cluster.
>> 
>> This corner case in mixed mode can be triggered by a weird message
>> 1) when we have no x2APIC ID 0xff000000, but send x2APIC message there
> 
> 10.7 SYSTEM AND APIC BUS ARBITRATION
> 
> Note that except for the SIPI IPI (see Section 10.6.1, “Interrupt
> Command Register (ICR)”), all bus messages that fail to be delivered
> to their specified destination or destinations are automatically
> retried. Software should avoid situations in which IPIs are sent to
> disabled or nonexistent local APICs, causing the messages to be resent
> repeatedly.
> 
>> or after something that isn't forbidden in SDM most likely because they
>> didn't think that anyone would ever consider it
>> 2) we have x2APIC ID 0xff000000 and reset other x2APICs into xAPIC mode
> 
> Or just x2APIC initialization in non lockstep:
> 
> 
> 	vcpu0				vcpu1
> T0) 	xapic mode			xapic mode
> T1)	x2apic enable
> T2)					broadcast IPI
> 
>> Current KVM doesn't need to consider (2), so there only is a slim chance
>> that some hobbyist OS pushes the specification to its limits.
>> 
>> The problem is that SDM for xAPIC doesn't believably specify[1] if all
>> messages beginning with 0xff are considered as broadcasts, 10.6.2.1:
>>  A broadcast IPI (bits 28-31 of the MDA are 1's)
>> 
>> No volunteer came to clear this up, so I hacked Linux to have one x2APIC
>> core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
>> to CPU0, like most other destinations, Logical IPI to 0xff000000 got
>> dropped and only 0xffffffff worked as a broadcast in both modes.
>> 
>> I think it is because Intel never considered mixed mode to be valid, and
>> seen delivery might be an emergent feature of QPI routing.
> 
>> Luckily, broadcast from xAPIC isn't delivered to x2APIC.
> 
> In real hardware?
> 
>> Real exploration would require greater modifications to Linux (ideally
>> writing a custom kernel), so this series implements something that makes
>> some sense and isn't too far from reality.
> 
> Ok, so the problem is that broadcast (ICR.destination == 0xff000000)
> from xAPIC CPU is not delivered to x2APIC CPUs ?
> 
>> Radim Krčmář (4):
>>  KVM: x86: use MDA for interrupt matching
>>  KVM: x86: fix mixed APIC mode broadcast
>>  KVM: x86: avoid logical_map when it is invalid
>>  KVM: x86: simplify kvm_apic_map
> 
> I can't find any restriction against (cpu0==x2APIC, cpu1==xAPIC) in 
> the documentation.
> 
> Anyway, emulation should match physical hardware. From your message above,
> it is not clear what is the behaviour there:
> 
> "No volunteer came to clear this up, so I hacked Linux to have one x2APIC
> core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
> to CPU0, like most other destinations, Logical IPI to 0xff000000 got
> dropped and only 0xffffffff worked as a broadcast in both modes.
> 
> Luckily, broadcast from xAPIC isn't delivered to x2APIC."
> 
> From the x2APIC CPU or the xAPIC ones?
> 
> It should be easy to write kvm-unit-test testcases that match physical
> hardware behaviour (in general, i am having a hard time figure out
> in what way "mixed mode" is supposed to behave, please describe it
> clearly).
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)
  2015-02-26  8:49   ` Nadav Amit
@ 2015-02-26 22:31     ` Marcelo Tosatti
  2015-02-27 15:22       ` Radim Krčmář
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2015-02-26 22:31 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Radim Krčmář,
	linux-kernel, kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

On Thu, Feb 26, 2015 at 10:49:53AM +0200, Nadav Amit wrote:
> Marcello, Radim,
> 
> As you know - I can run some tests on the patches and whether they comply
> with real hardware. Please let me know which version to test and I’ll try to
> do next week.
> 
> Regards,
> Nadav

>From what i understand Radim did run a set of tests (with a modified
Linux kernel) to find out the information on mixed mode.

However its not clear what that behaviour is.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)
  2015-02-26  1:55 ` [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Marcelo Tosatti
  2015-02-26  8:49   ` Nadav Amit
@ 2015-02-27 15:12   ` Radim Krčmář
  1 sibling, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-02-27 15:12 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

2015-02-25 22:55-0300, Marcelo Tosatti:
> >  1) when we have no x2APIC ID 0xff000000, but send x2APIC message there
> 
> 10.7 SYSTEM AND APIC BUS ARBITRATION
> 
> Note that except for the SIPI IPI (see Section 10.6.1, “Interrupt
> Command Register (ICR)”), all bus messages that fail to be delivered
> to their specified destination or destinations are automatically
> retried. Software should avoid situations in which IPIs are sent to
> disabled or nonexistent local APICs, causing the messages to be resent
> repeatedly.

Perfect, thanks.
(I thought that the restriction only applies to lowest priority.)

> > or after something that isn't forbidden in SDM most likely because they
> > didn't think that anyone would ever consider it
> >  2) we have x2APIC ID 0xff000000 and reset other x2APICs into xAPIC mode
> 
> Or just x2APIC initialization in non lockstep:
> 
> 
> 	vcpu0				vcpu1
> T0) 	xapic mode			xapic mode
> T1)	x2apic enable
> T2)					broadcast IPI

Good point, there is no reason why the BIOS couldn't do that.
(If we have APIC ID > 255, OS gets CPUs in x2APIC mode, or with high IDs
 disabled; 10.12.8.1 Consistency of APIC IDs and CPUID)

> > No volunteer came to clear this up, so I hacked Linux to have one x2APIC
> > core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
> > to CPU0, like most other destinations, Logical IPI to 0xff000000 got
> > dropped and only 0xffffffff worked as a broadcast in both modes.
> > 
> > I think it is because Intel never considered mixed mode to be valid, and
> > seen delivery might be an emergent feature of QPI routing.
> 
> > Luckily, broadcast from xAPIC isn't delivered to x2APIC.
> 
> In real hardware?

Yes, it was ivy bridge, iirc.

> > Real exploration would require greater modifications to Linux (ideally
> > writing a custom kernel), so this series implements something that makes
> > some sense and isn't too far from reality.
> 
> Ok, so the problem is that broadcast (ICR.destination == 0xff000000)
> from xAPIC CPU is not delivered to x2APIC CPUs ?

Mixed mode doesn't work much without this series -- we internally shift
all destinations to 0x0-0xff range and use a map that can't handle
duplicate entries, so the broadcast IPI is just a detail.

The biggest problem that for me is that we allow x2APIC without
interrupt remapping, so all devices use (IO)xAPIC and broadcast is
inconsistent with the design -- interpreted as a 0xff x2APIC message.

> I can't find any restriction against (cpu0==x2APIC, cpu1==xAPIC) in 
> the documentation.

Me neither ... the most I found is that boot restriction mentioned
earlier. (10.12.8.1)

> Anyway, emulation should match physical hardware.

(I think that we have created a de facto standard by rejecting SDM,
 10.12.7 Initialization by System Software, and no OS that enabled
 x2APIC under KVM should do things that do not work.  The main benefit
 of this series is making the code better.)

>                                                   From your message above,
> it is not clear what is the behaviour there:
> 
> "No volunteer came to clear this up, so I hacked Linux to have one x2APIC
> core between xAPIC ones.  Physical IPI to 0xff000000 got delivered only
> to CPU0, like most other destinations, Logical IPI to 0xff000000 got
> dropped and only 0xffffffff worked as a broadcast in both modes.
> 
> Luckily, broadcast from xAPIC isn't delivered to x2APIC."
> 
> From the x2APIC CPU or the xAPIC ones?

The first paragraph was about x2APIC IPIs and their results on xAPICs.

The second paragraph was a broadcast from xAPIC. (I actually did the
reverse for that one -- one xAPIC besides x2APIC CPUs.)

> It should be easy to write kvm-unit-test testcases that match physical
> hardware behaviour (in general, i am having a hard time figure out
> in what way "mixed mode" is supposed to behave, please describe it
> clearly).

I think that mixed mode is undefined in real hardware: delivering x2APIC
0xff112233 to xAPIC 0 makes not sense.
I already wrote all that I know about it, sorry.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode)
  2015-02-26 22:31     ` Marcelo Tosatti
@ 2015-02-27 15:22       ` Radim Krčmář
  0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-02-27 15:22 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Nadav Amit, linux-kernel, kvm, Paolo Bonzini, Nadav Amit, Gleb Natapov

2015-02-26 19:31-0300, Marcelo Tosatti:
> On Thu, Feb 26, 2015 at 10:49:53AM +0200, Nadav Amit wrote:
> > Marcello, Radim,
> > 
> > As you know - I can run some tests on the patches and whether they comply
> > with real hardware. Please let me know which version to test and I’ll try to
> > do next week.

Please do, this series implements a subset of what I found about real
hardware behavior, the rest didn't make much sense.
(I'd be grateful for any report about real hardware.)

> From what i understand Radim did run a set of tests (with a modified
> Linux kernel) to find out the information on mixed mode.
> 
> However its not clear what that behaviour is.

Exactly.

Thank you.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 4/4] KVM: x86: simplify kvm_apic_map
  2015-02-12 18:41 ` [PATCH v2 4/4] KVM: x86: simplify kvm_apic_map Radim Krčmář
@ 2015-04-07 12:38   ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2015-04-07 12:38 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Nadav Amit, Gleb Natapov



On 12/02/2015 19:41, Radim Krčmář wrote:
> +static inline void
> +apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
> +{
> +	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_CLUSTER !=  4);
> +	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_FLAT    !=  8);
> +	BUILD_BUG_ON(KVM_APIC_MODE_X2APIC        != 16);
> +

I added

	unsigned lid_bits = map->mode;

here (used in the rest of apic_logical_id) and applied the series.

Thanks,

Paolo

> +	*cid = dest_id >> map->mode;
> +	*lid = dest_id & ((1 << map->mode) - 1);
> +}

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-04-07 12:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-12 18:41 [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Radim Krčmář
2015-02-12 18:41 ` [PATCH v2 1/4] KVM: x86: use MDA for interrupt matching Radim Krčmář
2015-02-12 18:41 ` [PATCH v2 2/4] KVM: x86: fix mixed APIC mode broadcast Radim Krčmář
2015-02-12 18:41 ` [PATCH v2 3/4] KVM: x86: avoid logical_map when it is invalid Radim Krčmář
2015-02-12 18:41 ` [PATCH v2 4/4] KVM: x86: simplify kvm_apic_map Radim Krčmář
2015-04-07 12:38   ` Paolo Bonzini
2015-02-26  1:55 ` [PATCH v2 0/4] KVM: APIC improvements (with bonus mixed mode) Marcelo Tosatti
2015-02-26  8:49   ` Nadav Amit
2015-02-26 22:31     ` Marcelo Tosatti
2015-02-27 15:22       ` Radim Krčmář
2015-02-27 15:12   ` Radim Krčmář

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).