LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v3 0/6] KVM: my debug patch queue
@ 2021-08-11 12:29 Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 1/6] KVM: SVM: split svm_handle_invalid_exit Maxim Levitsky
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 12:29 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Maxim Levitsky, Vitaly Kuznetsov, Sean Christopherson,
	Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

Hi!\r
\r
I would like to publish two debug features which were needed for other stuff\r
I work on.\r
\r
One is the reworked lx-symbols script which now actually works on at least\r
gdb 9.1 (gdb 9.2 was reported to fail to load the debug symbols from the kernel\r
for some reason, not related to this patch) and upstream qemu.\r
\r
The other feature is the ability to trap all guest exceptions (on SVM for now)\r
and see them in kvmtrace prior to potential merge to double/triple fault.\r
\r
This can be very useful and I already had to manually patch KVM a few\r
times for this.\r
I will, once time permits, implement this feature on Intel as well.\r
\r
V2:\r
\r
 * Some more refactoring and workarounds for lx-symbols script\r
\r
 * added KVM_GUESTDBG_BLOCKIRQ flag to enable 'block interrupts on\r
   single step' together with KVM_CAP_SET_GUEST_DEBUG2 capability\r
   to indicate which guest debug flags are supported.\r
\r
   This is a replacement for unconditional block of interrupts on single\r
   step that was done in previous version of this patch set.\r
   Patches to qemu to use that feature will be sent soon.\r
\r
 * Reworked the the 'intercept all exceptions for debug' feature according\r
   to the review feedback:\r
\r
   - renamed the parameter that enables the feature and\r
     moved it to common kvm module.\r
     (only SVM part is currently implemented though)\r
\r
   - disable the feature for SEV guests as was suggested during the review\r
   - made the vmexit table const again, as was suggested in the review as well.\r
\r
V3:\r
 * Modified a selftest to cover the KVM_GUESTDBG_BLOCKIRQ\r
 * Rebased on kvm/queue\r
\r
Best regards,\r
        Maxim Levitsky\r
\r
Maxim Levitsky (6):\r
  KVM: SVM: split svm_handle_invalid_exit\r
  KVM: x86: add force_intercept_exceptions_mask\r
  KVM: SVM: implement force_intercept_exceptions_mask\r
  scripts/gdb: rework lx-symbols gdb script\r
  KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ\r
  KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ\r
\r
 Documentation/virt/kvm/api.rst                |   1 +\r
 arch/x86/include/asm/kvm_host.h               |   5 +-\r
 arch/x86/include/uapi/asm/kvm.h               |   1 +\r
 arch/x86/kvm/svm/svm.c                        |  87 +++++++-\r
 arch/x86/kvm/svm/svm.h                        |   6 +-\r
 arch/x86/kvm/x86.c                            |  12 +-\r
 arch/x86/kvm/x86.h                            |   2 +\r
 kernel/module.c                               |   8 +-\r
 scripts/gdb/linux/symbols.py                  | 203 ++++++++++++------\r
 .../testing/selftests/kvm/x86_64/debug_regs.c |  24 ++-\r
 10 files changed, 266 insertions(+), 83 deletions(-)\r
\r
-- \r
2.26.3\r
\r


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 1/6] KVM: SVM: split svm_handle_invalid_exit
  2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
@ 2021-08-11 12:29 ` Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask Maxim Levitsky
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 12:29 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Maxim Levitsky, Vitaly Kuznetsov, Sean Christopherson,
	Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

Split the check for having a vmexit handler to svm_check_exit_valid,
and make svm_handle_invalid_exit only handle a vmexit that is
already not valid.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7b58e445a967..e45259177009 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3236,12 +3236,14 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	       "excp_to:", save->last_excp_to);
 }
 
-static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
+static bool svm_check_exit_valid(struct kvm_vcpu *vcpu, u64 exit_code)
 {
-	if (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
-	    svm_exit_handlers[exit_code])
-		return 0;
+	return (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
+		svm_exit_handlers[exit_code]);
+}
 
+static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
+{
 	vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%llx\n", exit_code);
 	dump_vmcb(vcpu);
 	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
@@ -3249,14 +3251,13 @@ static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
 	vcpu->run->internal.ndata = 2;
 	vcpu->run->internal.data[0] = exit_code;
 	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
-
-	return -EINVAL;
+	return 0;
 }
 
 int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 {
-	if (svm_handle_invalid_exit(vcpu, exit_code))
-		return 0;
+	if (!svm_check_exit_valid(vcpu, exit_code))
+		return svm_handle_invalid_exit(vcpu, exit_code);
 
 #ifdef CONFIG_RETPOLINE
 	if (exit_code == SVM_EXIT_MSR)
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask
  2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 1/6] KVM: SVM: split svm_handle_invalid_exit Maxim Levitsky
@ 2021-08-11 12:29 ` Maxim Levitsky
  2021-09-02 16:56   ` Sean Christopherson
  2021-08-11 12:29 ` [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask Maxim Levitsky
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 12:29 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Maxim Levitsky, Vitaly Kuznetsov, Sean Christopherson,
	Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton, Borislav Petkov

This parameter will be used by VMX and SVM code to force
interception of a set of exceptions, given by a bitmask
for guest debug and/or kvm debug.

This is based on an idea first shown here:
https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/

CC: Borislav Petkov <bp@suse.de>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 3 +++
 arch/x86/kvm/x86.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fdc0c18339fb..092e2fad3c0d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -184,6 +184,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+uint force_intercept_exceptions_mask;
+module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);
+EXPORT_SYMBOL_GPL(force_intercept_exceptions_mask);
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 7d66d63dc55a..2b59825213c7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -347,6 +347,8 @@ extern struct static_key kvm_no_apic_vcpu;
 
 extern bool report_ignored_msrs;
 
+extern uint force_intercept_exceptions_mask;
+
 static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 {
 	return pvclock_scale_delta(nsec, vcpu->arch.virtual_tsc_mult,
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask
  2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 1/6] KVM: SVM: split svm_handle_invalid_exit Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask Maxim Levitsky
@ 2021-08-11 12:29 ` Maxim Levitsky
  2021-08-11 14:26   ` Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 4/6] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 12:29 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Maxim Levitsky, Vitaly Kuznetsov, Sean Christopherson,
	Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

Currently #TS interception is only done once.
Also exception interception is not enabled for SEV guests.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm/svm.c          | 70 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h          |  6 ++-
 arch/x86/kvm/x86.c              |  5 ++-
 4 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20daaf67a5bf..72fe03506018 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1690,6 +1690,8 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
+void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, unsigned long payload);
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e45259177009..19f54b07161a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -233,6 +233,8 @@ static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
 #define MSRS_RANGE_SIZE 2048
 #define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
 
+static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code);
+
 u32 svm_msrpm_offset(u32 msr)
 {
 	u32 offset;
@@ -1153,6 +1155,22 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 	}
 }
 
+static void svm_init_force_exceptions_intercepts(struct vcpu_svm *svm)
+{
+	int exc;
+
+	svm->force_intercept_exceptions_mask = force_intercept_exceptions_mask;
+	for (exc = 0 ; exc < 32 ; exc++) {
+		if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
+			continue;
+
+		/* Those are defined to have undefined behavior in the SVM spec */
+		if (exc != 2 && exc != 9)
+			continue;
+		set_exception_intercept(svm, exc);
+	}
+}
+
 static void init_vmcb(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1304,6 +1322,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 
 	enable_gif(svm);
 
+	if (!sev_es_guest(vcpu->kvm))
+		svm_init_force_exceptions_intercepts(svm);
+
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1892,6 +1913,17 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
+	if ((svm->force_intercept_exceptions_mask & (1 << PF_VECTOR)))
+		if (npt_enabled && !vcpu->arch.apf.host_apf_flags) {
+			/* If the #PF was only intercepted for debug, inject
+			 * it directly to the guest, since the kvm's mmu code
+			 * is not ready to deal with such page faults.
+			 */
+			kvm_queue_exception_e_p(vcpu, PF_VECTOR,
+						error_code, fault_address);
+			return 1;
+		}
+
 	return kvm_handle_page_fault(vcpu, error_code, fault_address,
 			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
 			svm->vmcb->control.insn_bytes : NULL,
@@ -1967,6 +1999,40 @@ static int ac_interception(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int gen_exc_interception(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Generic exception intercept handler which forwards a guest exception
+	 * as-is to the guest.
+	 * For exceptions that don't have a special intercept handler.
+	 *
+	 * Used only for 'force_intercept_exceptions_mask' KVM debug feature.
+	 */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int exc = svm->vmcb->control.exit_code - SVM_EXIT_EXCP_BASE;
+
+	/* SVM doesn't provide us with an error code for the #DF */
+	u32 err_code = exc == DF_VECTOR ? 0 : svm->vmcb->control.exit_info_1;
+
+	if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
+		return svm_handle_invalid_exit(vcpu, svm->vmcb->control.exit_code);
+
+	if (exc == TS_VECTOR) {
+		/*
+		 * SVM doesn't provide us with an error code to be able to
+		 * re-inject the #TS exception, so just disable its
+		 * intercept, and let the guest re-execute the instruction.
+		 */
+		vmcb_clr_intercept(&svm->vmcb01.ptr->control,
+				   INTERCEPT_EXCEPTION_OFFSET + TS_VECTOR);
+		recalc_intercepts(svm);
+	} else if (x86_exception_has_error_code(exc))
+		kvm_queue_exception_e(vcpu, exc, err_code);
+	else
+		kvm_queue_exception(vcpu, exc);
+	return 1;
+}
+
 static bool is_erratum_383(void)
 {
 	int err, i;
@@ -3065,6 +3131,10 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_WRITE_DR5]			= dr_interception,
 	[SVM_EXIT_WRITE_DR6]			= dr_interception,
 	[SVM_EXIT_WRITE_DR7]			= dr_interception,
+
+	[SVM_EXIT_EXCP_BASE ...
+	SVM_EXIT_EXCP_BASE + 31]		= gen_exc_interception,
+
 	[SVM_EXIT_EXCP_BASE + DB_VECTOR]	= db_interception,
 	[SVM_EXIT_EXCP_BASE + BP_VECTOR]	= bp_interception,
 	[SVM_EXIT_EXCP_BASE + UD_VECTOR]	= ud_interception,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 524d943f3efc..187ada7c5b03 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -196,6 +196,7 @@ struct vcpu_svm {
 	bool ghcb_sa_free;
 
 	bool guest_state_loaded;
+	u32 force_intercept_exceptions_mask;
 };
 
 struct svm_cpu_data {
@@ -351,8 +352,11 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
 	WARN_ON_ONCE(bit >= 32);
-	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
 
+	if ((1 << bit) & svm->force_intercept_exceptions_mask)
+		return;
+
+	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
 	recalc_intercepts(svm);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 092e2fad3c0d..e5c7b8fa1f7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -695,12 +695,13 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
 
-static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
-				    u32 error_code, unsigned long payload)
+void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, unsigned long payload)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code,
 			       true, payload, false);
 }
+EXPORT_SYMBOL_GPL(kvm_queue_exception_e_p);
 
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
 {
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 4/6] scripts/gdb: rework lx-symbols gdb script
  2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
                   ` (2 preceding siblings ...)
  2021-08-11 12:29 ` [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask Maxim Levitsky
@ 2021-08-11 12:29 ` Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 5/6] KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ Maxim Levitsky
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 12:29 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Maxim Levitsky, Vitaly Kuznetsov, Sean Christopherson,
	Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

Fix several issues that are present in lx-symbols script:

* Track module unloads by placing another software breakpoint at
  'free_module'
  (force uninline this symbol just in case), and use remove-symbol-file
  gdb command to unload the symobls of the module that is unloading.

  That gives the gdb a chance to mark all software breakpoints from
  this module as pending again.
  Also remove the module from the 'known' module list once it is unloaded.

* Since we now track module unload, we don't need to reload all
  symbols anymore when 'known' module loaded again
  (that can't happen anymore).
  This allows reloading a module in the debugged kernel to finish
  much faster, while lx-symbols tracks module loads and unloads.

* Disable/enable all gdb breakpoints on both module load and unload
  breakpoint hits, and not only in 'load_all_symbols' as was done before.
  (load_all_symbols is no longer called on breakpoint hit)
  That allows gdb to avoid getting confused about the state of the
  (now two) internal breakpoints we place.
  Otherwise it will leave them in the kernel code segment, when
  continuing which triggers a guest kernel panic as soon as it skips
  over the 'int3' instruction and executes the garbage tail of the optcode
  on which the breakpoint was placed.

* Block SIGINT while the script is running as this seems to crash gdb

* Add a basic check that kernel is already loaded into the guest memory
  to avoid confusing errors.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 kernel/module.c              |   8 +-
 scripts/gdb/linux/symbols.py | 203 +++++++++++++++++++++++------------
 2 files changed, 143 insertions(+), 68 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index ed13917ea5f3..242bd4bb0b55 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -906,8 +906,12 @@ int module_refcount(struct module *mod)
 }
 EXPORT_SYMBOL(module_refcount);
 
-/* This exists whether we can unload or not */
-static void free_module(struct module *mod);
+/* This exists whether we can unload or not
+ * Keep it uninlined to provide a reliable breakpoint target,
+ * e.g. for the gdb helper command 'lx-symbols'.
+ */
+
+static noinline void free_module(struct module *mod);
 
 SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 		unsigned int, flags)
diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 08d264ac328b..78e278fb4bad 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -14,45 +14,23 @@
 import gdb
 import os
 import re
+import signal
 
 from linux import modules, utils
 
 
 if hasattr(gdb, 'Breakpoint'):
-    class LoadModuleBreakpoint(gdb.Breakpoint):
-        def __init__(self, spec, gdb_command):
-            super(LoadModuleBreakpoint, self).__init__(spec, internal=True)
+
+    class BreakpointWrapper(gdb.Breakpoint):
+        def __init__(self, callback, **kwargs):
+            super(BreakpointWrapper, self).__init__(internal=True, **kwargs)
             self.silent = True
-            self.gdb_command = gdb_command
+            self.callback = callback
 
         def stop(self):
-            module = gdb.parse_and_eval("mod")
-            module_name = module['name'].string()
-            cmd = self.gdb_command
-
-            # enforce update if object file is not found
-            cmd.module_files_updated = False
-
-            # Disable pagination while reporting symbol (re-)loading.
-            # The console input is blocked in this context so that we would
-            # get stuck waiting for the user to acknowledge paged output.
-            show_pagination = gdb.execute("show pagination", to_string=True)
-            pagination = show_pagination.endswith("on.\n")
-            gdb.execute("set pagination off")
-
-            if module_name in cmd.loaded_modules:
-                gdb.write("refreshing all symbols to reload module "
-                          "'{0}'\n".format(module_name))
-                cmd.load_all_symbols()
-            else:
-                cmd.load_module_symbols(module)
-
-            # restore pagination state
-            gdb.execute("set pagination %s" % ("on" if pagination else "off"))
-
+            self.callback()
             return False
 
-
 class LxSymbols(gdb.Command):
     """(Re-)load symbols of Linux kernel and currently loaded modules.
 
@@ -61,15 +39,52 @@ are scanned recursively, starting in the same directory. Optionally, the module
 search path can be extended by a space separated list of paths passed to the
 lx-symbols command."""
 
-    module_paths = []
-    module_files = []
-    module_files_updated = False
-    loaded_modules = []
-    breakpoint = None
-
     def __init__(self):
         super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
                                         gdb.COMPLETE_FILENAME)
+        self.module_paths = []
+        self.module_files = []
+        self.module_files_updated = False
+        self.loaded_modules = {}
+        self.internal_breakpoints = []
+
+    # prepare GDB for loading/unloading a module
+    def _prepare_for_module_load_unload(self):
+
+        self.blocked_sigint = False
+
+        # block SIGINT during execution to avoid gdb crash
+        sigmask = signal.pthread_sigmask(signal.SIG_BLOCK, [])
+        if not signal.SIGINT in sigmask:
+            self.blocked_sigint = True
+            signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
+
+        # disable all breakpoints to workaround a GDB bug where it would
+        # not correctly resume from an internal breakpoint we placed
+        # in do_module_init/free_module (it leaves the int3
+        self.saved_breakpoints = []
+        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
+            for bp in gdb.breakpoints():
+                self.saved_breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
+                bp.enabled = False
+
+        # disable pagination to avoid asking user for continue
+        show_pagination = gdb.execute("show pagination", to_string=True)
+        self.saved_pagination = show_pagination.endswith("on.\n")
+        gdb.execute("set pagination off")
+
+    def _unprepare_for_module_load_unload(self):
+        # restore breakpoint state
+        for breakpoint in self.saved_breakpoints:
+            breakpoint['breakpoint'].enabled = breakpoint['enabled']
+
+        # restore pagination state
+        gdb.execute("set pagination %s" % ("on" if self.saved_pagination else "off"))
+
+        # unblock SIGINT
+        if self.blocked_sigint:
+            sigmask = signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGINT})
+            self.blocked_sigint = False
 
     def _update_module_files(self):
         self.module_files = []
@@ -107,7 +122,7 @@ lx-symbols command."""
                     name=section_name, addr=str(address)))
         return "".join(args)
 
-    def load_module_symbols(self, module):
+    def _do_load_module_symbols(self, module):
         module_name = module['name'].string()
         module_addr = str(module['core_layout']['base']).split()[0]
 
@@ -130,56 +145,112 @@ lx-symbols command."""
                 addr=module_addr,
                 sections=self._section_arguments(module))
             gdb.execute(cmdline, to_string=True)
-            if module_name not in self.loaded_modules:
-                self.loaded_modules.append(module_name)
+
+            self.loaded_modules[module_name] = {"module_file": module_file,
+                                                "module_addr": module_addr}
         else:
             gdb.write("no module object found for '{0}'\n".format(module_name))
 
+
+    def load_module_symbols(self):
+        module = gdb.parse_and_eval("mod")
+
+                # module already loaded, false alarm
+        # can happen if 'do_init_module' breakpoint is hit multiple times
+        # due to interrupts
+        module_name = module['name'].string()
+        if module_name in self.loaded_modules:
+            gdb.write("spurious module load breakpoint\n")
+            return
+
+        # enforce update if object file is not found
+        self.module_files_updated = False
+        self._prepare_for_module_load_unload()
+        try:
+            self._do_load_module_symbols(module)
+        finally:
+            self._unprepare_for_module_load_unload()
+
+
+    def unload_module_symbols(self):
+        module = gdb.parse_and_eval("mod")
+        module_name = module['name'].string()
+
+        # module already unloaded, false alarm
+        # can happen if 'free_module' breakpoint is hit multiple times
+        # due to interrupts
+        if not module_name in self.loaded_modules:
+            gdb.write("spurious module unload breakpoint\n")
+            return
+
+        module_file = self.loaded_modules[module_name]["module_file"]
+        module_addr = self.loaded_modules[module_name]["module_addr"]
+
+        self._prepare_for_module_load_unload()
+        try:
+            gdb.write("unloading @{addr}: {filename}\n".format(
+                addr=module_addr, filename=module_file))
+            cmdline = "remove-symbol-file {filename}".format(
+                filename=module_file)
+            gdb.execute(cmdline, to_string=True)
+            del self.loaded_modules[module_name]
+
+        finally:
+            self._unprepare_for_module_load_unload()
+
     def load_all_symbols(self):
         gdb.write("loading vmlinux\n")
 
-        # Dropping symbols will disable all breakpoints. So save their states
-        # and restore them afterward.
-        saved_states = []
-        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
-            for bp in gdb.breakpoints():
-                saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
-
-        # drop all current symbols and reload vmlinux
-        orig_vmlinux = 'vmlinux'
-        for obj in gdb.objfiles():
-            if obj.filename.endswith('vmlinux'):
-                orig_vmlinux = obj.filename
-        gdb.execute("symbol-file", to_string=True)
-        gdb.execute("symbol-file {0}".format(orig_vmlinux))
-
-        self.loaded_modules = []
-        module_list = modules.module_list()
-        if not module_list:
-            gdb.write("no modules found\n")
-        else:
-            [self.load_module_symbols(module) for module in module_list]
+        self._prepare_for_module_load_unload()
+        try:
+            # drop all current symbols and reload vmlinux
+            orig_vmlinux = 'vmlinux'
+            for obj in gdb.objfiles():
+                if obj.filename.endswith('vmlinux'):
+                    orig_vmlinux = obj.filename
+            gdb.execute("symbol-file", to_string=True)
+            gdb.execute("symbol-file {0}".format(orig_vmlinux))
+            self.loaded_modules = {}
+            module_list = modules.module_list()
+            if not module_list:
+                gdb.write("no modules found\n")
+            else:
+                [self._do_load_module_symbols(module) for module in module_list]
+        finally:
+            self._unprepare_for_module_load_unload()
 
-        for saved_state in saved_states:
-            saved_state['breakpoint'].enabled = saved_state['enabled']
+        self._unprepare_for_module_load_unload()
 
     def invoke(self, arg, from_tty):
         self.module_paths = [os.path.abspath(os.path.expanduser(p))
                              for p in arg.split()]
         self.module_paths.append(os.getcwd())
 
+        try:
+            gdb.parse_and_eval("*start_kernel").fetch_lazy()
+        except gdb.MemoryError:
+            gdb.write("Error: Kernel is not yet loaded\n")
+            return
+
         # enforce update
         self.module_files = []
         self.module_files_updated = False
 
+        for bp in self.internal_breakpoints:
+            bp.delete()
+        self.internal_breakpoints = []
+
         self.load_all_symbols()
 
         if hasattr(gdb, 'Breakpoint'):
-            if self.breakpoint is not None:
-                self.breakpoint.delete()
-                self.breakpoint = None
-            self.breakpoint = LoadModuleBreakpoint(
-                "kernel/module.c:do_init_module", self)
+            self.internal_breakpoints.append(
+                BreakpointWrapper(self.load_module_symbols,
+                                  spec="kernel/module.c:do_init_module",
+                                  ))
+            self.internal_breakpoints.append(
+                BreakpointWrapper(self.unload_module_symbols,
+                                  spec="kernel/module.c:free_module",
+                                  ))
         else:
             gdb.write("Note: symbol update on module loading not supported "
                       "with this gdb version\n")
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 5/6] KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ
  2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
                   ` (3 preceding siblings ...)
  2021-08-11 12:29 ` [PATCH v3 4/6] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
@ 2021-08-11 12:29 ` Maxim Levitsky
  2021-08-11 12:29 ` [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ Maxim Levitsky
  2021-08-11 13:10 ` [PATCH v3 0/6] KVM: my debug patch queue Paolo Bonzini
  6 siblings, 0 replies; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 12:29 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Maxim Levitsky, Vitaly Kuznetsov, Sean Christopherson,
	Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts
while running.

This change is mostly intended for more robust single stepping
of the guest and it has the following benefits when enabled:

* Resuming from a breakpoint is much more reliable.
  When resuming execution from a breakpoint, with interrupts enabled,
  more often than not, KVM would inject an interrupt and make the CPU
  jump immediately to the interrupt handler and eventually return to
  the breakpoint, to trigger it again.

  From the user point of view it looks like the CPU never executed a
  single instruction and in some cases that can even prevent forward
  progress, for example, when the breakpoint is placed by an automated
  script (e.g lx-symbols), which does something in response to the
  breakpoint and then continues the guest automatically.
  If the script execution takes enough time for another interrupt to
  arrive, the guest will be stuck on the same breakpoint RIP forever.

* Normal single stepping is much more predictable, since it won't
  land the debugger into an interrupt handler.

* RFLAGS.TF has less chance to be leaked to the guest:

  We set that flag behind the guest's back to do single stepping
  but if single step lands us into an interrupt/exception handler
  it will be leaked to the guest in the form of being pushed
  to the stack.
  This doesn't completely eliminate this problem as exceptions
  can still happen, but at least this reduces the chances
  of this happening.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 Documentation/virt/kvm/api.rst  | 1 +
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/include/uapi/asm/kvm.h | 1 +
 arch/x86/kvm/x86.c              | 4 ++++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 86d7ad3a126c..4ea1bb28297b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3357,6 +3357,7 @@ flags which can include the following:
   - KVM_GUESTDBG_INJECT_DB:     inject DB type exception [x86]
   - KVM_GUESTDBG_INJECT_BP:     inject BP type exception [x86]
   - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
+  - KVM_GUESTDBG_BLOCKIRQ:      avoid injecting interrupts/NMI/SMI [x86]
 
 For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
 are enabled in memory so we need to ensure breakpoint exceptions are
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 72fe03506018..a12db20069d5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -222,7 +222,8 @@ enum x86_intercept_stage;
 	KVM_GUESTDBG_USE_HW_BP | \
 	KVM_GUESTDBG_USE_SW_BP | \
 	KVM_GUESTDBG_INJECT_BP | \
-	KVM_GUESTDBG_INJECT_DB)
+	KVM_GUESTDBG_INJECT_DB | \
+	KVM_GUESTDBG_BLOCKIRQ)
 
 
 #define PFERR_PRESENT_BIT 0
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a6c327f8ad9e..2ef1f6513c68 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -295,6 +295,7 @@ struct kvm_debug_exit_arch {
 #define KVM_GUESTDBG_USE_HW_BP		0x00020000
 #define KVM_GUESTDBG_INJECT_DB		0x00040000
 #define KVM_GUESTDBG_INJECT_BP		0x00080000
+#define KVM_GUESTDBG_BLOCKIRQ		0x00100000
 
 /* for KVM_SET_GUEST_DEBUG */
 struct kvm_guest_debug_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5c7b8fa1f7f..d7fade9e3b94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8883,6 +8883,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		can_inject = false;
 	}
 
+	/* Don't inject interrupts if the user asked to avoid doing so */
+	if (vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ)
+		return 0;
+
 	/*
 	 * Finally, inject interrupt events.  If an event cannot be injected
 	 * due to architectural conditions (e.g. IF=0) a window-open exit
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
                   ` (4 preceding siblings ...)
  2021-08-11 12:29 ` [PATCH v3 5/6] KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ Maxim Levitsky
@ 2021-08-11 12:29 ` Maxim Levitsky
  2021-09-06 11:20   ` Paolo Bonzini
  2021-08-11 13:10 ` [PATCH v3 0/6] KVM: my debug patch queue Paolo Bonzini
  6 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 12:29 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Maxim Levitsky, Vitaly Kuznetsov, Sean Christopherson,
	Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

Modify debug_regs test to create a pending interrupt
and see that it is blocked when single stepping is done
with KVM_GUESTDBG_BLOCKIRQ

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
index 6097a8283377..5f078db1bcba 100644
--- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
+++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
@@ -8,12 +8,15 @@
 #include <string.h>
 #include "kvm_util.h"
 #include "processor.h"
+#include "apic.h"
 
 #define VCPU_ID 0
 
 #define DR6_BD		(1 << 13)
 #define DR7_GD		(1 << 13)
 
+#define IRQ_VECTOR 0xAA
+
 /* For testing data access debug BP */
 uint32_t guest_value;
 
@@ -21,6 +24,11 @@ extern unsigned char sw_bp, hw_bp, write_data, ss_start, bd_start;
 
 static void guest_code(void)
 {
+	/* Create a pending interrupt on current vCPU */
+	x2apic_enable();
+	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
+			 APIC_DM_FIXED | IRQ_VECTOR);
+
 	/*
 	 * Software BP tests.
 	 *
@@ -38,12 +46,19 @@ static void guest_code(void)
 		     "mov %%rax,%0;\n\t write_data:"
 		     : "=m" (guest_value) : : "rax");
 
-	/* Single step test, covers 2 basic instructions and 2 emulated */
+	/*
+	 * Single step test, covers 2 basic instructions and 2 emulated
+	 *
+	 * Enable interrupts during the single stepping to see that
+	 * pending interrupt we raised is not handled due to KVM_GUESTDBG_BLOCKIRQ
+	 */
 	asm volatile("ss_start: "
+		     "sti\n\t"
 		     "xor %%eax,%%eax\n\t"
 		     "cpuid\n\t"
 		     "movl $0x1a0,%%ecx\n\t"
 		     "rdmsr\n\t"
+		     "cli\n\t"
 		     : : : "eax", "ebx", "ecx", "edx");
 
 	/* DR6.BD test */
@@ -72,11 +87,13 @@ int main(void)
 	uint64_t cmd;
 	int i;
 	/* Instruction lengths starting at ss_start */
-	int ss_size[4] = {
+	int ss_size[6] = {
+		1,		/* sti*/
 		2,		/* xor */
 		2,		/* cpuid */
 		5,		/* mov */
 		2,		/* rdmsr */
+		1,		/* cli */
 	};
 
 	if (!kvm_check_cap(KVM_CAP_SET_GUEST_DEBUG)) {
@@ -154,7 +171,8 @@ int main(void)
 	for (i = 0; i < (sizeof(ss_size) / sizeof(ss_size[0])); i++) {
 		target_rip += ss_size[i];
 		CLEAR_DEBUG();
-		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
+		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |
+				KVM_GUESTDBG_BLOCKIRQ;
 		debug.arch.debugreg[7] = 0x00000400;
 		APPLY_DEBUG();
 		vcpu_run(vm, VCPU_ID);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/6] KVM: my debug patch queue
  2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
                   ` (5 preceding siblings ...)
  2021-08-11 12:29 ` [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ Maxim Levitsky
@ 2021-08-11 13:10 ` Paolo Bonzini
  2021-08-11 13:22   ` Maxim Levitsky
  6 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2021-08-11 13:10 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Sean Christopherson, Ingo Molnar,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On 11/08/21 14:29, Maxim Levitsky wrote:
> Hi!
> 
> I would like to publish two debug features which were needed for other stuff
> I work on.
> 
> One is the reworked lx-symbols script which now actually works on at least
> gdb 9.1 (gdb 9.2 was reported to fail to load the debug symbols from the kernel
> for some reason, not related to this patch) and upstream qemu.
> 
> The other feature is the ability to trap all guest exceptions (on SVM for now)
> and see them in kvmtrace prior to potential merge to double/triple fault.
> 
> This can be very useful and I already had to manually patch KVM a few
> times for this.
> I will, once time permits, implement this feature on Intel as well.
> 
> V2:
> 
>   * Some more refactoring and workarounds for lx-symbols script
> 
>   * added KVM_GUESTDBG_BLOCKIRQ flag to enable 'block interrupts on
>     single step' together with KVM_CAP_SET_GUEST_DEBUG2 capability
>     to indicate which guest debug flags are supported.
> 
>     This is a replacement for unconditional block of interrupts on single
>     step that was done in previous version of this patch set.
>     Patches to qemu to use that feature will be sent soon.
> 
>   * Reworked the the 'intercept all exceptions for debug' feature according
>     to the review feedback:
> 
>     - renamed the parameter that enables the feature and
>       moved it to common kvm module.
>       (only SVM part is currently implemented though)
> 
>     - disable the feature for SEV guests as was suggested during the review
>     - made the vmexit table const again, as was suggested in the review as well.
> 
> V3:
>   * Modified a selftest to cover the KVM_GUESTDBG_BLOCKIRQ
>   * Rebased on kvm/queue
> 
> Best regards,
>          Maxim Levitsky
> 
> Maxim Levitsky (6):
>    KVM: SVM: split svm_handle_invalid_exit
>    KVM: x86: add force_intercept_exceptions_mask
>    KVM: SVM: implement force_intercept_exceptions_mask
>    scripts/gdb: rework lx-symbols gdb script
>    KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ
>    KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
> 
>   Documentation/virt/kvm/api.rst                |   1 +
>   arch/x86/include/asm/kvm_host.h               |   5 +-
>   arch/x86/include/uapi/asm/kvm.h               |   1 +
>   arch/x86/kvm/svm/svm.c                        |  87 +++++++-
>   arch/x86/kvm/svm/svm.h                        |   6 +-
>   arch/x86/kvm/x86.c                            |  12 +-
>   arch/x86/kvm/x86.h                            |   2 +
>   kernel/module.c                               |   8 +-
>   scripts/gdb/linux/symbols.py                  | 203 ++++++++++++------
>   .../testing/selftests/kvm/x86_64/debug_regs.c |  24 ++-
>   10 files changed, 266 insertions(+), 83 deletions(-)
> 

Queued 1-5-6.

For patches 2 and 3, please add VMX support too.

For patch 4, it's not KVM :) so please submit it separately.

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/6] KVM: my debug patch queue
  2021-08-11 13:10 ` [PATCH v3 0/6] KVM: my debug patch queue Paolo Bonzini
@ 2021-08-11 13:22   ` Maxim Levitsky
  0 siblings, 0 replies; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Sean Christopherson, Ingo Molnar,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On Wed, 2021-08-11 at 15:10 +0200, Paolo Bonzini wrote:
> On 11/08/21 14:29, Maxim Levitsky wrote:
> > Hi!
> > 
> > I would like to publish two debug features which were needed for other stuff
> > I work on.
> > 
> > One is the reworked lx-symbols script which now actually works on at least
> > gdb 9.1 (gdb 9.2 was reported to fail to load the debug symbols from the kernel
> > for some reason, not related to this patch) and upstream qemu.
> > 
> > The other feature is the ability to trap all guest exceptions (on SVM for now)
> > and see them in kvmtrace prior to potential merge to double/triple fault.
> > 
> > This can be very useful and I already had to manually patch KVM a few
> > times for this.
> > I will, once time permits, implement this feature on Intel as well.
> > 
> > V2:
> > 
> >   * Some more refactoring and workarounds for lx-symbols script
> > 
> >   * added KVM_GUESTDBG_BLOCKIRQ flag to enable 'block interrupts on
> >     single step' together with KVM_CAP_SET_GUEST_DEBUG2 capability
> >     to indicate which guest debug flags are supported.
> > 
> >     This is a replacement for unconditional block of interrupts on single
> >     step that was done in previous version of this patch set.
> >     Patches to qemu to use that feature will be sent soon.
> > 
> >   * Reworked the the 'intercept all exceptions for debug' feature according
> >     to the review feedback:
> > 
> >     - renamed the parameter that enables the feature and
> >       moved it to common kvm module.
> >       (only SVM part is currently implemented though)
> > 
> >     - disable the feature for SEV guests as was suggested during the review
> >     - made the vmexit table const again, as was suggested in the review as well.
> > 
> > V3:
> >   * Modified a selftest to cover the KVM_GUESTDBG_BLOCKIRQ
> >   * Rebased on kvm/queue
> > 
> > Best regards,
> >          Maxim Levitsky
> > 
> > Maxim Levitsky (6):
> >    KVM: SVM: split svm_handle_invalid_exit
> >    KVM: x86: add force_intercept_exceptions_mask
> >    KVM: SVM: implement force_intercept_exceptions_mask
> >    scripts/gdb: rework lx-symbols gdb script
> >    KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ
> >    KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
> > 
> >   Documentation/virt/kvm/api.rst                |   1 +
> >   arch/x86/include/asm/kvm_host.h               |   5 +-
> >   arch/x86/include/uapi/asm/kvm.h               |   1 +
> >   arch/x86/kvm/svm/svm.c                        |  87 +++++++-
> >   arch/x86/kvm/svm/svm.h                        |   6 +-
> >   arch/x86/kvm/x86.c                            |  12 +-
> >   arch/x86/kvm/x86.h                            |   2 +
> >   kernel/module.c                               |   8 +-
> >   scripts/gdb/linux/symbols.py                  | 203 ++++++++++++------
> >   .../testing/selftests/kvm/x86_64/debug_regs.c |  24 ++-
> >   10 files changed, 266 insertions(+), 83 deletions(-)
> > 
> 
> Queued 1-5-6.
> 
> For patches 2 and 3, please add VMX support too.
> 
> For patch 4, it's not KVM :) so please submit it separately.

Thanks!

I will do this!


Best regards,
	Maxim Levitsky

> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask
  2021-08-11 12:29 ` [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask Maxim Levitsky
@ 2021-08-11 14:26   ` Maxim Levitsky
  2021-09-02 17:34     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2021-08-11 14:26 UTC (permalink / raw)
  To: kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Sean Christopherson, Ingo Molnar,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 7280 bytes --]

On Wed, 2021-08-11 at 15:29 +0300, Maxim Levitsky wrote:
> Currently #TS interception is only done once.
> Also exception interception is not enabled for SEV guests.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 +
>  arch/x86/kvm/svm/svm.c          | 70 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.h          |  6 ++-
>  arch/x86/kvm/x86.c              |  5 ++-
>  4 files changed, 80 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 20daaf67a5bf..72fe03506018 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1690,6 +1690,8 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
>  void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
>  void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
>  void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
> +void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
> +			     u32 error_code, unsigned long payload);
>  void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
>  void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e45259177009..19f54b07161a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -233,6 +233,8 @@ static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
>  #define MSRS_RANGE_SIZE 2048
>  #define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
>  
> +static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code);
> +
>  u32 svm_msrpm_offset(u32 msr)
>  {
>  	u32 offset;
> @@ -1153,6 +1155,22 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +static void svm_init_force_exceptions_intercepts(struct vcpu_svm *svm)
> +{
> +	int exc;
> +
> +	svm->force_intercept_exceptions_mask = force_intercept_exceptions_mask;
> +	for (exc = 0 ; exc < 32 ; exc++) {
> +		if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
> +			continue;
> +
> +		/* Those are defined to have undefined behavior in the SVM spec */
> +		if (exc != 2 && exc != 9)
> +			continue;
> +		set_exception_intercept(svm, exc);

I made a mistake here, during one of the refactoring I think, after I finished
testing this througfully, and I noticed it now while looking again
at the code.

I attached a fix for this, and I also tested more carefully that the
feature works with selftests, kvm unit tests and by booting few VMs.

Best regards,
	Maxim Levitsky

> +	}
> +}
> +
>  static void init_vmcb(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -1304,6 +1322,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  
>  	enable_gif(svm);
>  
> +	if (!sev_es_guest(vcpu->kvm))
> +		svm_init_force_exceptions_intercepts(svm);
> +
>  }
>  
>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> @@ -1892,6 +1913,17 @@ static int pf_interception(struct kvm_vcpu *vcpu)
>  	u64 fault_address = svm->vmcb->control.exit_info_2;
>  	u64 error_code = svm->vmcb->control.exit_info_1;
>  
> +	if ((svm->force_intercept_exceptions_mask & (1 << PF_VECTOR)))
> +		if (npt_enabled && !vcpu->arch.apf.host_apf_flags) {
> +			/* If the #PF was only intercepted for debug, inject
> +			 * it directly to the guest, since the kvm's mmu code
> +			 * is not ready to deal with such page faults.
> +			 */
> +			kvm_queue_exception_e_p(vcpu, PF_VECTOR,
> +						error_code, fault_address);
> +			return 1;
> +		}
> +
>  	return kvm_handle_page_fault(vcpu, error_code, fault_address,
>  			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
>  			svm->vmcb->control.insn_bytes : NULL,
> @@ -1967,6 +1999,40 @@ static int ac_interception(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static int gen_exc_interception(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Generic exception intercept handler which forwards a guest exception
> +	 * as-is to the guest.
> +	 * For exceptions that don't have a special intercept handler.
> +	 *
> +	 * Used only for 'force_intercept_exceptions_mask' KVM debug feature.
> +	 */
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	int exc = svm->vmcb->control.exit_code - SVM_EXIT_EXCP_BASE;
> +
> +	/* SVM doesn't provide us with an error code for the #DF */
> +	u32 err_code = exc == DF_VECTOR ? 0 : svm->vmcb->control.exit_info_1;
> +
> +	if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
> +		return svm_handle_invalid_exit(vcpu, svm->vmcb->control.exit_code);
> +
> +	if (exc == TS_VECTOR) {
> +		/*
> +		 * SVM doesn't provide us with an error code to be able to
> +		 * re-inject the #TS exception, so just disable its
> +		 * intercept, and let the guest re-execute the instruction.
> +		 */
> +		vmcb_clr_intercept(&svm->vmcb01.ptr->control,
> +				   INTERCEPT_EXCEPTION_OFFSET + TS_VECTOR);
> +		recalc_intercepts(svm);
> +	} else if (x86_exception_has_error_code(exc))
> +		kvm_queue_exception_e(vcpu, exc, err_code);
> +	else
> +		kvm_queue_exception(vcpu, exc);
> +	return 1;
> +}
> +
>  static bool is_erratum_383(void)
>  {
>  	int err, i;
> @@ -3065,6 +3131,10 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[SVM_EXIT_WRITE_DR5]			= dr_interception,
>  	[SVM_EXIT_WRITE_DR6]			= dr_interception,
>  	[SVM_EXIT_WRITE_DR7]			= dr_interception,
> +
> +	[SVM_EXIT_EXCP_BASE ...
> +	SVM_EXIT_EXCP_BASE + 31]		= gen_exc_interception,
> +
>  	[SVM_EXIT_EXCP_BASE + DB_VECTOR]	= db_interception,
>  	[SVM_EXIT_EXCP_BASE + BP_VECTOR]	= bp_interception,
>  	[SVM_EXIT_EXCP_BASE + UD_VECTOR]	= ud_interception,
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 524d943f3efc..187ada7c5b03 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -196,6 +196,7 @@ struct vcpu_svm {
>  	bool ghcb_sa_free;
>  
>  	bool guest_state_loaded;
> +	u32 force_intercept_exceptions_mask;
>  };
>  
>  struct svm_cpu_data {
> @@ -351,8 +352,11 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
>  	struct vmcb *vmcb = svm->vmcb01.ptr;
>  
>  	WARN_ON_ONCE(bit >= 32);
> -	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
>  
> +	if ((1 << bit) & svm->force_intercept_exceptions_mask)
> +		return;
> +
> +	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
>  	recalc_intercepts(svm);
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 092e2fad3c0d..e5c7b8fa1f7f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -695,12 +695,13 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
>  }
>  EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
>  
> -static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
> -				    u32 error_code, unsigned long payload)
> +void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
> +			     u32 error_code, unsigned long payload)
>  {
>  	kvm_multiple_exception(vcpu, nr, true, error_code,
>  			       true, payload, false);
>  }
> +EXPORT_SYMBOL_GPL(kvm_queue_exception_e_p);
>  
>  int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
>  {


[-- Attachment #2: 0001-KVM-x86-fix-for-force_intercept_exceptions_mask.patch --]
[-- Type: text/x-patch, Size: 770 bytes --]

From 2bdf847f9991f5be1bdb3a47c0e796af935bdb3f Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk@redhat.com>
Date: Wed, 11 Aug 2021 17:02:14 +0300
Subject: [PATCH] KVM: x86: fix for force_intercept_exceptions_mask

---
 arch/x86/kvm/svm/svm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6f17569fd5c8..85e5a93fa79a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1166,8 +1166,7 @@ static void svm_init_force_exceptions_intercepts(struct vcpu_svm *svm)
 
 		/* Those are defined to have undefined behavior in the SVM spec */
 		if (exc != 2 && exc != 9)
-			continue;
-		set_exception_intercept(svm, exc);
+			set_exception_intercept(svm, exc);
 	}
 }
 
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask
  2021-08-11 12:29 ` [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask Maxim Levitsky
@ 2021-09-02 16:56   ` Sean Christopherson
  2022-02-08 14:34     ` Maxim Levitsky
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-09-02 16:56 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton, Borislav Petkov

Assuming this hasn't been abandoned...

On Wed, Aug 11, 2021, Maxim Levitsky wrote:
> This parameter will be used by VMX and SVM code to force
> interception of a set of exceptions, given by a bitmask
> for guest debug and/or kvm debug.
> 
> This is based on an idea first shown here:
> https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/
> 
> CC: Borislav Petkov <bp@suse.de>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 3 +++
>  arch/x86/kvm/x86.h | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fdc0c18339fb..092e2fad3c0d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -184,6 +184,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>  int __read_mostly pi_inject_timer = -1;
>  module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>  
> +uint force_intercept_exceptions_mask;
> +module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);

Use octal permissions.  This also can't be a simple writable param, at least not
without a well-documented disclaimer, as there's no guarantee a vCPU will update
its exception bitmap in a timely fashion.  An alternative to a module param would
be to extend/add a per-VM ioctl(), e.g. maybe KVM_SET_GUEST_DEBUG?  The downside
of an ioctl() is that it would require userspace enabling :-/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask
  2021-08-11 14:26   ` Maxim Levitsky
@ 2021-09-02 17:34     ` Sean Christopherson
  2022-02-08 14:35       ` Maxim Levitsky
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-09-02 17:34 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

On Wed, Aug 11, 2021, Maxim Levitsky wrote:
> On Wed, 2021-08-11 at 15:29 +0300, Maxim Levitsky wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index e45259177009..19f54b07161a 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -233,6 +233,8 @@ static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
> >  #define MSRS_RANGE_SIZE 2048
> >  #define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
> >  
> > +static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code);
> > +
> >  u32 svm_msrpm_offset(u32 msr)
> >  {
> >  	u32 offset;
> > @@ -1153,6 +1155,22 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
> >  	}
> >  }
> >  
> > +static void svm_init_force_exceptions_intercepts(struct vcpu_svm *svm)
> > +{
> > +	int exc;
> > +
> > +	svm->force_intercept_exceptions_mask = force_intercept_exceptions_mask;

Ah, the param is being snapshotted on vCPU creation, hence the writable module
param.  That works, though it'd be better to snapshot it on a per-VM basic, not
per-vCPU, and do so in common x86 code so that the param doesn't need to be
exported.

> > +	for (exc = 0 ; exc < 32 ; exc++) {

for_each_set_bit()

> > +		if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
> > +			continue;
> > +
> > +		/* Those are defined to have undefined behavior in the SVM spec */
> > +		if (exc != 2 && exc != 9)

Maybe add a pr_warn_once() to let the user know they done messed up?

And given that there are already intercepts with undefined behavior, it's probably
best to disallow intercepting anything we aren't 100% postive will be handled
correctly, e.g. intercepting vector 31 is nonsensical at this time.

> > +			continue;
> > +		set_exception_intercept(svm, exc);

...

> > +static int gen_exc_interception(struct kvm_vcpu *vcpu)
> > +{
> > +	/*
> > +	 * Generic exception intercept handler which forwards a guest exception
> > +	 * as-is to the guest.
> > +	 * For exceptions that don't have a special intercept handler.
> > +	 *
> > +	 * Used only for 'force_intercept_exceptions_mask' KVM debug feature.
> > +	 */
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	int exc = svm->vmcb->control.exit_code - SVM_EXIT_EXCP_BASE;
> > +
> > +	/* SVM doesn't provide us with an error code for the #DF */
> > +	u32 err_code = exc == DF_VECTOR ? 0 : svm->vmcb->control.exit_info_1;

Might be better to handle this in the x86_exception_has_error_code() path to
avoid confusing readers with respect to exceptions that don't have an error code,
e.g.

	else if (x86_exception_has_error_code(exc)) {
		/* SVM doesn't provide the error code on #DF :-( */
		if (exc == DF_VECTOR)
			kvm_queue_exception_e(vcpu, exc, 0);
		else
			kvm_queue_exception_e(vcpu, exc, svm->vmcb->control.exit_info_1);
	} else {
		...
	}

Alternatively, can we zero svm->vmcb->control.exit_info_1 on #DF to make it more
obvious that SVM leaves stale data in exit_info_1 (assuming that's true)?  E.g.

	...

	if (exc == TS_VECTOR) {
		...
	} else if (x86_exception_has_error_code(exc)) {
		/* SVM doesn't provide the error code on #DF :-( */
		if (exc == DF_VECTOR)
			svm->vmcb->control.exit_info_1 = 0;

		kvm_queue_exception_e(vcpu, exc, svm->vmcb->control.exit_info_1);
	} else {
		...
	}

		
> > +
> > +	if (!(svm->force_intercept_exceptions_mask & (1 << exc)))

BIT(exc)

> > +		return svm_handle_invalid_exit(vcpu, svm->vmcb->control.exit_code);
> > +
> > +	if (exc == TS_VECTOR) {
> > +		/*
> > +		 * SVM doesn't provide us with an error code to be able to
> > +		 * re-inject the #TS exception, so just disable its
> > +		 * intercept, and let the guest re-execute the instruction.
> > +		 */
> > +		vmcb_clr_intercept(&svm->vmcb01.ptr->control,
> > +				   INTERCEPT_EXCEPTION_OFFSET + TS_VECTOR);

Maybe just disallow intercepting #TS altogether?  Or does this fall into your
Win98 use case? :-)

> > +		recalc_intercepts(svm);
> > +	} else if (x86_exception_has_error_code(exc))
> > +		kvm_queue_exception_e(vcpu, exc, err_code);
> > +	else
> > +		kvm_queue_exception(vcpu, exc);
> > +	return 1;
> > +}
> > +
> >  static bool is_erratum_383(void)
> >  {
> >  	int err, i;
> > @@ -3065,6 +3131,10 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> >  	[SVM_EXIT_WRITE_DR5]			= dr_interception,
> >  	[SVM_EXIT_WRITE_DR6]			= dr_interception,
> >  	[SVM_EXIT_WRITE_DR7]			= dr_interception,
> > +
> > +	[SVM_EXIT_EXCP_BASE ...
> > +	SVM_EXIT_EXCP_BASE + 31]		= gen_exc_interception,

This generates a Sparse warning due to the duplicate initializer.  IMO that's a
very good warning as I have zero idea how the compiler actually handles this
particular scenario, e.g. do later entries take priority, is it technically
"undefined" behavior, etc...

arch/x86/kvm/svm/svm.c:3065:10: warning: Initializer entry defined twice
arch/x86/kvm/svm/svm.c:3067:29:   also defined here

I don't have a clever solution though :-(

> > +
> >  	[SVM_EXIT_EXCP_BASE + DB_VECTOR]	= db_interception,
> >  	[SVM_EXIT_EXCP_BASE + BP_VECTOR]	= bp_interception,
> >  	[SVM_EXIT_EXCP_BASE + UD_VECTOR]	= ud_interception,
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 524d943f3efc..187ada7c5b03 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -196,6 +196,7 @@ struct vcpu_svm {
> >  	bool ghcb_sa_free;
> >  
> >  	bool guest_state_loaded;
> > +	u32 force_intercept_exceptions_mask;
> >  };
> >  
> >  struct svm_cpu_data {
> > @@ -351,8 +352,11 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
> >  	struct vmcb *vmcb = svm->vmcb01.ptr;
> >  
> >  	WARN_ON_ONCE(bit >= 32);
> > -	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> >  
> > +	if ((1 << bit) & svm->force_intercept_exceptions_mask)

BIT(bit)

> > +		return;
> > +
> > +	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> >  	recalc_intercepts(svm);
> >  }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-08-11 12:29 ` [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ Maxim Levitsky
@ 2021-09-06 11:20   ` Paolo Bonzini
  2021-09-06 21:03     ` Maxim Levitsky
  2021-11-01 15:43     ` Vitaly Kuznetsov
  0 siblings, 2 replies; 30+ messages in thread
From: Paolo Bonzini @ 2021-09-06 11:20 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Sean Christopherson, Ingo Molnar,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On 11/08/21 14:29, Maxim Levitsky wrote:
> Modify debug_regs test to create a pending interrupt
> and see that it is blocked when single stepping is done
> with KVM_GUESTDBG_BLOCKIRQ
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
>   1 file changed, 21 insertions(+), 3 deletions(-)

I haven't looked very much at this, but the test fails.

Paolo

> diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
> index 6097a8283377..5f078db1bcba 100644
> --- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
> +++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
> @@ -8,12 +8,15 @@
>   #include <string.h>
>   #include "kvm_util.h"
>   #include "processor.h"
> +#include "apic.h"
>   
>   #define VCPU_ID 0
>   
>   #define DR6_BD		(1 << 13)
>   #define DR7_GD		(1 << 13)
>   
> +#define IRQ_VECTOR 0xAA
> +
>   /* For testing data access debug BP */
>   uint32_t guest_value;
>   
> @@ -21,6 +24,11 @@ extern unsigned char sw_bp, hw_bp, write_data, ss_start, bd_start;
>   
>   static void guest_code(void)
>   {
> +	/* Create a pending interrupt on current vCPU */
> +	x2apic_enable();
> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> +			 APIC_DM_FIXED | IRQ_VECTOR);
> +
>   	/*
>   	 * Software BP tests.
>   	 *
> @@ -38,12 +46,19 @@ static void guest_code(void)
>   		     "mov %%rax,%0;\n\t write_data:"
>   		     : "=m" (guest_value) : : "rax");
>   
> -	/* Single step test, covers 2 basic instructions and 2 emulated */
> +	/*
> +	 * Single step test, covers 2 basic instructions and 2 emulated
> +	 *
> +	 * Enable interrupts during the single stepping to see that
> +	 * pending interrupt we raised is not handled due to KVM_GUESTDBG_BLOCKIRQ
> +	 */
>   	asm volatile("ss_start: "
> +		     "sti\n\t"
>   		     "xor %%eax,%%eax\n\t"
>   		     "cpuid\n\t"
>   		     "movl $0x1a0,%%ecx\n\t"
>   		     "rdmsr\n\t"
> +		     "cli\n\t"
>   		     : : : "eax", "ebx", "ecx", "edx");
>   
>   	/* DR6.BD test */
> @@ -72,11 +87,13 @@ int main(void)
>   	uint64_t cmd;
>   	int i;
>   	/* Instruction lengths starting at ss_start */
> -	int ss_size[4] = {
> +	int ss_size[6] = {
> +		1,		/* sti*/
>   		2,		/* xor */
>   		2,		/* cpuid */
>   		5,		/* mov */
>   		2,		/* rdmsr */
> +		1,		/* cli */
>   	};
>   
>   	if (!kvm_check_cap(KVM_CAP_SET_GUEST_DEBUG)) {
> @@ -154,7 +171,8 @@ int main(void)
>   	for (i = 0; i < (sizeof(ss_size) / sizeof(ss_size[0])); i++) {
>   		target_rip += ss_size[i];
>   		CLEAR_DEBUG();
> -		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
> +		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |
> +				KVM_GUESTDBG_BLOCKIRQ;
>   		debug.arch.debugreg[7] = 0x00000400;
>   		APPLY_DEBUG();
>   		vcpu_run(vm, VCPU_ID);
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-09-06 11:20   ` Paolo Bonzini
@ 2021-09-06 21:03     ` Maxim Levitsky
  2021-11-01 15:43     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 30+ messages in thread
From: Maxim Levitsky @ 2021-09-06 21:03 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Sean Christopherson, Ingo Molnar,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On Mon, 2021-09-06 at 13:20 +0200, Paolo Bonzini wrote:
> On 11/08/21 14:29, Maxim Levitsky wrote:
> > Modify debug_regs test to create a pending interrupt
> > and see that it is blocked when single stepping is done
> > with KVM_GUESTDBG_BLOCKIRQ
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >   .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
> >   1 file changed, 21 insertions(+), 3 deletions(-)
> 
> I haven't looked very much at this, but the test fails.

Works for me :-(

[mlevitsk@starship ~/Kernel/master/src/tools/testing/selftests/kvm]$./x86_64/debug_regs 
[mlevitsk@starship ~/Kernel/master/src/tools/testing/selftests/kvm]$echo $?
0


Maybe you run the test on kernel that doesn't support KVM_GUESTDBG_BLOCKIRQ?

Best regards,
	Maxim Levitsky

> 
> Paolo
> 
> > diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
> > index 6097a8283377..5f078db1bcba 100644
> > --- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
> > +++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
> > @@ -8,12 +8,15 @@
> >   #include <string.h>
> >   #include "kvm_util.h"
> >   #include "processor.h"
> > +#include "apic.h"
> >   
> >   #define VCPU_ID 0
> >   
> >   #define DR6_BD		(1 << 13)
> >   #define DR7_GD		(1 << 13)
> >   
> > +#define IRQ_VECTOR 0xAA
> > +
> >   /* For testing data access debug BP */
> >   uint32_t guest_value;
> >   
> > @@ -21,6 +24,11 @@ extern unsigned char sw_bp, hw_bp, write_data, ss_start, bd_start;
> >   
> >   static void guest_code(void)
> >   {
> > +	/* Create a pending interrupt on current vCPU */
> > +	x2apic_enable();
> > +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> > +			 APIC_DM_FIXED | IRQ_VECTOR);
> > +
> >   	/*
> >   	 * Software BP tests.
> >   	 *
> > @@ -38,12 +46,19 @@ static void guest_code(void)
> >   		     "mov %%rax,%0;\n\t write_data:"
> >   		     : "=m" (guest_value) : : "rax");
> >   
> > -	/* Single step test, covers 2 basic instructions and 2 emulated */
> > +	/*
> > +	 * Single step test, covers 2 basic instructions and 2 emulated
> > +	 *
> > +	 * Enable interrupts during the single stepping to see that
> > +	 * pending interrupt we raised is not handled due to KVM_GUESTDBG_BLOCKIRQ
> > +	 */
> >   	asm volatile("ss_start: "
> > +		     "sti\n\t"
> >   		     "xor %%eax,%%eax\n\t"
> >   		     "cpuid\n\t"
> >   		     "movl $0x1a0,%%ecx\n\t"
> >   		     "rdmsr\n\t"
> > +		     "cli\n\t"
> >   		     : : : "eax", "ebx", "ecx", "edx");
> >   
> >   	/* DR6.BD test */
> > @@ -72,11 +87,13 @@ int main(void)
> >   	uint64_t cmd;
> >   	int i;
> >   	/* Instruction lengths starting at ss_start */
> > -	int ss_size[4] = {
> > +	int ss_size[6] = {
> > +		1,		/* sti*/
> >   		2,		/* xor */
> >   		2,		/* cpuid */
> >   		5,		/* mov */
> >   		2,		/* rdmsr */
> > +		1,		/* cli */
> >   	};
> >   
> >   	if (!kvm_check_cap(KVM_CAP_SET_GUEST_DEBUG)) {
> > @@ -154,7 +171,8 @@ int main(void)
> >   	for (i = 0; i < (sizeof(ss_size) / sizeof(ss_size[0])); i++) {
> >   		target_rip += ss_size[i];
> >   		CLEAR_DEBUG();
> > -		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
> > +		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |
> > +				KVM_GUESTDBG_BLOCKIRQ;
> >   		debug.arch.debugreg[7] = 0x00000400;
> >   		APPLY_DEBUG();
> >   		vcpu_run(vm, VCPU_ID);
> > 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-09-06 11:20   ` Paolo Bonzini
  2021-09-06 21:03     ` Maxim Levitsky
@ 2021-11-01 15:43     ` Vitaly Kuznetsov
  2021-11-01 16:19       ` Maxim Levitsky
  1 sibling, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2021-11-01 15:43 UTC (permalink / raw)
  To: Paolo Bonzini, Maxim Levitsky, kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Sean Christopherson, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/08/21 14:29, Maxim Levitsky wrote:
>> Modify debug_regs test to create a pending interrupt
>> and see that it is blocked when single stepping is done
>> with KVM_GUESTDBG_BLOCKIRQ
>> 
>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>> ---
>>   .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
>>   1 file changed, 21 insertions(+), 3 deletions(-)
>
> I haven't looked very much at this, but the test fails.
>

Same here,

the test passes on AMD but fails consistently on Intel:

# ./x86_64/debug_regs 
==== Test Assertion Failure ====
  x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
  pid=13434 tid=13434 errno=0 - Success
     1	0x00000000004027c6: main at debug_regs.c:179
     2	0x00007f65344cf554: ?? ??:0
     3	0x000000000040294a: _start at ??:?
  SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)

(I know I'm late to the party).

> Paolo
>
>> diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
>> index 6097a8283377..5f078db1bcba 100644
>> --- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
>> +++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
>> @@ -8,12 +8,15 @@
>>   #include <string.h>
>>   #include "kvm_util.h"
>>   #include "processor.h"
>> +#include "apic.h"
>>   
>>   #define VCPU_ID 0
>>   
>>   #define DR6_BD		(1 << 13)
>>   #define DR7_GD		(1 << 13)
>>   
>> +#define IRQ_VECTOR 0xAA
>> +
>>   /* For testing data access debug BP */
>>   uint32_t guest_value;
>>   
>> @@ -21,6 +24,11 @@ extern unsigned char sw_bp, hw_bp, write_data, ss_start, bd_start;
>>   
>>   static void guest_code(void)
>>   {
>> +	/* Create a pending interrupt on current vCPU */
>> +	x2apic_enable();
>> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
>> +			 APIC_DM_FIXED | IRQ_VECTOR);
>> +
>>   	/*
>>   	 * Software BP tests.
>>   	 *
>> @@ -38,12 +46,19 @@ static void guest_code(void)
>>   		     "mov %%rax,%0;\n\t write_data:"
>>   		     : "=m" (guest_value) : : "rax");
>>   
>> -	/* Single step test, covers 2 basic instructions and 2 emulated */
>> +	/*
>> +	 * Single step test, covers 2 basic instructions and 2 emulated
>> +	 *
>> +	 * Enable interrupts during the single stepping to see that
>> +	 * pending interrupt we raised is not handled due to KVM_GUESTDBG_BLOCKIRQ
>> +	 */
>>   	asm volatile("ss_start: "
>> +		     "sti\n\t"
>>   		     "xor %%eax,%%eax\n\t"
>>   		     "cpuid\n\t"
>>   		     "movl $0x1a0,%%ecx\n\t"
>>   		     "rdmsr\n\t"
>> +		     "cli\n\t"
>>   		     : : : "eax", "ebx", "ecx", "edx");
>>   
>>   	/* DR6.BD test */
>> @@ -72,11 +87,13 @@ int main(void)
>>   	uint64_t cmd;
>>   	int i;
>>   	/* Instruction lengths starting at ss_start */
>> -	int ss_size[4] = {
>> +	int ss_size[6] = {
>> +		1,		/* sti*/
>>   		2,		/* xor */
>>   		2,		/* cpuid */
>>   		5,		/* mov */
>>   		2,		/* rdmsr */
>> +		1,		/* cli */
>>   	};
>>   
>>   	if (!kvm_check_cap(KVM_CAP_SET_GUEST_DEBUG)) {
>> @@ -154,7 +171,8 @@ int main(void)
>>   	for (i = 0; i < (sizeof(ss_size) / sizeof(ss_size[0])); i++) {
>>   		target_rip += ss_size[i];
>>   		CLEAR_DEBUG();
>> -		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
>> +		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |
>> +				KVM_GUESTDBG_BLOCKIRQ;
>>   		debug.arch.debugreg[7] = 0x00000400;
>>   		APPLY_DEBUG();
>>   		vcpu_run(vm, VCPU_ID);
>> 
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-11-01 15:43     ` Vitaly Kuznetsov
@ 2021-11-01 16:19       ` Maxim Levitsky
  2021-11-01 23:21         ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2021-11-01 16:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Paolo Bonzini, kvm
  Cc: Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Sean Christopherson, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On Mon, 2021-11-01 at 16:43 +0100, Vitaly Kuznetsov wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 11/08/21 14:29, Maxim Levitsky wrote:
> > > Modify debug_regs test to create a pending interrupt
> > > and see that it is blocked when single stepping is done
> > > with KVM_GUESTDBG_BLOCKIRQ
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >   .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
> > >   1 file changed, 21 insertions(+), 3 deletions(-)
> > 
> > I haven't looked very much at this, but the test fails.
> > 
> 
> Same here,
> 
> the test passes on AMD but fails consistently on Intel:
> 
> # ./x86_64/debug_regs 
> ==== Test Assertion Failure ====
>   x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
>   pid=13434 tid=13434 errno=0 - Success
>      1	0x00000000004027c6: main at debug_regs.c:179
>      2	0x00007f65344cf554: ?? ??:0
>      3	0x000000000040294a: _start at ??:?
>   SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
> 
> (I know I'm late to the party).

Well that is strange. It passes on my intel laptop. Just tested 
(kvm/queue + qemu master, compiled today) :-(

It fails on iteration 1 (and there is iteration 0) which I think means that we
start with RIP on sti, and get #DB on start of xor instruction first (correctly), 
and then we get #DB again on start of xor instruction again?

Something very strange. My laptop has i7-7600U.

Best regards,
	Maxim Levitsky




> 
> > Paolo
> > 
> > > diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
> > > index 6097a8283377..5f078db1bcba 100644
> > > --- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
> > > +++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
> > > @@ -8,12 +8,15 @@
> > >   #include <string.h>
> > >   #include "kvm_util.h"
> > >   #include "processor.h"
> > > +#include "apic.h"
> > >   
> > >   #define VCPU_ID 0
> > >   
> > >   #define DR6_BD		(1 << 13)
> > >   #define DR7_GD		(1 << 13)
> > >   
> > > +#define IRQ_VECTOR 0xAA
> > > +
> > >   /* For testing data access debug BP */
> > >   uint32_t guest_value;
> > >   
> > > @@ -21,6 +24,11 @@ extern unsigned char sw_bp, hw_bp, write_data, ss_start, bd_start;
> > >   
> > >   static void guest_code(void)
> > >   {
> > > +	/* Create a pending interrupt on current vCPU */
> > > +	x2apic_enable();
> > > +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> > > +			 APIC_DM_FIXED | IRQ_VECTOR);
> > > +
> > >   	/*
> > >   	 * Software BP tests.
> > >   	 *
> > > @@ -38,12 +46,19 @@ static void guest_code(void)
> > >   		     "mov %%rax,%0;\n\t write_data:"
> > >   		     : "=m" (guest_value) : : "rax");
> > >   
> > > -	/* Single step test, covers 2 basic instructions and 2 emulated */
> > > +	/*
> > > +	 * Single step test, covers 2 basic instructions and 2 emulated
> > > +	 *
> > > +	 * Enable interrupts during the single stepping to see that
> > > +	 * pending interrupt we raised is not handled due to KVM_GUESTDBG_BLOCKIRQ
> > > +	 */
> > >   	asm volatile("ss_start: "
> > > +		     "sti\n\t"
> > >   		     "xor %%eax,%%eax\n\t"
> > >   		     "cpuid\n\t"
> > >   		     "movl $0x1a0,%%ecx\n\t"
> > >   		     "rdmsr\n\t"
> > > +		     "cli\n\t"
> > >   		     : : : "eax", "ebx", "ecx", "edx");
> > >   
> > >   	/* DR6.BD test */
> > > @@ -72,11 +87,13 @@ int main(void)
> > >   	uint64_t cmd;
> > >   	int i;
> > >   	/* Instruction lengths starting at ss_start */
> > > -	int ss_size[4] = {
> > > +	int ss_size[6] = {
> > > +		1,		/* sti*/
> > >   		2,		/* xor */
> > >   		2,		/* cpuid */
> > >   		5,		/* mov */
> > >   		2,		/* rdmsr */
> > > +		1,		/* cli */
> > >   	};
> > >   
> > >   	if (!kvm_check_cap(KVM_CAP_SET_GUEST_DEBUG)) {
> > > @@ -154,7 +171,8 @@ int main(void)
> > >   	for (i = 0; i < (sizeof(ss_size) / sizeof(ss_size[0])); i++) {
> > >   		target_rip += ss_size[i];
> > >   		CLEAR_DEBUG();
> > > -		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
> > > +		debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |
> > > +				KVM_GUESTDBG_BLOCKIRQ;
> > >   		debug.arch.debugreg[7] = 0x00000400;
> > >   		APPLY_DEBUG();
> > >   		vcpu_run(vm, VCPU_ID);
> > > 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-11-01 16:19       ` Maxim Levitsky
@ 2021-11-01 23:21         ` Sean Christopherson
  2021-11-02 10:46           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-11-01 23:21 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Vitaly Kuznetsov, Paolo Bonzini, kvm, Kieran Bingham, Jan Kiszka,
	Andrew Jones, Jonathan Corbet, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On Mon, Nov 01, 2021, Maxim Levitsky wrote:
> On Mon, 2021-11-01 at 16:43 +0100, Vitaly Kuznetsov wrote:
> > Paolo Bonzini <pbonzini@redhat.com> writes:
> > 
> > > On 11/08/21 14:29, Maxim Levitsky wrote:
> > > > Modify debug_regs test to create a pending interrupt
> > > > and see that it is blocked when single stepping is done
> > > > with KVM_GUESTDBG_BLOCKIRQ
> > > > 
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > ---
> > > >   .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
> > > >   1 file changed, 21 insertions(+), 3 deletions(-)
> > > 
> > > I haven't looked very much at this, but the test fails.
> > > 
> > 
> > Same here,
> > 
> > the test passes on AMD but fails consistently on Intel:
> > 
> > # ./x86_64/debug_regs 
> > ==== Test Assertion Failure ====
> >   x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
> >   pid=13434 tid=13434 errno=0 - Success
> >      1	0x00000000004027c6: main at debug_regs.c:179
> >      2	0x00007f65344cf554: ?? ??:0
> >      3	0x000000000040294a: _start at ??:?
> >   SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
> > 
> > (I know I'm late to the party).
> 
> Well that is strange. It passes on my intel laptop. Just tested 
> (kvm/queue + qemu master, compiled today) :-(
> 
> It fails on iteration 1 (and there is iteration 0) which I think means that we
> start with RIP on sti, and get #DB on start of xor instruction first (correctly), 
> and then we get #DB again on start of xor instruction again?
> 
> Something very strange. My laptop has i7-7600U.

I haven't verified on hardware, but my guess is that this code in vmx_vcpu_run()

	/* When single-stepping over STI and MOV SS, we must clear the
	 * corresponding interruptibility bits in the guest state. Otherwise
	 * vmentry fails as it then expects bit 14 (BS) in pending debug
	 * exceptions being set, but that's not correct for the guest debugging
	 * case. */
	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
		vmx_set_interrupt_shadow(vcpu, 0);

interacts badly with APICv=1.  It will kill the STI shadow and cause the IRQ in
vmcs.GUEST_RVI to be recognized when it (micro-)architecturally should not.  My
head is going in circles trying to sort out what would actually happen.  Maybe
comment out that and/or disable APICv to see if either one makes the test pass?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-11-01 23:21         ` Sean Christopherson
@ 2021-11-02 10:46           ` Vitaly Kuznetsov
  2021-11-02 15:53             ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2021-11-02 10:46 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: Paolo Bonzini, kvm, Kieran Bingham, Jan Kiszka, Andrew Jones,
	Jonathan Corbet, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Nov 01, 2021, Maxim Levitsky wrote:
>> On Mon, 2021-11-01 at 16:43 +0100, Vitaly Kuznetsov wrote:
>> > Paolo Bonzini <pbonzini@redhat.com> writes:
>> > 
>> > > On 11/08/21 14:29, Maxim Levitsky wrote:
>> > > > Modify debug_regs test to create a pending interrupt
>> > > > and see that it is blocked when single stepping is done
>> > > > with KVM_GUESTDBG_BLOCKIRQ
>> > > > 
>> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>> > > > ---
>> > > >   .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
>> > > >   1 file changed, 21 insertions(+), 3 deletions(-)
>> > > 
>> > > I haven't looked very much at this, but the test fails.
>> > > 
>> > 
>> > Same here,
>> > 
>> > the test passes on AMD but fails consistently on Intel:
>> > 
>> > # ./x86_64/debug_regs 
>> > ==== Test Assertion Failure ====
>> >   x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
>> >   pid=13434 tid=13434 errno=0 - Success
>> >      1	0x00000000004027c6: main at debug_regs.c:179
>> >      2	0x00007f65344cf554: ?? ??:0
>> >      3	0x000000000040294a: _start at ??:?
>> >   SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
>> > 
>> > (I know I'm late to the party).
>> 
>> Well that is strange. It passes on my intel laptop. Just tested 
>> (kvm/queue + qemu master, compiled today) :-(
>> 
>> It fails on iteration 1 (and there is iteration 0) which I think means that we
>> start with RIP on sti, and get #DB on start of xor instruction first (correctly), 
>> and then we get #DB again on start of xor instruction again?
>> 
>> Something very strange. My laptop has i7-7600U.
>
> I haven't verified on hardware, but my guess is that this code in vmx_vcpu_run()
>
> 	/* When single-stepping over STI and MOV SS, we must clear the
> 	 * corresponding interruptibility bits in the guest state. Otherwise
> 	 * vmentry fails as it then expects bit 14 (BS) in pending debug
> 	 * exceptions being set, but that's not correct for the guest debugging
> 	 * case. */
> 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> 		vmx_set_interrupt_shadow(vcpu, 0);
>
> interacts badly with APICv=1.  It will kill the STI shadow and cause the IRQ in
> vmcs.GUEST_RVI to be recognized when it (micro-)architecturally should not.  My
> head is going in circles trying to sort out what would actually happen.  Maybe
> comment out that and/or disable APICv to see if either one makes the test pass?
>

Interestingly,

loading 'kvm-intel' with 'enable_apicv=0' makes the test pass, however,
commenting out "vmx_set_interrupt_shadow()" as suggested gives a
different result (with enable_apicv=1):

# ./x86_64/debug_regs 
==== Test Assertion Failure ====
  x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
  pid=16352 tid=16352 errno=0 - Success
     1	0x0000000000402b33: main at debug_regs.c:179 (discriminator 10)
     2	0x00007f36401bd554: ?? ??:0
     3	0x00000000004023a9: _start at ??:?
  SINGLE_STEP[1]: exit 9 exception -2147483615 rip 0x1 (should be 0x4024d9) dr6 0xffff4ff0 (should be 0xffff4ff0)

this is a fairly old "Intel(R) Xeon(R) CPU E5-2603 v3".

-- 
Vitaly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-11-02 10:46           ` Vitaly Kuznetsov
@ 2021-11-02 15:53             ` Sean Christopherson
  2021-11-02 16:18               ` Vitaly Kuznetsov
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-11-02 15:53 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Maxim Levitsky, Paolo Bonzini, kvm, Kieran Bingham, Jan Kiszka,
	Andrew Jones, Jonathan Corbet, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On Tue, Nov 02, 2021, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> > I haven't verified on hardware, but my guess is that this code in vmx_vcpu_run()
> >
> > 	/* When single-stepping over STI and MOV SS, we must clear the
> > 	 * corresponding interruptibility bits in the guest state. Otherwise
> > 	 * vmentry fails as it then expects bit 14 (BS) in pending debug
> > 	 * exceptions being set, but that's not correct for the guest debugging
> > 	 * case. */
> > 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > 		vmx_set_interrupt_shadow(vcpu, 0);
> >
> > interacts badly with APICv=1.  It will kill the STI shadow and cause the IRQ in
> > vmcs.GUEST_RVI to be recognized when it (micro-)architecturally should not.  My
> > head is going in circles trying to sort out what would actually happen.  Maybe
> > comment out that and/or disable APICv to see if either one makes the test pass?
> >
> 
> Interestingly,
> 
> loading 'kvm-intel' with 'enable_apicv=0' makes the test pass, however,
> commenting out "vmx_set_interrupt_shadow()" as suggested gives a
> different result (with enable_apicv=1):
> 
> # ./x86_64/debug_regs 
> ==== Test Assertion Failure ====
>   x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
>   pid=16352 tid=16352 errno=0 - Success
>      1	0x0000000000402b33: main at debug_regs.c:179 (discriminator 10)
>      2	0x00007f36401bd554: ?? ??:0
>      3	0x00000000004023a9: _start at ??:?
>   SINGLE_STEP[1]: exit 9 exception -2147483615 rip 0x1 (should be 0x4024d9) dr6 0xffff4ff0 (should be 0xffff4ff0)

Exit 9 is KVM_EXIT_FAIL_ENTRY, which in this case VM-Entry likely failed due to
invalid guest state because there was STI blocking with single-step enabled but
no pending BS #DB:

  Bit 14 (BS) must be 1 if the TF flag (bit 8) in the RFLAGS field is 1 and the
  BTF flag (bit 1) in the IA32_DEBUGCTL field is 0.

Which is precisely what that hack-a-fix avoids.  There isn't really a clean
solution for legacy single-step, AFAIK the only way to avoid this would be to
switch KVM_GUESTDBG_SINGLESTEP to use MTF.

But that mess is a red herring, the test fails with the same signature with APICv=1
if the STI is replaced by PUSHF+BTS+POPFD (to avoid the STI shadow).  We all missed
this key detail from Vitaly's report:

SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
                ^^^^^^

Exit '8' is KVM_EXIT_SHUTDOWN, i.e. the arrival of the IRQ hosed the guest because
the test doesn't invoke vm_init_descriptor_tables() to install event handlers.
The "exception 1" shows up because the run page isn't sanitized by the test, i.e.
it's stale data that happens to match.

So I would fully expect this test to fail with AVIC=1.  The problem is that
KVM_GUESTDBG_BLOCKIRQ does absolutely nothing to handle APICv interrupts.  And
even if KVM does something to fudge that behavior in the emulated local APIC, the
test will then fail miserably virtual IPIs (currently AVIC only).

I stand by my original comment that "Deviating this far from architectural behavior
will end in tears at some point."  Rather than try to "fix" APICv, I vote to instead
either reject KVM_GUESTDBG_BLOCKIRQ if APICv=1, or log a debug message saying that
KVM_GUESTDBG_BLOCKIRQ is ineffective with APICv=1.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-11-02 15:53             ` Sean Christopherson
@ 2021-11-02 16:18               ` Vitaly Kuznetsov
  2021-11-02 18:45                 ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2021-11-02 16:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, Paolo Bonzini, kvm, Kieran Bingham, Jan Kiszka,
	Andrew Jones, Jonathan Corbet, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Nov 02, 2021, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> > I haven't verified on hardware, but my guess is that this code in vmx_vcpu_run()
>> >
>> > 	/* When single-stepping over STI and MOV SS, we must clear the
>> > 	 * corresponding interruptibility bits in the guest state. Otherwise
>> > 	 * vmentry fails as it then expects bit 14 (BS) in pending debug
>> > 	 * exceptions being set, but that's not correct for the guest debugging
>> > 	 * case. */
>> > 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>> > 		vmx_set_interrupt_shadow(vcpu, 0);
>> >
>> > interacts badly with APICv=1.  It will kill the STI shadow and cause the IRQ in
>> > vmcs.GUEST_RVI to be recognized when it (micro-)architecturally should not.  My
>> > head is going in circles trying to sort out what would actually happen.  Maybe
>> > comment out that and/or disable APICv to see if either one makes the test pass?
>> >
>> 
>> Interestingly,
>> 
>> loading 'kvm-intel' with 'enable_apicv=0' makes the test pass, however,
>> commenting out "vmx_set_interrupt_shadow()" as suggested gives a
>> different result (with enable_apicv=1):
>> 
>> # ./x86_64/debug_regs 
>> ==== Test Assertion Failure ====
>>   x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
>>   pid=16352 tid=16352 errno=0 - Success
>>      1	0x0000000000402b33: main at debug_regs.c:179 (discriminator 10)
>>      2	0x00007f36401bd554: ?? ??:0
>>      3	0x00000000004023a9: _start at ??:?
>>   SINGLE_STEP[1]: exit 9 exception -2147483615 rip 0x1 (should be 0x4024d9) dr6 0xffff4ff0 (should be 0xffff4ff0)
>
> Exit 9 is KVM_EXIT_FAIL_ENTRY, which in this case VM-Entry likely failed due to
> invalid guest state because there was STI blocking with single-step enabled but
> no pending BS #DB:
>
>   Bit 14 (BS) must be 1 if the TF flag (bit 8) in the RFLAGS field is 1 and the
>   BTF flag (bit 1) in the IA32_DEBUGCTL field is 0.
>
> Which is precisely what that hack-a-fix avoids.  There isn't really a clean
> solution for legacy single-step, AFAIK the only way to avoid this would be to
> switch KVM_GUESTDBG_SINGLESTEP to use MTF.
>
> But that mess is a red herring, the test fails with the same signature with APICv=1
> if the STI is replaced by PUSHF+BTS+POPFD (to avoid the STI shadow).  We all missed
> this key detail from Vitaly's report:
>
> SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
>                 ^^^^^^
>
> Exit '8' is KVM_EXIT_SHUTDOWN, i.e. the arrival of the IRQ hosed the guest because
> the test doesn't invoke vm_init_descriptor_tables() to install event handlers.
> The "exception 1" shows up because the run page isn't sanitized by the test, i.e.
> it's stale data that happens to match.
>
> So I would fully expect this test to fail with AVIC=1.  The problem is that
> KVM_GUESTDBG_BLOCKIRQ does absolutely nothing to handle APICv interrupts.  And
> even if KVM does something to fudge that behavior in the emulated local APIC, the
> test will then fail miserably virtual IPIs (currently AVIC only).

FWIW, the test doesn't seem to fail on my AMD EPYC system even with "avic=1" ...

>
> I stand by my original comment that "Deviating this far from architectural behavior
> will end in tears at some point."  Rather than try to "fix" APICv, I vote to instead
> either reject KVM_GUESTDBG_BLOCKIRQ if APICv=1, or log a debug message saying that
> KVM_GUESTDBG_BLOCKIRQ is ineffective with APICv=1.
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-11-02 16:18               ` Vitaly Kuznetsov
@ 2021-11-02 18:45                 ` Sean Christopherson
  2021-11-03  9:04                   ` Maxim Levitsky
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-11-02 18:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Maxim Levitsky, Paolo Bonzini, kvm, Kieran Bingham, Jan Kiszka,
	Andrew Jones, Jonathan Corbet, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On Tue, Nov 02, 2021, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> > But that mess is a red herring, the test fails with the same signature with APICv=1
> > if the STI is replaced by PUSHF+BTS+POPFD (to avoid the STI shadow).  We all missed
> > this key detail from Vitaly's report:
> >
> > SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
> >                 ^^^^^^
> >
> > Exit '8' is KVM_EXIT_SHUTDOWN, i.e. the arrival of the IRQ hosed the guest because
> > the test doesn't invoke vm_init_descriptor_tables() to install event handlers.
> > The "exception 1" shows up because the run page isn't sanitized by the test, i.e.
> > it's stale data that happens to match.
> >
> > So I would fully expect this test to fail with AVIC=1.  The problem is that
> > KVM_GUESTDBG_BLOCKIRQ does absolutely nothing to handle APICv interrupts.  And
> > even if KVM does something to fudge that behavior in the emulated local APIC, the
> > test will then fail miserably virtual IPIs (currently AVIC only).
> 
> FWIW, the test doesn't seem to fail on my AMD EPYC system even with "avic=1" ...

Huh.  Assuming the IRQ is pending in the vIRR and KVM didn't screw up elsewhere,
that seems like a CPU AVIC bug.  #DBs have priority over IRQs, but single-step
#DBs are trap-like and KVM (hopefully) isn't injecting a #DB, so a pending IRQ
should be taken on the current instruction in the guest when executing VMRUN with
guest.EFLAGS.IF=1,TF=1 since there will be a one-instruction delay before the
single-step #DB kicks in.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
  2021-11-02 18:45                 ` Sean Christopherson
@ 2021-11-03  9:04                   ` Maxim Levitsky
  2021-11-03  9:29                     ` [PATCH] KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active Maxim Levitsky
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2021-11-03  9:04 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, Kieran Bingham, Jan Kiszka, Andrew Jones,
	Jonathan Corbet, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton

On Tue, 2021-11-02 at 18:45 +0000, Sean Christopherson wrote:
> On Tue, Nov 02, 2021, Vitaly Kuznetsov wrote:
> > Sean Christopherson <seanjc@google.com> writes:
> > > But that mess is a red herring, the test fails with the same signature with APICv=1
> > > if the STI is replaced by PUSHF+BTS+POPFD (to avoid the STI shadow).  We all missed
> > > this key detail from Vitaly's report:
> > > 
> > > SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
> > >                 ^^^^^^
> > > 
> > > Exit '8' is KVM_EXIT_SHUTDOWN, i.e. the arrival of the IRQ hosed the guest because
> > > the test doesn't invoke vm_init_descriptor_tables() to install event handlers.
> > > The "exception 1" shows up because the run page isn't sanitized by the test, i.e.
> > > it's stale data that happens to match.
> > > 
> > > So I would fully expect this test to fail with AVIC=1.  The problem is that
> > > KVM_GUESTDBG_BLOCKIRQ does absolutely nothing to handle APICv interrupts.  And
> > > even if KVM does something to fudge that behavior in the emulated local APIC, the
> > > test will then fail miserably virtual IPIs (currently AVIC only).
> > 
> > FWIW, the test doesn't seem to fail on my AMD EPYC system even with "avic=1" ...
Its because AVIC is inhibited for many reasons. In this test x2apic is used,
and having x2apic in CPUID inhibits AVIC.

> 
> Huh.  Assuming the IRQ is pending in the vIRR and KVM didn't screw up elsewhere,
> that seems like a CPU AVIC bug.  #DBs have priority over IRQs, but single-step
> #DBs are trap-like and KVM (hopefully) isn't injecting a #DB, so a pending IRQ
> should be taken on the current instruction in the guest when executing VMRUN with
> guest.EFLAGS.IF=1,TF=1 since there will be a one-instruction delay before the
> single-step #DB kicks in.
> 
We could inhibit AVIC/APICv when KVM_GUESTDBG_BLOCKIRQ is in use, I'll send patch for
this soon.

Thanks a lot for finding out what is going on!

Best regards,	Maxim Levitsky


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
  2021-11-03  9:04                   ` Maxim Levitsky
@ 2021-11-03  9:29                     ` Maxim Levitsky
  2021-11-03  9:31                       ` Maxim Levitsky
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2021-11-03  9:29 UTC (permalink / raw)
  To: kvm
  Cc: Thomas Gleixner, linux-kernel, Paolo Bonzini,
	Sean Christopherson, Joerg Roedel, Vitaly Kuznetsov,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Ingo Molnar, Jim Mattson, Wanpeng Li,
	Maxim Levitsky

KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
standard kvm's inject_pending_event, and not via APICv/AVIC.

Since this is a debug feature, just inhibit it while it
is in use.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm/avic.c         | 3 ++-
 arch/x86/kvm/vmx/vmx.c          | 3 ++-
 arch/x86/kvm/x86.c              | 3 +++
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88fce6ab4bbd7..8f6e15b95a4d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1034,6 +1034,7 @@ struct kvm_x86_msr_filter {
 #define APICV_INHIBIT_REASON_IRQWIN     3
 #define APICV_INHIBIT_REASON_PIT_REINJ  4
 #define APICV_INHIBIT_REASON_X2APIC	5
+#define APICV_INHIBIT_REASON_BLOCKIRQ	6
 
 struct kvm_arch {
 	unsigned long n_used_mmu_pages;
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8052d92069e01..affc0ea98d302 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -904,7 +904,8 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
 			  BIT(APICV_INHIBIT_REASON_NESTED) |
 			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
 			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
-			  BIT(APICV_INHIBIT_REASON_X2APIC);
+			  BIT(APICV_INHIBIT_REASON_X2APIC) |
+			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ);
 
 	return supported & BIT(bit);
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 71f54d85f104c..e4fc9ff7cd944 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7565,7 +7565,8 @@ static void hardware_unsetup(void)
 static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 {
 	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
-			  BIT(APICV_INHIBIT_REASON_HYPERV);
+			  BIT(APICV_INHIBIT_REASON_HYPERV) |
+			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ);
 
 	return supported & BIT(bit);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac83d873d65b0..eb3b8d2375713 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10747,6 +10747,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
 		vcpu->arch.singlestep_rip = kvm_get_linear_rip(vcpu);
 
+	kvm_request_apicv_update(vcpu->kvm,
+			!(vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ),
+			APICV_INHIBIT_REASON_BLOCKIRQ);
 	/*
 	 * Trigger an rflags update that will inject or remove the trace
 	 * flags.
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
  2021-11-03  9:29                     ` [PATCH] KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active Maxim Levitsky
@ 2021-11-03  9:31                       ` Maxim Levitsky
  0 siblings, 0 replies; 30+ messages in thread
From: Maxim Levitsky @ 2021-11-03  9:31 UTC (permalink / raw)
  To: kvm
  Cc: Thomas Gleixner, linux-kernel, Paolo Bonzini,
	Sean Christopherson, Joerg Roedel, Vitaly Kuznetsov,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Ingo Molnar, Jim Mattson, Wanpeng Li

On Wed, 2021-11-03 at 11:29 +0200, Maxim Levitsky wrote:
> KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
> standard kvm's inject_pending_event, and not via APICv/AVIC.
> 
> Since this is a debug feature, just inhibit it while it
> is in use.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/svm/avic.c         | 3 ++-
>  arch/x86/kvm/vmx/vmx.c          | 3 ++-
>  arch/x86/kvm/x86.c              | 3 +++
>  4 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 88fce6ab4bbd7..8f6e15b95a4d8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1034,6 +1034,7 @@ struct kvm_x86_msr_filter {
>  #define APICV_INHIBIT_REASON_IRQWIN     3
>  #define APICV_INHIBIT_REASON_PIT_REINJ  4
>  #define APICV_INHIBIT_REASON_X2APIC	5
> +#define APICV_INHIBIT_REASON_BLOCKIRQ	6
>  
>  struct kvm_arch {
>  	unsigned long n_used_mmu_pages;
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 8052d92069e01..affc0ea98d302 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -904,7 +904,8 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
>  			  BIT(APICV_INHIBIT_REASON_NESTED) |
>  			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
>  			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
> -			  BIT(APICV_INHIBIT_REASON_X2APIC);
> +			  BIT(APICV_INHIBIT_REASON_X2APIC) |
> +			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ);
>  
>  	return supported & BIT(bit);
>  }
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 71f54d85f104c..e4fc9ff7cd944 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7565,7 +7565,8 @@ static void hardware_unsetup(void)
>  static bool vmx_check_apicv_inhibit_reasons(ulong bit)
>  {
>  	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
> -			  BIT(APICV_INHIBIT_REASON_HYPERV);
> +			  BIT(APICV_INHIBIT_REASON_HYPERV) |
> +			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ);
>  
>  	return supported & BIT(bit);
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ac83d873d65b0..eb3b8d2375713 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10747,6 +10747,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>  	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>  		vcpu->arch.singlestep_rip = kvm_get_linear_rip(vcpu);
>  
> +	kvm_request_apicv_update(vcpu->kvm,
> +			!(vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ),
> +			APICV_INHIBIT_REASON_BLOCKIRQ);
I need more coffee, bad indention here :)

Could you test if this patch works for you?

I managed to reproduce this issue on AVIC, by patching out the x2apic inhibit,
and then this patch fixed the issue for me.


Best regards,
	Maxim Levitsky

>  	/*
>  	 * Trigger an rflags update that will inject or remove the trace
>  	 * flags.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask
  2021-09-02 16:56   ` Sean Christopherson
@ 2022-02-08 14:34     ` Maxim Levitsky
  2022-03-08 23:37       ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2022-02-08 14:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton, Borislav Petkov

On Thu, 2021-09-02 at 16:56 +0000, Sean Christopherson wrote:
> Assuming this hasn't been abandoned...
> 
> On Wed, Aug 11, 2021, Maxim Levitsky wrote:
> > This parameter will be used by VMX and SVM code to force
> > interception of a set of exceptions, given by a bitmask
> > for guest debug and/or kvm debug.
> > 
> > This is based on an idea first shown here:
> > https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/
> > 
> > CC: Borislav Petkov <bp@suse.de>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/x86.c | 3 +++
> >  arch/x86/kvm/x86.h | 2 ++
> >  2 files changed, 5 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index fdc0c18339fb..092e2fad3c0d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -184,6 +184,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
> >  int __read_mostly pi_inject_timer = -1;
> >  module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
> >  
> > +uint force_intercept_exceptions_mask;
> > +module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);
> 
> Use octal permissions.  This also can't be a simple writable param, at least not
> without a well-documented disclaimer, as there's no guarantee a vCPU will update
> its exception bitmap in a timely fashion.  An alternative to a module param would
> be to extend/add a per-VM ioctl(), e.g. maybe KVM_SET_GUEST_DEBUG?  The downside
> of an ioctl() is that it would require userspace enabling :-/
> 

All other module params in this file use macros for permissions, that is why
I used them too.

I'll add a comment with a disclaimer here - this is only for debug.
I strongly don't want to have this as ioctl as that will indeed need qemu patches,
not to mention things like unit tests and which don't even always use qemu.

Or I can make this parameter read-only. I don't mind reloading kvm module when
I change this parameter.


Best regards,
	Maxim Levitsky

PS: Forgot to send this mail, it was in my drafts folder.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask
  2021-09-02 17:34     ` Sean Christopherson
@ 2022-02-08 14:35       ` Maxim Levitsky
  0 siblings, 0 replies; 30+ messages in thread
From: Maxim Levitsky @ 2022-02-08 14:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton

On Thu, 2021-09-02 at 17:34 +0000, Sean Christopherson wrote:
> On Wed, Aug 11, 2021, Maxim Levitsky wrote:
> > On Wed, 2021-08-11 at 15:29 +0300, Maxim Levitsky wrote:
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index e45259177009..19f54b07161a 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -233,6 +233,8 @@ static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
> > >  #define MSRS_RANGE_SIZE 2048
> > >  #define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
> > >  
> > > +static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code);
> > > +
> > >  u32 svm_msrpm_offset(u32 msr)
> > >  {
> > >  	u32 offset;
> > > @@ -1153,6 +1155,22 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
> > >  	}
> > >  }
> > >  
> > > +static void svm_init_force_exceptions_intercepts(struct vcpu_svm *svm)
> > > +{
> > > +	int exc;
> > > +
> > > +	svm->force_intercept_exceptions_mask = force_intercept_exceptions_mask;
> 
> Ah, the param is being snapshotted on vCPU creation, hence the writable module
> param.  That works, though it'd be better to snapshot it on a per-VM basic, not
> per-vCPU, and do so in common x86 code so that the param doesn't need to be
> exported.

I have nothing against that.

> 
> > > +	for (exc = 0 ; exc < 32 ; exc++) {
> 
> for_each_set_bit()
I used a helper function instead, IMHO a bit cleaner.

> 
> > > +		if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
> > > +			continue;
> > > +
> > > +		/* Those are defined to have undefined behavior in the SVM spec */
> > > +		if (exc != 2 && exc != 9)
> 
> Maybe add a pr_warn_once() to let the user know they done messed up?
> 
> And given that there are already intercepts with undefined behavior, it's probably
> best to disallow intercepting anything we aren't 100% postive will be handled
> correctly, e.g. intercepting vector 31 is nonsensical at this time.

Or I think I'll just drop this check altogether - this is a debug feature anyway.

> 
> > > +			continue;
> > > +		set_exception_intercept(svm, exc);
> 
> ...
> 
> > > +static int gen_exc_interception(struct kvm_vcpu *vcpu)
> > > +{
> > > +	/*
> > > +	 * Generic exception intercept handler which forwards a guest exception
> > > +	 * as-is to the guest.
> > > +	 * For exceptions that don't have a special intercept handler.
> > > +	 *
> > > +	 * Used only for 'force_intercept_exceptions_mask' KVM debug feature.
> > > +	 */
> > > +	struct vcpu_svm *svm = to_svm(vcpu);
> > > +	int exc = svm->vmcb->control.exit_code - SVM_EXIT_EXCP_BASE;
> > > +
> > > +	/* SVM doesn't provide us with an error code for the #DF */
> > > +	u32 err_code = exc == DF_VECTOR ? 0 : svm->vmcb->control.exit_info_1;
> 
> Might be better to handle this in the x86_exception_has_error_code() path to
> avoid confusing readers with respect to exceptions that don't have an error code,
> e.g.
> 
> 	else if (x86_exception_has_error_code(exc)) {
> 		/* SVM doesn't provide the error code on #DF :-( */
> 		if (exc == DF_VECTOR)
> 			kvm_queue_exception_e(vcpu, exc, 0);
> 		else
> 			kvm_queue_exception_e(vcpu, exc, svm->vmcb->control.exit_info_1);
> 	} else {
> 		...
> 	}
> 
> Alternatively, can we zero svm->vmcb->control.exit_info_1 on #DF to make it more
> obvious that SVM leaves stale data in exit_info_1 (assuming that's true)?  E.g.
> 
> 	...
> 
> 	if (exc == TS_VECTOR) {
> 		...
> 	} else if (x86_exception_has_error_code(exc)) {
> 		/* SVM doesn't provide the error code on #DF :-( */
> 		if (exc == DF_VECTOR)
> 			svm->vmcb->control.exit_info_1 = 0;
> 
> 		kvm_queue_exception_e(vcpu, exc, svm->vmcb->control.exit_info_1);
> 	} else {
> 		...
> 	}

Makes sense.

> 
> 		
> > > +
> > > +	if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
> 
> BIT(exc)
I added a helper function in common x86 code for this.

> 
> > > +		return svm_handle_invalid_exit(vcpu, svm->vmcb->control.exit_code);
> > > +
> > > +	if (exc == TS_VECTOR) {
> > > +		/*
> > > +		 * SVM doesn't provide us with an error code to be able to
> > > +		 * re-inject the #TS exception, so just disable its
> > > +		 * intercept, and let the guest re-execute the instruction.
> > > +		 */
> > > +		vmcb_clr_intercept(&svm->vmcb01.ptr->control,
> > > +				   INTERCEPT_EXCEPTION_OFFSET + TS_VECTOR);
> 
> Maybe just disallow intercepting #TS altogether?  Or does this fall into your
> Win98 use case? :-)

Win98 does indeed generate few #TS exceptions but so far I haven't noticed
any issues related to task switches. Anyway I would like to intercept
as much as possible since this is a debug feature. A single interception
is still better that nothing.


> 
> > > +		recalc_intercepts(svm);
> > > +	} else if (x86_exception_has_error_code(exc))
> > > +		kvm_queue_exception_e(vcpu, exc, err_code);
> > > +	else
> > > +		kvm_queue_exception(vcpu, exc);
> > > +	return 1;
> > > +}
> > > +
> > >  static bool is_erratum_383(void)
> > >  {
> > >  	int err, i;
> > > @@ -3065,6 +3131,10 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> > >  	[SVM_EXIT_WRITE_DR5]			= dr_interception,
> > >  	[SVM_EXIT_WRITE_DR6]			= dr_interception,
> > >  	[SVM_EXIT_WRITE_DR7]			= dr_interception,
> > > +
> > > +	[SVM_EXIT_EXCP_BASE ...
> > > +	SVM_EXIT_EXCP_BASE + 31]		= gen_exc_interception,
> 
> This generates a Sparse warning due to the duplicate initializer.  IMO that's a
> very good warning as I have zero idea how the compiler actually handles this
> particular scenario, e.g. do later entries take priority, is it technically
> "undefined" behavior, etc...
> 
> arch/x86/kvm/svm/svm.c:3065:10: warning: Initializer entry defined twice
> arch/x86/kvm/svm/svm.c:3067:29:   also defined here
> 
> I don't have a clever solution though :-('

Good catch. I thought that this would make sense but standards never make sense.
I'll do this manually.

> 
> > > +
> > >  	[SVM_EXIT_EXCP_BASE + DB_VECTOR]	= db_interception,
> > >  	[SVM_EXIT_EXCP_BASE + BP_VECTOR]	= bp_interception,
> > >  	[SVM_EXIT_EXCP_BASE + UD_VECTOR]	= ud_interception,
> > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > index 524d943f3efc..187ada7c5b03 100644
> > > --- a/arch/x86/kvm/svm/svm.h
> > > +++ b/arch/x86/kvm/svm/svm.h
> > > @@ -196,6 +196,7 @@ struct vcpu_svm {
> > >  	bool ghcb_sa_free;
> > >  
> > >  	bool guest_state_loaded;
> > > +	u32 force_intercept_exceptions_mask;
> > >  };
> > >  
> > >  struct svm_cpu_data {
> > > @@ -351,8 +352,11 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
> > >  	struct vmcb *vmcb = svm->vmcb01.ptr;
> > >  
> > >  	WARN_ON_ONCE(bit >= 32);
> > > -	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> > >  
> > > +	if ((1 << bit) & svm->force_intercept_exceptions_mask)
> 
> BIT(bit)
Fixed with helper function as well.

> 
> > > +		return;
> > > +
> > > +	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> > >  	recalc_intercepts(svm);
> > >  }


Thanks for the review!
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask
  2022-02-08 14:34     ` Maxim Levitsky
@ 2022-03-08 23:37       ` Sean Christopherson
  2022-03-09 12:31         ` Maxim Levitsky
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2022-03-08 23:37 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton, Borislav Petkov

On Tue, Feb 08, 2022, Maxim Levitsky wrote:
> On Thu, 2021-09-02 at 16:56 +0000, Sean Christopherson wrote:
> > Assuming this hasn't been abandoned...
> > 
> > On Wed, Aug 11, 2021, Maxim Levitsky wrote:
> > > This parameter will be used by VMX and SVM code to force
> > > interception of a set of exceptions, given by a bitmask
> > > for guest debug and/or kvm debug.
> > > 
> > > This is based on an idea first shown here:
> > > https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/
> > > 
> > > CC: Borislav Petkov <bp@suse.de>
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  arch/x86/kvm/x86.c | 3 +++
> > >  arch/x86/kvm/x86.h | 2 ++
> > >  2 files changed, 5 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index fdc0c18339fb..092e2fad3c0d 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -184,6 +184,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
> > >  int __read_mostly pi_inject_timer = -1;
> > >  module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
> > >  
> > > +uint force_intercept_exceptions_mask;
> > > +module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);
> > 
> > Use octal permissions.  This also can't be a simple writable param, at least not
> > without a well-documented disclaimer, as there's no guarantee a vCPU will update
> > its exception bitmap in a timely fashion.  An alternative to a module param would
> > be to extend/add a per-VM ioctl(), e.g. maybe KVM_SET_GUEST_DEBUG?  The downside
> > of an ioctl() is that it would require userspace enabling :-/
> > 
> 
> All other module params in this file use macros for permissions, that is why
> I used them too.
> 
> I'll add a comment with a disclaimer here - this is only for debug.
> I strongly don't want to have this as ioctl as that will indeed need qemu patches,
> not to mention things like unit tests and which don't even always use qemu.
> 
> Or I can make this parameter read-only. I don't mind reloading kvm module when
> I change this parameter.

Oh!  We can force an update, a la nx_huge_pages, where the setter loops through
all VMs and does a kvm_make_all_cpus_request() to instruct vCPUs to update their
bitmaps.  Requires a new request, but that doesn't seem like a huge deal, and it
might help pave the way for adding more debug hooks for developers.

The param should also be "unsafe".

E.g. something like

static const struct kernel_param_ops force_ex_intercepts_ops = {
	.set = set_force_exception_intercepts,
	.get = get_force_exception_intercepts,
};
module_param_cb_unsafe(force_exception_intercepts, &force_ex_intercepts_ops,
		       &force_exception_intercepts, 0644);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask
  2022-03-08 23:37       ` Sean Christopherson
@ 2022-03-09 12:31         ` Maxim Levitsky
  2022-03-09 14:03           ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Levitsky @ 2022-03-09 12:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Paolo Bonzini, Joerg Roedel, Yang Weijiang,
	linux-kernel, Borislav Petkov,
	open list:KERNEL SELFTEST FRAMEWORK, open list:DOCUMENTATION,
	Shuah Khan, Andrew Morton, Borislav Petkov

On Tue, 2022-03-08 at 23:37 +0000, Sean Christopherson wrote:
> On Tue, Feb 08, 2022, Maxim Levitsky wrote:
> > On Thu, 2021-09-02 at 16:56 +0000, Sean Christopherson wrote:
> > > Assuming this hasn't been abandoned...
> > > 
> > > On Wed, Aug 11, 2021, Maxim Levitsky wrote:
> > > > This parameter will be used by VMX and SVM code to force
> > > > interception of a set of exceptions, given by a bitmask
> > > > for guest debug and/or kvm debug.
> > > > 
> > > > This is based on an idea first shown here:
> > > > https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/
> > > > 
> > > > CC: Borislav Petkov <bp@suse.de>
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > ---
> > > >  arch/x86/kvm/x86.c | 3 +++
> > > >  arch/x86/kvm/x86.h | 2 ++
> > > >  2 files changed, 5 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index fdc0c18339fb..092e2fad3c0d 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -184,6 +184,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
> > > >  int __read_mostly pi_inject_timer = -1;
> > > >  module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
> > > >  
> > > > +uint force_intercept_exceptions_mask;
> > > > +module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);
> > > 
> > > Use octal permissions.  This also can't be a simple writable param, at least not
> > > without a well-documented disclaimer, as there's no guarantee a vCPU will update
> > > its exception bitmap in a timely fashion.  An alternative to a module param would
> > > be to extend/add a per-VM ioctl(), e.g. maybe KVM_SET_GUEST_DEBUG?  The downside
> > > of an ioctl() is that it would require userspace enabling :-/
> > > 
> > 
> > All other module params in this file use macros for permissions, that is why
> > I used them too.
> > 
> > I'll add a comment with a disclaimer here - this is only for debug.
> > I strongly don't want to have this as ioctl as that will indeed need qemu patches,
> > not to mention things like unit tests and which don't even always use qemu.
> > 
> > Or I can make this parameter read-only. I don't mind reloading kvm module when
> > I change this parameter.
> 
> Oh!  We can force an update, a la nx_huge_pages, where the setter loops through
> all VMs and does a kvm_make_all_cpus_request() to instruct vCPUs to update their
> bitmaps.  Requires a new request, but that doesn't seem like a huge deal, and it
> might help pave the way for adding more debug hooks for developers.

Question: is it worth it? Since I am very busy with various things, this feature,
beeing just small debug help which I used once in a while doesn't get much time from me.

I can implement this in the future no doubt.

Best regards,
	Maxim Levitsky

> 
> The param should also be "unsafe".

I didn't knew about unsafe parameters until recently. I can mark it as such,
no objections.

Best regards,
	Maxim Levitsky

> 
> E.g. something like
> 
> static const struct kernel_param_ops force_ex_intercepts_ops = {
> 	.set = set_force_exception_intercepts,
> 	.get = get_force_exception_intercepts,
> };
> module_param_cb_unsafe(force_exception_intercepts, &force_ex_intercepts_ops,
> 		       &force_exception_intercepts, 0644);
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask
  2022-03-09 12:31         ` Maxim Levitsky
@ 2022-03-09 14:03           ` Paolo Bonzini
  2022-03-09 15:40             ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2022-03-09 14:03 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Kieran Bingham, Jan Kiszka, Andrew Jones, Jonathan Corbet,
	Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton,
	Borislav Petkov

On 3/9/22 13:31, Maxim Levitsky wrote:
> Question: is it worth it? Since I am very busy with various things, this feature,
> beeing just small debug help which I used once in a while doesn't get much time from me.

I agree it's not very much worth.

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask
  2022-03-09 14:03           ` Paolo Bonzini
@ 2022-03-09 15:40             ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2022-03-09 15:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, kvm, Kieran Bingham, Jan Kiszka, Andrew Jones,
	Jonathan Corbet, Vitaly Kuznetsov, Ingo Molnar, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Johannes Berg, Wanpeng Li, H. Peter Anvin, Jessica Yu,
	Jim Mattson, Joerg Roedel, Yang Weijiang, linux-kernel,
	Borislav Petkov, open list:KERNEL SELFTEST FRAMEWORK,
	open list:DOCUMENTATION, Shuah Khan, Andrew Morton,
	Borislav Petkov

On Wed, Mar 09, 2022, Paolo Bonzini wrote:
> On 3/9/22 13:31, Maxim Levitsky wrote:
> > Question: is it worth it? Since I am very busy with various things, this
> > feature, beeing just small debug help which I used once in a while doesn't
> > get much time from me.
> 
> I agree it's not very much worth.

I don't have a use case, was just trying to find the bottom of my inbox and came
across this thread.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-03-09 15:41 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 12:29 [PATCH v3 0/6] KVM: my debug patch queue Maxim Levitsky
2021-08-11 12:29 ` [PATCH v3 1/6] KVM: SVM: split svm_handle_invalid_exit Maxim Levitsky
2021-08-11 12:29 ` [PATCH v3 2/6] KVM: x86: add force_intercept_exceptions_mask Maxim Levitsky
2021-09-02 16:56   ` Sean Christopherson
2022-02-08 14:34     ` Maxim Levitsky
2022-03-08 23:37       ` Sean Christopherson
2022-03-09 12:31         ` Maxim Levitsky
2022-03-09 14:03           ` Paolo Bonzini
2022-03-09 15:40             ` Sean Christopherson
2021-08-11 12:29 ` [PATCH v3 3/6] KVM: SVM: implement force_intercept_exceptions_mask Maxim Levitsky
2021-08-11 14:26   ` Maxim Levitsky
2021-09-02 17:34     ` Sean Christopherson
2022-02-08 14:35       ` Maxim Levitsky
2021-08-11 12:29 ` [PATCH v3 4/6] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
2021-08-11 12:29 ` [PATCH v3 5/6] KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ Maxim Levitsky
2021-08-11 12:29 ` [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ Maxim Levitsky
2021-09-06 11:20   ` Paolo Bonzini
2021-09-06 21:03     ` Maxim Levitsky
2021-11-01 15:43     ` Vitaly Kuznetsov
2021-11-01 16:19       ` Maxim Levitsky
2021-11-01 23:21         ` Sean Christopherson
2021-11-02 10:46           ` Vitaly Kuznetsov
2021-11-02 15:53             ` Sean Christopherson
2021-11-02 16:18               ` Vitaly Kuznetsov
2021-11-02 18:45                 ` Sean Christopherson
2021-11-03  9:04                   ` Maxim Levitsky
2021-11-03  9:29                     ` [PATCH] KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active Maxim Levitsky
2021-11-03  9:31                       ` Maxim Levitsky
2021-08-11 13:10 ` [PATCH v3 0/6] KVM: my debug patch queue Paolo Bonzini
2021-08-11 13:22   ` Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).