LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable
@ 2015-02-04 13:27 Joerg Roedel
  2015-02-05  5:51 ` Jiang Liu
  2015-02-18 17:08 ` [tip:irq/urgent] x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() tip-bot for Joerg Roedel
  0 siblings, 2 replies; 4+ messages in thread
From: Joerg Roedel @ 2015-02-04 13:27 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Prarit Bhargava,
	Yinghai Lu, Jiang Liu
  Cc: x86, linux-kernel, alnovak, jroedel, joro

Hi,

here is a patch to fix a kernel panic at shutdown we have seen recently.
The issue is hard to reproduce, so the patch description about the root
cause of the bug comes only from review and my current understanding of
x86 irq code.

So if what I wrote in the patch description doesn't make sense, please
let me know.

Constructive comments and feedback welcome.

Thanks,

	Joerg

>From 153e8f6cf39c42dd9767eb49a27eacfb69738fb0 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel@suse.de>
Date: Wed, 4 Feb 2015 13:33:33 +0100
Subject: [PATCH] x86: irq: Check for valid irq descriptor in
 check_irq_vectors_for_cpu_disable

When an interrupt is migrated away from a cpu it will stay
in its vector_irq array until smp_irq_move_cleanup_interrupt
succeeded. The cfg->move_in_progress flag is cleared already
when the IPI was sent.

When the interrupt is destroyed after migration its 'struct
irq_desc' is freed and the vector_irq arrays are cleaned up.
But since cfg->move_in_progress is already 0 the references
at cpus before the last migration will not be cleared. So
this would leave a reference to an already destroyed irq
alive.

When the cpu is taken down at this point, the
check_irq_vectors_for_cpu_disable function finds a valid irq
number in the vector_irq array, but gets NULL for its
descriptor and dereferences it, causing a kernel panic.

This has been observed on real systems at shutdown. Add a
check to check_irq_vectors_for_cpu_disable for a valid
'struct irq_desc' to prevent this issue.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/irq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 705ef8d..67b1cbe 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -302,6 +302,9 @@ int check_irq_vectors_for_cpu_disable(void)
 		irq = __this_cpu_read(vector_irq[vector]);
 		if (irq >= 0) {
 			desc = irq_to_desc(irq);
+			if (!desc)
+				continue;
+
 			data = irq_desc_get_irq_data(desc);
 			cpumask_copy(&affinity_new, data->affinity);
 			cpu_clear(this_cpu, affinity_new);
-- 
1.8.4.5


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable
  2015-02-04 13:27 [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable Joerg Roedel
@ 2015-02-05  5:51 ` Jiang Liu
  2015-02-06 12:28   ` Joerg Roedel
  2015-02-18 17:08 ` [tip:irq/urgent] x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() tip-bot for Joerg Roedel
  1 sibling, 1 reply; 4+ messages in thread
From: Jiang Liu @ 2015-02-05  5:51 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Prarit Bhargava, Yinghai Lu
  Cc: x86, linux-kernel, alnovak, joro

On 2015/2/4 21:27, Joerg Roedel wrote:
> Hi,
> 
> here is a patch to fix a kernel panic at shutdown we have seen recently.
> The issue is hard to reproduce, so the patch description about the root
> cause of the bug comes only from review and my current understanding of
> x86 irq code.
> 
> So if what I wrote in the patch description doesn't make sense, please
> let me know.
> 
> Constructive comments and feedback welcome.
> 
> Thanks,
> 
> 	Joerg
> 
> From 153e8f6cf39c42dd9767eb49a27eacfb69738fb0 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <jroedel@suse.de>
> Date: Wed, 4 Feb 2015 13:33:33 +0100
> Subject: [PATCH] x86: irq: Check for valid irq descriptor in
>  check_irq_vectors_for_cpu_disable
> 
> When an interrupt is migrated away from a cpu it will stay
> in its vector_irq array until smp_irq_move_cleanup_interrupt
> succeeded. The cfg->move_in_progress flag is cleared already
> when the IPI was sent.
> 
> When the interrupt is destroyed after migration its 'struct
> irq_desc' is freed and the vector_irq arrays are cleaned up.
> But since cfg->move_in_progress is already 0 the references
> at cpus before the last migration will not be cleared. So
> this would leave a reference to an already destroyed irq
> alive.
> 
> When the cpu is taken down at this point, the
> check_irq_vectors_for_cpu_disable function finds a valid irq
> number in the vector_irq array, but gets NULL for its
> descriptor and dereferences it, causing a kernel panic.
> 
> This has been observed on real systems at shutdown. Add a
> check to check_irq_vectors_for_cpu_disable for a valid
> 'struct irq_desc' to prevent this issue.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

Actually there's another racing pattern.
for (irq = 0; irq < nr_irqs; irq++) {
	desc = irq_to_desc(irq);
	access desc->xxx
}

When sparsing IRQ is enabled, there's no mechanism to protect
desc returned by irq_to_desc(). Once I have considered a brute
solution of disabling freeing of irq_desc:(
Regards!
Gerry
> ---
>  arch/x86/kernel/irq.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index 705ef8d..67b1cbe 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -302,6 +302,9 @@ int check_irq_vectors_for_cpu_disable(void)
>  		irq = __this_cpu_read(vector_irq[vector]);
>  		if (irq >= 0) {
>  			desc = irq_to_desc(irq);
> +			if (!desc)
> +				continue;
> +
>  			data = irq_desc_get_irq_data(desc);
>  			cpumask_copy(&affinity_new, data->affinity);
>  			cpu_clear(this_cpu, affinity_new);
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable
  2015-02-05  5:51 ` Jiang Liu
@ 2015-02-06 12:28   ` Joerg Roedel
  0 siblings, 0 replies; 4+ messages in thread
From: Joerg Roedel @ 2015-02-06 12:28 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Prarit Bhargava, Yinghai Lu, x86, linux-kernel, alnovak

Hi Jiang,

On Thu, Feb 05, 2015 at 01:51:26PM +0800, Jiang Liu wrote:
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

Thanks for your review.

> Actually there's another racing pattern.
> for (irq = 0; irq < nr_irqs; irq++) {
> 	desc = irq_to_desc(irq);
> 	access desc->xxx
> }
> 
> When sparsing IRQ is enabled, there's no mechanism to protect
> desc returned by irq_to_desc(). Once I have considered a brute
> solution of disabling freeing of irq_desc:(

Hmm, how about wrapping the places that use irq_desc in rcu_read_lock()
and do a synchronize_rcu() before we free it (at least in the SPARSE_IRQ
case)? It wouldn't be a real RCU data structure, but we make at least
sure that we don't free an irq_desc thats in use.


	Joerg


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip:irq/urgent] x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()
  2015-02-04 13:27 [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable Joerg Roedel
  2015-02-05  5:51 ` Jiang Liu
@ 2015-02-18 17:08 ` tip-bot for Joerg Roedel
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Joerg Roedel @ 2015-02-18 17:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, linux, torvalds, mingo, yinghai, linux-kernel, kys,
	jroedel, prarit, jiang.liu, JBeulich, peterz, hpa

Commit-ID:  d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1
Gitweb:     http://git.kernel.org/tip/d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 4 Feb 2015 13:33:33 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 18 Feb 2015 15:01:42 +0100

x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()

When an interrupt is migrated away from a cpu it will stay
in its vector_irq array until smp_irq_move_cleanup_interrupt
succeeded. The cfg->move_in_progress flag is cleared already
when the IPI was sent.

When the interrupt is destroyed after migration its 'struct
irq_desc' is freed and the vector_irq arrays are cleaned up.
But since cfg->move_in_progress is already 0 the references
at cpus before the last migration will not be cleared. So
this would leave a reference to an already destroyed irq
alive.

When the cpu is taken down at this point, the
check_irq_vectors_for_cpu_disable() function finds a valid irq
number in the vector_irq array, but gets NULL for its
descriptor and dereferences it, causing a kernel panic.

This has been observed on real systems at shutdown. Add a
check to check_irq_vectors_for_cpu_disable() for a valid
'struct irq_desc' to prevent this issue.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: alnovak@suse.com
Cc: joro@8bytes.org
Link: http://lkml.kernel.org/r/20150204132754.GA10078@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/irq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 705ef8d..67b1cbe 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -302,6 +302,9 @@ int check_irq_vectors_for_cpu_disable(void)
 		irq = __this_cpu_read(vector_irq[vector]);
 		if (irq >= 0) {
 			desc = irq_to_desc(irq);
+			if (!desc)
+				continue;
+
 			data = irq_desc_get_irq_data(desc);
 			cpumask_copy(&affinity_new, data->affinity);
 			cpu_clear(this_cpu, affinity_new);

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-02-18 17:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-04 13:27 [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable Joerg Roedel
2015-02-05  5:51 ` Jiang Liu
2015-02-06 12:28   ` Joerg Roedel
2015-02-18 17:08 ` [tip:irq/urgent] x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() tip-bot for Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).