LKML Archive on lore.kernel.org
* [PATCH v2] workqueue: fix state-dump console deadlock
@ 2021-10-06 11:58 Johan Hovold
  2021-10-06 19:09 ` John Ogness
  2021-10-11 16:54 ` Tejun Heo
  0 siblings, 2 replies; 3+ messages in thread
From: Johan Hovold @ 2021-10-06 11:58 UTC
  To: Tejun Heo
  Cc: Lai Jiangshan, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	John Ogness, Greg Kroah-Hartman, Fabio Estevam, linux-serial,
	linux-kernel, Johan Hovold, stable

Console drivers often queue work while holding locks also taken in their
console write paths, something which can lead to deadlocks on SMP when
dumping workqueue state (e.g. sysrq-t or on suspend failures).

For serial console drivers this could look like:

	CPU0				CPU1
	----				----

	show_workqueue_state();
	  lock(&pool->lock);		<IRQ>
	  				  lock(&port->lock);
					  schedule_work();
					    lock(&pool->lock);
	  printk();
	    lock(console_owner);
	    lock(&port->lock);

where workqueues are, for example, used to push data to the line
discipline, process break signals and handle modem-status changes. Line
disciplines and serdev drivers can also queue work on write-wakeup
notifications, etc.
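
The inversion in the diagram above can be checked mechanically. What
follows is only a minimal userspace sketch, not kernel code and not how
lockdep is implemented: it records "acquired while holding" edges
between lock classes and refuses an edge that would close a cycle. The
lock class names and the helpers (order_reset, order_reaches,
order_add) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes mirroring the diagram; not kernel symbols. */
enum lock_class { POOL_LOCK, CONSOLE_OWNER, PORT_LOCK, NR_LOCKS };

/* edge[a][b] is true if some path acquired b while already holding a. */
static bool edge[NR_LOCKS][NR_LOCKS];

static void order_reset(void)
{
	memset(edge, 0, sizeof(edge));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool order_reaches(enum lock_class from, enum lock_class to,
			  bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (edge[from][next] && !seen[next] &&
		    order_reaches(next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record that 'acquired' was taken while 'held' was held.
 * Returns false, without recording anything, if the new edge would
 * close a cycle, i.e. if the two orderings could deadlock.
 */
static bool order_add(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_LOCKS] = { false };

	assert(held < NR_LOCKS && acquired < NR_LOCKS);
	if (order_reaches(acquired, held, seen))
		return false;
	edge[held][acquired] = true;
	return true;
}
```

Feeding it the CPU0 chain (pool lock, then console_owner, then port
lock) followed by the CPU1 ordering (port lock, then pool lock) makes
the last order_add() call return false.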

Reworking every console driver to avoid queuing work while holding locks
also taken in their write paths would complicate drivers and is neither
desirable nor feasible.

Instead use the deferred-printk mechanism to avoid printing while
holding pool locks when dumping workqueue state.
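
The idea behind a deferred section can be illustrated with a minimal
userspace model. It is a sketch under stated assumptions, not the
kernel's printk implementation: a nesting counter marks the deferred
section, messages logged inside it are only stored, and the "console"
is written later from a separate flush step that runs after the caller
has dropped its locks. All names (defer_enter, defer_exit, log_msg,
flush_pending, console_write) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PENDING 16
#define MAX_MSG_LEN 64

static int defer_depth;		/* > 0 while inside a deferred section */
static char pending[MAX_PENDING][MAX_MSG_LEN];
static int nr_pending;		/* messages stored but not yet printed */
static int nr_printed;		/* messages written to the "console" */

/* Stand-in for a console driver's write path, which would take locks. */
static void console_write(const char *msg)
{
	fputs(msg, stdout);
	nr_printed++;
}

/* Sections may nest; only the counter changes here. */
static void defer_enter(void)
{
	defer_depth++;
}

static void defer_exit(void)
{
	assert(defer_depth > 0);
	defer_depth--;
}

/* Print directly, or only store the message inside a deferred section. */
static void log_msg(const char *msg)
{
	if (defer_depth == 0) {
		console_write(msg);
		return;
	}
	assert(nr_pending < MAX_PENDING);
	snprintf(pending[nr_pending++], MAX_MSG_LEN, "%s", msg);
}

/* Called later, from a context that holds none of the caller's locks. */
static void flush_pending(void)
{
	for (int i = 0; i < nr_pending; i++)
		console_write(pending[i]);
	nr_pending = 0;
}
```

In this model the caller would take its lock, call defer_enter(), log
its state, call defer_exit() and release the lock; the console write
path is only entered from flush_pending(), so it never runs with the
caller's lock held.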

Note that there are a few WARN_ON() assertions in the workqueue code
which could potentially also trigger a deadlock. Hopefully the ongoing
printk rework will provide a general solution for this eventually.

This was originally reported after a lockdep splat when executing
sysrq-t with the imx serial driver.

Fixes: 3494fc30846d ("workqueue: dump workqueues on sysrq-t")
Cc: stable@vger.kernel.org	# 4.0
Reported-by: Fabio Estevam <festevam@denx.de>
Tested-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
---

Changes in v2
 - defer printing also of worker pool state (Petr Mladek)
 - add Fabio's tested-by tag


 kernel/workqueue.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33a6b4a2443d..1b3eb1e9531f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4830,8 +4830,16 @@ void show_workqueue_state(void)
 
 		for_each_pwq(pwq, wq) {
 			raw_spin_lock_irqsave(&pwq->pool->lock, flags);
-			if (pwq->nr_active || !list_empty(&pwq->inactive_works))
+			if (pwq->nr_active || !list_empty(&pwq->inactive_works)) {
+				/*
+				 * Defer printing to avoid deadlocks in console
+				 * drivers that queue work while holding locks
+				 * also taken in their write paths.
+				 */
+				printk_deferred_enter();
 				show_pwq(pwq);
+				printk_deferred_exit();
+			}
 			raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
 			/*
 			 * We could be printing a lot from atomic context, e.g.
@@ -4849,7 +4857,12 @@ void show_workqueue_state(void)
 		raw_spin_lock_irqsave(&pool->lock, flags);
 		if (pool->nr_workers == pool->nr_idle)
 			goto next_pool;
-
+		/*
+		 * Defer printing to avoid deadlocks in console drivers that
+		 * queue work while holding locks also taken in their write
+		 * paths.
+		 */
+		printk_deferred_enter();
 		pr_info("pool %d:", pool->id);
 		pr_cont_pool_info(pool);
 		pr_cont(" hung=%us workers=%d",
@@ -4864,6 +4877,7 @@ void show_workqueue_state(void)
 			first = false;
 		}
 		pr_cont("\n");
+		printk_deferred_exit();
 	next_pool:
 		raw_spin_unlock_irqrestore(&pool->lock, flags);
 		/*
-- 
2.32.0



* Re: [PATCH v2] workqueue: fix state-dump console deadlock
From: John Ogness @ 2021-10-06 19:09 UTC
  To: Johan Hovold, Tejun Heo
  Cc: Lai Jiangshan, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Fabio Estevam, linux-serial, linux-kernel,
	Johan Hovold, stable

On 2021-10-06, Johan Hovold <johan@kernel.org> wrote:
> Console drivers often queue work while holding locks also taken in their
> console write paths, something which can lead to deadlocks on SMP when
> dumping workqueue state (e.g. sysrq-t or on suspend failures).
>
> For serial console drivers this could look like:
>
> 	CPU0				CPU1
> 	----				----
>
> 	show_workqueue_state();
> 	  lock(&pool->lock);		<IRQ>
> 	  				  lock(&port->lock);
> 					  schedule_work();
> 					    lock(&pool->lock);
> 	  printk();
> 	    lock(console_owner);
> 	    lock(&port->lock);
>
> where workqueues are, for example, used to push data to the line
> discipline, process break signals and handle modem-status changes. Line
> disciplines and serdev drivers can also queue work on write-wakeup
> notifications, etc.
>
> Reworking every console driver to avoid queuing work while holding locks
> also taken in their write paths would complicate drivers and is neither
> desirable nor feasible.
>
> Instead use the deferred-printk mechanism to avoid printing while
> holding pool locks when dumping workqueue state.

When I introduced the printk_deferred_enter/exit functions, I kind of
expected patches like this to start showing up. The functions make it
really convenient to establish general sections of console print
deferring.

When we move to kthread-printers, all printk calls will be deferred
automatically. However, that is only during normal operation. The
various printk_deferred sites may still be significant and will continue
to have special meaning during startup and shutdown states, when the
kthreads will not be active and direct printing will exist as it is now.

FWIW, I am OK with this patch. It will be re-evaluated once we have
kthread-printers, but I suspect even then it will remain.

> Note that there are a few WARN_ON() assertions in the workqueue code
> which could potentially also trigger a deadlock. Hopefully the ongoing
> printk rework will provide a general solution for this eventually.
>
> This was originally reported after a lockdep splat when executing
> sysrq-t with the imx serial driver.
>
> Fixes: 3494fc30846d ("workqueue: dump workqueues on sysrq-t")
> Cc: stable@vger.kernel.org	# 4.0
> Reported-by: Fabio Estevam <festevam@denx.de>
> Tested-by: Fabio Estevam <festevam@denx.de>
> Signed-off-by: Johan Hovold <johan@kernel.org>

Reviewed-by: John Ogness <john.ogness@linutronix.de>

> ---
>
> Changes in v2
>  - defer printing also of worker pool state (Petr Mladek)
>  - add Fabio's tested-by tag
>
>
>  kernel/workqueue.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33a6b4a2443d..1b3eb1e9531f 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4830,8 +4830,16 @@ void show_workqueue_state(void)
>  
>  		for_each_pwq(pwq, wq) {
>  			raw_spin_lock_irqsave(&pwq->pool->lock, flags);
> -			if (pwq->nr_active || !list_empty(&pwq->inactive_works))
> +			if (pwq->nr_active || !list_empty(&pwq->inactive_works)) {
> +				/*
> +				 * Defer printing to avoid deadlocks in console
> +				 * drivers that queue work while holding locks
> +				 * also taken in their write paths.
> +				 */
> +				printk_deferred_enter();
>  				show_pwq(pwq);
> +				printk_deferred_exit();
> +			}
>  			raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
>  			/*
>  			 * We could be printing a lot from atomic context, e.g.
> @@ -4849,7 +4857,12 @@ void show_workqueue_state(void)
>  		raw_spin_lock_irqsave(&pool->lock, flags);
>  		if (pool->nr_workers == pool->nr_idle)
>  			goto next_pool;
> -
> +		/*
> +		 * Defer printing to avoid deadlocks in console drivers that
> +		 * queue work while holding locks also taken in their write
> +		 * paths.
> +		 */
> +		printk_deferred_enter();
>  		pr_info("pool %d:", pool->id);
>  		pr_cont_pool_info(pool);
>  		pr_cont(" hung=%us workers=%d",
> @@ -4864,6 +4877,7 @@ void show_workqueue_state(void)
>  			first = false;
>  		}
>  		pr_cont("\n");
> +		printk_deferred_exit();
>  	next_pool:
>  		raw_spin_unlock_irqrestore(&pool->lock, flags);
>  		/*
> -- 
> 2.32.0


* Re: [PATCH v2] workqueue: fix state-dump console deadlock
From: Tejun Heo @ 2021-10-11 16:54 UTC
  To: Johan Hovold
  Cc: Lai Jiangshan, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	John Ogness, Greg Kroah-Hartman, Fabio Estevam, linux-serial,
	linux-kernel, stable

On Wed, Oct 06, 2021 at 01:58:52PM +0200, Johan Hovold wrote:
> Console drivers often queue work while holding locks also taken in their
> console write paths, something which can lead to deadlocks on SMP when
> dumping workqueue state (e.g. sysrq-t or on suspend failures).
> 
> For serial console drivers this could look like:
> 
> 	CPU0				CPU1
> 	----				----
> 
> 	show_workqueue_state();
> 	  lock(&pool->lock);		<IRQ>
> 	  				  lock(&port->lock);
> 					  schedule_work();
> 					    lock(&pool->lock);
> 	  printk();
> 	    lock(console_owner);
> 	    lock(&port->lock);
> 
> where workqueues are, for example, used to push data to the line
> discipline, process break signals and handle modem-status changes. Line
> disciplines and serdev drivers can also queue work on write-wakeup
> notifications, etc.
> 
> Reworking every console driver to avoid queuing work while holding locks
> also taken in their write paths would complicate drivers and is neither
> desirable nor feasible.
> 
> Instead use the deferred-printk mechanism to avoid printing while
> holding pool locks when dumping workqueue state.
> 
> Note that there are a few WARN_ON() assertions in the workqueue code
> which could potentially also trigger a deadlock. Hopefully the ongoing
> printk rework will provide a general solution for this eventually.
> 
> This was originally reported after a lockdep splat when executing
> sysrq-t with the imx serial driver.
> 
> Fixes: 3494fc30846d ("workqueue: dump workqueues on sysrq-t")
> Cc: stable@vger.kernel.org	# 4.0
> Reported-by: Fabio Estevam <festevam@denx.de>
> Tested-by: Fabio Estevam <festevam@denx.de>
> Signed-off-by: Johan Hovold <johan@kernel.org>

Applied to wq/for-5.15-fixes.

Thanks.

-- 
tejun
