LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH]  printk: make printk_safe_flush safe in NMI context by skipping flushing
@ 2018-05-29  2:51 Hoeun Ryu
  2018-05-29 12:13 ` Sergey Senozhatsky
  0 siblings, 1 reply; 7+ messages in thread
From: Hoeun Ryu @ 2018-05-29  2:51 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt; +Cc: Hoeun Ryu, linux-kernel

From: Hoeun Ryu <hoeun.ryu@lge.com>

 Make printk_safe_flush() safe in NMI context.
nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
function is called in watchdog_overflow_callback() if the flag of hardlockup
backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
watchdog_overflow_callback() function is called in NMI context on some
architectures.
 Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in
preempted contexts (task or irq in this case) or by other CPUs and it may cause
deadlocks.
 By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
just skips flushing if the lock is not avaiable in NMI context. The messages in
per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in
hard lockup (because irq is disabled here) but if panic() is not called. The
flushing can be delayed by the next irq work in normal cases.

Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Hoeun Ryu <hoeun.ryu@lge.com>
---
 kernel/printk/printk_safe.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..62bcc9b 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -254,6 +254,16 @@ void printk_safe_flush(void)
 {
 	int cpu;
 
+	/*
+	 * Just avoid deadlocks here, we could loose the messages in per-cpu nmi buffer
+	 * in the case that hardlockup happens but panic() is not called (irq_work won't
+	 * work).
+	 * The flushing can be delayed by the next irq_work if flushing is skippped here
+	 * in normal cases.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
+		return;
+
 	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
 		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-- 
2.1.4

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH]  printk: make printk_safe_flush safe in NMI context by skipping flushing
  2018-05-29  2:51 [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing Hoeun Ryu
@ 2018-05-29 12:13 ` Sergey Senozhatsky
  2018-05-29 22:43   ` Hoeun Ryu
  2018-05-30  8:32   ` Petr Mladek
  0 siblings, 2 replies; 7+ messages in thread
From: Sergey Senozhatsky @ 2018-05-29 12:13 UTC (permalink / raw)
  To: Hoeun Ryu
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Hoeun Ryu, linux-kernel

On (05/29/18 11:51), Hoeun Ryu wrote:
>  Make printk_safe_flush() safe in NMI context.
> nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
> function is called in watchdog_overflow_callback() if the flag of hardlockup
> backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> watchdog_overflow_callback() function is called in NMI context on some
> architectures.
>  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
> to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in
> preempted contexts (task or irq in this case) or by other CPUs and it may cause
> deadlocks.
>  By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
> just skips flushing if the lock is not avaiable in NMI context. The messages in
> per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in
> hard lockup (because irq is disabled here) but if panic() is not called. The
> flushing can be delayed by the next irq work in normal cases.

Any chance we can add more info to the commit message? E.g. backtraces
which would describe "how" is this possible (like the one I posted in
another email). Just to make it more clear.

> @@ -254,6 +254,16 @@ void printk_safe_flush(void)
>  {
>  	int cpu;
>  
> +	/*
> +	 * Just avoid deadlocks here, we could loose the messages in per-cpu nmi buffer
> +	 * in the case that hardlockup happens but panic() is not called (irq_work won't
> +	 * work).
> +	 * The flushing can be delayed by the next irq_work if flushing is skippped here
									   ^^ skipped

	-ss

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH]  printk: make printk_safe_flush safe in NMI context by skipping flushing
  2018-05-29 12:13 ` Sergey Senozhatsky
@ 2018-05-29 22:43   ` Hoeun Ryu
  2018-05-30  8:32   ` Petr Mladek
  1 sibling, 0 replies; 7+ messages in thread
From: Hoeun Ryu @ 2018-05-29 22:43 UTC (permalink / raw)
  To: 'Sergey Senozhatsky', 'Hoeun Ryu'
  Cc: 'Petr Mladek', 'Sergey Senozhatsky',
	'Steven Rostedt',
	linux-kernel


> -----Original Message-----
> From: Sergey Senozhatsky [mailto:sergey.senozhatsky.work@gmail.com]
> Sent: Tuesday, May 29, 2018 9:13 PM
> To: Hoeun Ryu <hoeun.ryu@lge.com.com>
> Cc: Petr Mladek <pmladek@suse.com>; Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com>; Steven Rostedt <rostedt@goodmis.org>;
> Hoeun Ryu <hoeun.ryu@lge.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On (05/29/18 11:51), Hoeun Ryu wrote:
> >  Make printk_safe_flush() safe in NMI context.
> > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> example the
> > function is called in watchdog_overflow_callback() if the flag of
> hardlockup
> > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > watchdog_overflow_callback() function is called in NMI context on some
> > architectures.
> >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> eventually tries
> > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already
> locked in
> > preempted contexts (task or irq in this case) or by other CPUs and it
> may cause
> > deadlocks.
> >  By making printk_safe_flush() safe in NMI context, the backtrace
> triggering CPU
> > just skips flushing if the lock is not avaiable in NMI context. The
> messages in
> > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the
> CPU is in
> > hard lockup (because irq is disabled here) but if panic() is not called.
> The
> > flushing can be delayed by the next irq work in normal cases.
> 
> Any chance we can add more info to the commit message? E.g. backtraces
> which would describe "how" is this possible (like the one I posted in
> another email). Just to make it more clear.
> 

OK, I will.

> > @@ -254,6 +254,16 @@ void printk_safe_flush(void)
> >  {
> >  	int cpu;
> >
> > +	/*
> > +	 * Just avoid deadlocks here, we could loose the messages in per-
> cpu nmi buffer
> > +	 * in the case that hardlockup happens but panic() is not called
> (irq_work won't
> > +	 * work).
> > +	 * The flushing can be delayed by the next irq_work if flushing is
> skippped here
>
^^ skipped
> 

Typo will be fixed.

> 	-ss

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH]  printk: make printk_safe_flush safe in NMI context by skipping flushing
  2018-05-29 12:13 ` Sergey Senozhatsky
  2018-05-29 22:43   ` Hoeun Ryu
@ 2018-05-30  8:32   ` Petr Mladek
  2018-06-01  1:55     ` Hoeun Ryu
  2018-06-01  5:17     ` Hoeun Ryu
  1 sibling, 2 replies; 7+ messages in thread
From: Petr Mladek @ 2018-05-30  8:32 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Hoeun Ryu, Sergey Senozhatsky, Steven Rostedt, Hoeun Ryu, linux-kernel

On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> On (05/29/18 11:51), Hoeun Ryu wrote:
> >  Make printk_safe_flush() safe in NMI context.
> > nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
> > function is called in watchdog_overflow_callback() if the flag of hardlockup
> > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > watchdog_overflow_callback() function is called in NMI context on some
> > architectures.
> >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
> > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in
> > preempted contexts (task or irq in this case) or by other CPUs and it may cause

The sentence "logbuf_lock can be already locked in preempted contexts" does not
make much sense. It is a spin lock. It means that both interrupts and
preemption are disabled.

I would change it to something like:

"Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
to lock logbuf_lock in vprintk_emit() that might be already be part
of a soft- or hard-lockup on another CPU."


> > deadlocks.
> >  By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
> > just skips flushing if the lock is not avaiable in NMI context. The messages in
> > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in
> > hard lockup (because irq is disabled here) but if panic() is not called. The
> > flushing can be delayed by the next irq work in normal cases.

I somehow miss there a motivation why the current state is better than
the previous. It looks like we exchange the risk of a deadlock with
a risk of loosing the messages.

I see it the following way:

"This patch prevents a deadlock in printk_safe_flush() in NMI
context. It makes sure that we continue and eventually call
printk_safe_flush_on_panic() from panic() that has better
chances to succeed.

There is a risk that logbuf_lock was not part of a soft- or
dead-lockup and we might just loose the messages. But then there is a high
chance that irq_work will get called and the messages will get flushed
the normal way."


> Any chance we can add more info to the commit message? E.g. backtraces
> which would describe "how" is this possible (like the one I posted in
> another email). Just to make it more clear.

I agree that a backtrace would be helpful. But it is not a must to
have from my point of view.

The patch itself looks good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH]  printk: make printk_safe_flush safe in NMI context by skipping flushing
  2018-05-30  8:32   ` Petr Mladek
@ 2018-06-01  1:55     ` Hoeun Ryu
  2018-06-01  5:17     ` Hoeun Ryu
  1 sibling, 0 replies; 7+ messages in thread
From: Hoeun Ryu @ 2018-06-01  1:55 UTC (permalink / raw)
  To: 'Petr Mladek', 'Sergey Senozhatsky'
  Cc: 'Hoeun Ryu', 'Sergey Senozhatsky',
	'Steven Rostedt',
	linux-kernel

I appreciate the detailed correction.
I will reflect the corrections in the next patch.
plus, the explanation in the code will be fixed.

> -----Original Message-----
> From: Petr Mladek [mailto:pmladek@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Cc: Hoeun Ryu <hoeun.ryu@lge.com.com>; Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com>; Steven Rostedt <rostedt@goodmis.org>;
> Hoeun Ryu <hoeun.ryu@lge.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> example the
> > > function is called in watchdog_overflow_callback() if the flag of
> hardlockup
> > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> eventually tries
> > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> already locked in
> > > preempted contexts (task or irq in this case) or by other CPUs and it
> may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 
> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 
> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> triggering CPU
> > > just skips flushing if the lock is not avaiable in NMI context. The
> messages in
> > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the
> CPU is in
> > > hard lockup (because irq is disabled here) but if panic() is not
> called. The
> > > flushing can be delayed by the next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH]  printk: make printk_safe_flush safe in NMI context by skipping flushing
  2018-05-30  8:32   ` Petr Mladek
  2018-06-01  1:55     ` Hoeun Ryu
@ 2018-06-01  5:17     ` Hoeun Ryu
  2018-06-01  9:05       ` Petr Mladek
  1 sibling, 1 reply; 7+ messages in thread
From: Hoeun Ryu @ 2018-06-01  5:17 UTC (permalink / raw)
  To: 'Petr Mladek', 'Sergey Senozhatsky'
  Cc: 'Hoeun Ryu', 'Sergey Senozhatsky',
	'Steven Rostedt',
	linux-kernel


> -----Original Message-----
> From: Petr Mladek [mailto:pmladek@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Cc: Hoeun Ryu <hoeun.ryu@lge.com.com>; Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com>; Steven Rostedt <rostedt@goodmis.org>;
> Hoeun Ryu <hoeun.ryu@lge.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> example the
> > > function is called in watchdog_overflow_callback() if the flag of
> hardlockup
> > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> eventually tries
> > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> already locked in
> > > preempted contexts (task or irq in this case) or by other CPUs and it
> may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 

I'd like to say that the preempting context is NMI,
so the preempted contexts could be task/irq/bh contexts on the same CPU.

> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 

It looks more clear.
But I'd modify "be part of a soft- or hard-lockup on another CPU." to
"be part of another non-nmi context on the same CPU or a soft- or
hard-lockup on another CPU."

How about it?

> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> triggering CPU
> > > just skips flushing if the lock is not avaiable in NMI context. The
> messages in
> > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the
> CPU is in
> > > hard lockup (because irq is disabled here) but if panic() is not
> called. The
> > > flushing can be delayed by the next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH]  printk: make printk_safe_flush safe in NMI context by skipping flushing
  2018-06-01  5:17     ` Hoeun Ryu
@ 2018-06-01  9:05       ` Petr Mladek
  0 siblings, 0 replies; 7+ messages in thread
From: Petr Mladek @ 2018-06-01  9:05 UTC (permalink / raw)
  To: Hoeun Ryu
  Cc: 'Sergey Senozhatsky', 'Hoeun Ryu',
	'Sergey Senozhatsky', 'Steven Rostedt',
	linux-kernel

On Fri 2018-06-01 14:17:54, Hoeun Ryu wrote:
> 
> > -----Original Message-----
> > From: Petr Mladek [mailto:pmladek@suse.com]
> > Sent: Wednesday, May 30, 2018 5:32 PM
> > To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> > Cc: Hoeun Ryu <hoeun.ryu@lge.com.com>; Sergey Senozhatsky
> > <sergey.senozhatsky@gmail.com>; Steven Rostedt <rostedt@goodmis.org>;
> > Hoeun Ryu <hoeun.ryu@lge.com>; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> > skipping flushing
> > 
> > On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > > On (05/29/18 11:51), Hoeun Ryu wrote:
> > > >  Make printk_safe_flush() safe in NMI context.
> > > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> > example the
> > > > function is called in watchdog_overflow_callback() if the flag of
> > hardlockup
> > > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > > watchdog_overflow_callback() function is called in NMI context on some
> > > > architectures.
> > > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> > eventually tries
> > > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> > already locked in
> > > > preempted contexts (task or irq in this case) or by other CPUs and it
> > may cause
> > 
> > The sentence "logbuf_lock can be already locked in preempted contexts"
> > does not
> > make much sense. It is a spin lock. It means that both interrupts and
> > preemption are disabled.
> > 
> 
> I'd like to say that the preempting context is NMI,
> so the preempted contexts could be task/irq/bh contexts on the same CPU.

Good point!

> > I would change it to something like:
> > 
> > "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> > tries
> > to lock logbuf_lock in vprintk_emit() that might be already be part
> > of a soft- or hard-lockup on another CPU."
> > 
> 
> It looks more clear.
> But I'd modify "be part of a soft- or hard-lockup on another CPU." to
> "be part of another non-nmi context on the same CPU or a soft- or
> hard-lockup on another CPU."
> 
> How about it?

Looks fine to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-06-01  9:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-29  2:51 [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing Hoeun Ryu
2018-05-29 12:13 ` Sergey Senozhatsky
2018-05-29 22:43   ` Hoeun Ryu
2018-05-30  8:32   ` Petr Mladek
2018-06-01  1:55     ` Hoeun Ryu
2018-06-01  5:17     ` Hoeun Ryu
2018-06-01  9:05       ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).