LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] ring-buffer: add paranoid checks for loops
Date: Thu, 30 Oct 2008 19:45:52 +0100	[thread overview]
Message-ID: <20081030184552.GC17822@elte.hu> (raw)
In-Reply-To: <alpine.DEB.1.10.0810291439460.13214@gandalf.stny.rr.com>


* Steven Rostedt <rostedt@goodmis.org> wrote:

> [ for 2.6.28 ]
> 
> While writing a new tracer, I had a bug where I caused the 
> ring-buffer to recurse in a bad way. The bug was with the tracer I 
> was writing and not the ring-buffer itself. But it took a long time 
> to find the problem.
> 
> This patch adds paranoid checks into the ring-buffer infrastructure
> that will catch bugs of this nature.
> 
> Note: I put the bug back in the tracer and this patch showed the error
>       nicely and prevented the lockup.
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  kernel/trace/ring_buffer.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> Index: linux-tip.git/kernel/trace/ring_buffer.c
> ===================================================================
> --- linux-tip.git.orig/kernel/trace/ring_buffer.c	2008-10-29 12:38:54.000000000 -0400
> +++ linux-tip.git/kernel/trace/ring_buffer.c	2008-10-29 16:11:00.000000000 -0400
> @@ -1022,8 +1022,20 @@ rb_reserve_next_event(struct ring_buffer
>  	struct ring_buffer_event *event;
>  	u64 ts, delta;
>  	int commit = 0;
> +	int paranoid = 0;
>  
>   again:
> +	/*
> +	 * If we loop here 1,000 times, that means we are either
> +	 * in an interrupt storm, or we have something buggy.
> +	 * Bail!
> +	 */
> +	if (unlikely(paranoid > 1000)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;
> +	}
> +	paranoid++;
> +
>  	ts = ring_buffer_time_stamp(cpu_buffer->cpu);
>  
>  	/*
> @@ -1532,10 +1544,21 @@ rb_get_reader_page(struct ring_buffer_pe
>  {
>  	struct buffer_page *reader = NULL;
>  	unsigned long flags;
> +	int paranoid = 0;
>  
>  	spin_lock_irqsave(&cpu_buffer->lock, flags);
>  
>   again:
> +	/*
> +	 * We can call here a couple of times, lets only allow 5.
> +	 */
> +	if (unlikely(paranoid > 4)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		reader = NULL;
> +		goto out;
> +	}
> +	paranoid++;
> +
>  	reader = cpu_buffer->reader_page;
>  
>  	/* If there's more to read, return this page */
> @@ -1665,6 +1688,7 @@ ring_buffer_peek(struct ring_buffer *buf
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	struct ring_buffer_event *event;
>  	struct buffer_page *reader;
> +	int paranoid = 0;
>  
>  	if (!cpu_isset(cpu, buffer->cpumask))
>  		return NULL;
> @@ -1672,6 +1696,16 @@ ring_buffer_peek(struct ring_buffer *buf
>  	cpu_buffer = buffer->buffers[cpu];
>  
>   again:
> +	/*
> +	 * This could happen a few times, but if more than
> +	 * 10 times, then something is probably wrong.
> +	 */
> +	if (unlikely(paranoid > 10)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;
> +	}
> +	paranoid++;
> +
>  	reader = rb_get_reader_page(cpu_buffer);
>  	if (!reader)
>  		return NULL;
> @@ -1722,6 +1756,7 @@ ring_buffer_iter_peek(struct ring_buffer
>  	struct ring_buffer *buffer;
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	struct ring_buffer_event *event;
> +	int paranoid = 0;
>  
>  	if (ring_buffer_iter_empty(iter))
>  		return NULL;
> @@ -1730,6 +1765,16 @@ ring_buffer_iter_peek(struct ring_buffer
>  	buffer = cpu_buffer->buffer;
>  
>   again:
> +	/*
> +	 * This could happen a few times, but if more than
> +	 * 10 times, then something is probably wrong.
> +	 */
> +	if (unlikely(paranoid > 10)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;
> +	}
> +	paranoid++;
> +

hm, all those magic constants look a bit like voodoo and make the 
patch ugly, and people who read this will be confused about the 
purpose for sure.

But the checks are still worth having in practice. So could you please 
improve the comments, to come up with some tangible calculation that 
leads to these constants?

For example the '1000' constant, how did you come to that? Could you 
estimate what type of interrupt storm is needed to trigger it falsely? 
So instead of this comment:

> +	 * If we loop here 1,000 times, that means we are either
> +	 * in an interrupt storm, or we have something buggy.
> +	 * Bail!

something like this might look more acceptable:

> +	 * If we loop here 1,000 times, that means we are either
> +	 * in an interrupt storm that preempted the same trace-entry
> +	 * attempt 1000 times in a row, or we have a bug in the tracer.
> +	 * Bail!

i.e. please exaplain every single magic number there so that it can be 
followed how you got to that number, and what precise effects that 
number has.

In the cases where you just guessed a number based on experiments, 
please think it through and insert an analysis about the effects of 
that number.

Would this be doable?

	Ingo

  reply	other threads:[~2008-10-30 18:46 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-29 22:48 Steven Rostedt
2008-10-30 18:45 ` Ingo Molnar [this message]
2008-10-30 19:00   ` Steven Rostedt
2008-10-31  3:16   ` [PATCH -v2] " Steven Rostedt
2008-10-31  9:38     ` Ingo Molnar
2008-10-31 13:58       ` [PATCH -v3] " Steven Rostedt
2008-11-03 10:10         ` Ingo Molnar
2008-10-31 14:00       ` [PATCH -v2] " Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081030184552.GC17822@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --subject='Re: [PATCH] ring-buffer: add paranoid checks for loops' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).