LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] ring-buffer: add paranoid checks for loops
@ 2008-10-29 22:48 Steven Rostedt
  2008-10-30 18:45 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2008-10-29 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Linus Torvalds,
	Andrew Morton


[ for 2.6.28 ]

While writing a new tracer, I had a bug where I caused the ring-buffer
to recurse in a bad way. The bug was with the tracer I was writing
and not the ring-buffer itself. But it took a long time to find the
problem.

This patch adds paranoid checks into the ring-buffer infrastructure
that will catch bugs of this nature.

Note: I put the bug back in the tracer and this patch showed the error
      nicely and prevented the lockup.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/ring_buffer.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

Index: linux-tip.git/kernel/trace/ring_buffer.c
===================================================================
--- linux-tip.git.orig/kernel/trace/ring_buffer.c	2008-10-29 12:38:54.000000000 -0400
+++ linux-tip.git/kernel/trace/ring_buffer.c	2008-10-29 16:11:00.000000000 -0400
@@ -1022,8 +1022,20 @@ rb_reserve_next_event(struct ring_buffer
 	struct ring_buffer_event *event;
 	u64 ts, delta;
 	int commit = 0;
+	int paranoid = 0;
 
  again:
+	/*
+	 * If we loop here 1,000 times, that means we are either
+	 * in an interrupt storm, or we have something buggy.
+	 * Bail!
+	 */
+	if (unlikely(paranoid > 1000)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+	paranoid++;
+
 	ts = ring_buffer_time_stamp(cpu_buffer->cpu);
 
 	/*
@@ -1532,10 +1544,21 @@ rb_get_reader_page(struct ring_buffer_pe
 {
 	struct buffer_page *reader = NULL;
 	unsigned long flags;
+	int paranoid = 0;
 
 	spin_lock_irqsave(&cpu_buffer->lock, flags);
 
  again:
+	/*
+	 * We can call here a couple of times, lets only allow 5.
+	 */
+	if (unlikely(paranoid > 4)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		reader = NULL;
+		goto out;
+	}
+	paranoid++;
+
 	reader = cpu_buffer->reader_page;
 
 	/* If there's more to read, return this page */
@@ -1665,6 +1688,7 @@ ring_buffer_peek(struct ring_buffer *buf
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
 	struct buffer_page *reader;
+	int paranoid = 0;
 
 	if (!cpu_isset(cpu, buffer->cpumask))
 		return NULL;
@@ -1672,6 +1696,16 @@ ring_buffer_peek(struct ring_buffer *buf
 	cpu_buffer = buffer->buffers[cpu];
 
  again:
+	/*
+	 * This could happen a few times, but if more than
+	 * 10 times, then something is probably wrong.
+	 */
+	if (unlikely(paranoid > 10)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+	paranoid++;
+
 	reader = rb_get_reader_page(cpu_buffer);
 	if (!reader)
 		return NULL;
@@ -1722,6 +1756,7 @@ ring_buffer_iter_peek(struct ring_buffer
 	struct ring_buffer *buffer;
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
+	int paranoid = 0;
 
 	if (ring_buffer_iter_empty(iter))
 		return NULL;
@@ -1730,6 +1765,16 @@ ring_buffer_iter_peek(struct ring_buffer
 	buffer = cpu_buffer->buffer;
 
  again:
+	/*
+	 * This could happen a few times, but if more than
+	 * 10 times, then something is probably wrong.
+	 */
+	if (unlikely(paranoid > 10)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+	paranoid++;
+
 	if (rb_per_cpu_empty(cpu_buffer))
 		return NULL;
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ring-buffer: add paranoid checks for loops
  2008-10-29 22:48 [PATCH] ring-buffer: add paranoid checks for loops Steven Rostedt
@ 2008-10-30 18:45 ` Ingo Molnar
  2008-10-30 19:00   ` Steven Rostedt
  2008-10-31  3:16   ` [PATCH -v2] " Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2008-10-30 18:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andrew Morton


* Steven Rostedt <rostedt@goodmis.org> wrote:

> [ for 2.6.28 ]
> 
> While writing a new tracer, I had a bug where I caused the 
> ring-buffer to recurse in a bad way. The bug was with the tracer I 
> was writing and not the ring-buffer itself. But it took a long time 
> to find the problem.
> 
> This patch adds paranoid checks into the ring-buffer infrastructure
> that will catch bugs of this nature.
> 
> Note: I put the bug back in the tracer and this patch showed the error
>       nicely and prevented the lockup.
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  kernel/trace/ring_buffer.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> Index: linux-tip.git/kernel/trace/ring_buffer.c
> ===================================================================
> --- linux-tip.git.orig/kernel/trace/ring_buffer.c	2008-10-29 12:38:54.000000000 -0400
> +++ linux-tip.git/kernel/trace/ring_buffer.c	2008-10-29 16:11:00.000000000 -0400
> @@ -1022,8 +1022,20 @@ rb_reserve_next_event(struct ring_buffer
>  	struct ring_buffer_event *event;
>  	u64 ts, delta;
>  	int commit = 0;
> +	int paranoid = 0;
>  
>   again:
> +	/*
> +	 * If we loop here 1,000 times, that means we are either
> +	 * in an interrupt storm, or we have something buggy.
> +	 * Bail!
> +	 */
> +	if (unlikely(paranoid > 1000)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;
> +	}
> +	paranoid++;
> +
>  	ts = ring_buffer_time_stamp(cpu_buffer->cpu);
>  
>  	/*
> @@ -1532,10 +1544,21 @@ rb_get_reader_page(struct ring_buffer_pe
>  {
>  	struct buffer_page *reader = NULL;
>  	unsigned long flags;
> +	int paranoid = 0;
>  
>  	spin_lock_irqsave(&cpu_buffer->lock, flags);
>  
>   again:
> +	/*
> +	 * We can call here a couple of times, lets only allow 5.
> +	 */
> +	if (unlikely(paranoid > 4)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		reader = NULL;
> +		goto out;
> +	}
> +	paranoid++;
> +
>  	reader = cpu_buffer->reader_page;
>  
>  	/* If there's more to read, return this page */
> @@ -1665,6 +1688,7 @@ ring_buffer_peek(struct ring_buffer *buf
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	struct ring_buffer_event *event;
>  	struct buffer_page *reader;
> +	int paranoid = 0;
>  
>  	if (!cpu_isset(cpu, buffer->cpumask))
>  		return NULL;
> @@ -1672,6 +1696,16 @@ ring_buffer_peek(struct ring_buffer *buf
>  	cpu_buffer = buffer->buffers[cpu];
>  
>   again:
> +	/*
> +	 * This could happen a few times, but if more than
> +	 * 10 times, then something is probably wrong.
> +	 */
> +	if (unlikely(paranoid > 10)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;
> +	}
> +	paranoid++;
> +
>  	reader = rb_get_reader_page(cpu_buffer);
>  	if (!reader)
>  		return NULL;
> @@ -1722,6 +1756,7 @@ ring_buffer_iter_peek(struct ring_buffer
>  	struct ring_buffer *buffer;
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	struct ring_buffer_event *event;
> +	int paranoid = 0;
>  
>  	if (ring_buffer_iter_empty(iter))
>  		return NULL;
> @@ -1730,6 +1765,16 @@ ring_buffer_iter_peek(struct ring_buffer
>  	buffer = cpu_buffer->buffer;
>  
>   again:
> +	/*
> +	 * This could happen a few times, but if more than
> +	 * 10 times, then something is probably wrong.
> +	 */
> +	if (unlikely(paranoid > 10)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;
> +	}
> +	paranoid++;
> +

hm, all those magic constants look a bit like voodoo and make the 
patch ugly, and people who read this will be confused about the 
purpose for sure.

But the checks are still worth having in practice. So could you please 
improve the comments, to come up with some tangible calculation that 
leads to these constants?

For example the '1000' constant, how did you come to that? Could you 
estimate what type of interrupt storm is needed to trigger it falsely? 
So instead of this comment:

> +	 * If we loop here 1,000 times, that means we are either
> +	 * in an interrupt storm, or we have something buggy.
> +	 * Bail!

something like this might look more acceptable:

> +	 * If we loop here 1,000 times, that means we are either
> +	 * in an interrupt storm that preempted the same trace-entry
> +	 * attempt 1000 times in a row, or we have a bug in the tracer.
> +	 * Bail!

i.e. please exaplain every single magic number there so that it can be 
followed how you got to that number, and what precise effects that 
number has.

In the cases where you just guessed a number based on experiments, 
please think it through and insert an analysis about the effects of 
that number.

Would this be doable?

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ring-buffer: add paranoid checks for loops
  2008-10-30 18:45 ` Ingo Molnar
@ 2008-10-30 19:00   ` Steven Rostedt
  2008-10-31  3:16   ` [PATCH -v2] " Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2008-10-30 19:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andrew Morton


On Thu, 30 Oct 2008, Ingo Molnar wrote:
> 
> hm, all those magic constants look a bit like voodoo and make the 
> patch ugly, and people who read this will be confused about the 
> purpose for sure.

Point taken.

> 
> But the checks are still worth having in practice. So could you please 
> improve the comments, to come up with some tangible calculation that 
> leads to these constants?
> 
> For example the '1000' constant, how did you come to that? Could you 
> estimate what type of interrupt storm is needed to trigger it falsely? 
> So instead of this comment:

My original number was 100,000, but I thought that a bit high ;-)
Since it is OK for an interrupt to preempt this code and perform a trace, 
which would make the condition fail by the one being preempted. The 
likelyhood of an interrupt coming in at that location 1000 times in a row 
seems to be awefully low. It's not enough that a 1000 interrupts come in, 
the task being preempted must loop 1000 times and have a trace interrupt 
cause the condition to fail each time. I'll explain it this way in the 
comments.

I picked a big number because I can see a traced interrupt that is very 
active causing several interruptions in this code.

> 
> > +	 * If we loop here 1,000 times, that means we are either
> > +	 * in an interrupt storm, or we have something buggy.
> > +	 * Bail!
> 
> something like this might look more acceptable:
> 
> > +	 * If we loop here 1,000 times, that means we are either
> > +	 * in an interrupt storm that preempted the same trace-entry
> > +	 * attempt 1000 times in a row, or we have a bug in the tracer.
> > +	 * Bail!
> 
> i.e. please exaplain every single magic number there so that it can be 
> followed how you got to that number, and what precise effects that 
> number has.
> 
> In the cases where you just guessed a number based on experiments, 
> please think it through and insert an analysis about the effects of 
> that number.
> 
> Would this be doable?

Again, there are small "allowable" races that causes the code to loop a 
few times.  I'll try to explain them a bit better in the comments.
There's small races between the reader and writer that can hit just right 
to cause a "loop again". But these chances are much smaller than the 
interrupt tracing situation.

I'll look deeper at the reasons for the races and explain them a bit 
better.

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH -v2] ring-buffer: add paranoid checks for loops
  2008-10-30 18:45 ` Ingo Molnar
  2008-10-30 19:00   ` Steven Rostedt
@ 2008-10-31  3:16   ` Steven Rostedt
  2008-10-31  9:38     ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2008-10-31  3:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andrew Morton

[
  Changes since v1:

   Updated comments to be more detailed.
]

While writing a new tracer, I had a bug where I caused the ring-buffer
to recurse in a bad way. The bug was with the tracer I was writing
and not the ring-buffer itself. But it took a long time to find the
problem.

This patch adds paranoid checks into the ring-buffer infrastructure
that will catch bugs of this nature.

Note: I put the bug back in the tracer and this patch showed the error
      nicely and prevented the lockup.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/ring_buffer.c |   58 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

Index: linux-tip.git/kernel/trace/ring_buffer.c
===================================================================
--- linux-tip.git.orig/kernel/trace/ring_buffer.c	2008-10-30 11:22:43.000000000 -0400
+++ linux-tip.git/kernel/trace/ring_buffer.c	2008-10-30 22:50:53.000000000 -0400
@@ -1022,8 +1022,24 @@ rb_reserve_next_event(struct ring_buffer
 	struct ring_buffer_event *event;
 	u64 ts, delta;
 	int commit = 0;
+	int paranoid = 0;
 
  again:
+	/*
+	 * We allow for interrupts to reenter here and do a trace.
+	 * If one does, it will cause this original code to loop
+	 * back here. Even with heavy interrupts happening, this
+	 * should only happen a few times in a row. If this happens
+	 * 1000 times in a row, there must be either an interrupt
+	 * storm or we have something buggy.
+	 * Bail!
+	 */
+	if (unlikely(paranoid > 1000)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+	paranoid++;
+
 	ts = ring_buffer_time_stamp(cpu_buffer->cpu);
 
 	/*
@@ -1532,10 +1548,24 @@ rb_get_reader_page(struct ring_buffer_pe
 {
 	struct buffer_page *reader = NULL;
 	unsigned long flags;
+	int paranoid = 0;
 
 	spin_lock_irqsave(&cpu_buffer->lock, flags);
 
  again:
+	/*
+	 * This should normally only loop twice. But because the
+	 * start of the reader inserts an empty page, it causes
+	 * a case where we will loop three times. There should be no
+	 * reason to loop four times (that I know of).
+	 */
+	if (unlikely(paranoid > 2)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		reader = NULL;
+		goto out;
+	}
+	paranoid++;
+
 	reader = cpu_buffer->reader_page;
 
 	/* If there's more to read, return this page */
@@ -1665,6 +1695,7 @@ ring_buffer_peek(struct ring_buffer *buf
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
 	struct buffer_page *reader;
+	int paranoid = 0;
 
 	if (!cpu_isset(cpu, buffer->cpumask))
 		return NULL;
@@ -1672,6 +1703,19 @@ ring_buffer_peek(struct ring_buffer *buf
 	cpu_buffer = buffer->buffers[cpu];
 
  again:
+	/*
+	 * We repeat when a timestamp is encountered. It is possible
+	 * to get multiple timestamps from an interrupt entering just
+	 * as one timestamp is about to be written. The max times
+	 * that this can happen is the number of nested interrupts we
+	 * can have.  10 should be more than enough.
+	 */
+	if (unlikely(paranoid > 10)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+	paranoid++;
+
 	reader = rb_get_reader_page(cpu_buffer);
 	if (!reader)
 		return NULL;
@@ -1722,6 +1766,7 @@ ring_buffer_iter_peek(struct ring_buffer
 	struct ring_buffer *buffer;
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
+	int paranoid = 0;
 
 	if (ring_buffer_iter_empty(iter))
 		return NULL;
@@ -1730,6 +1775,19 @@ ring_buffer_iter_peek(struct ring_buffer
 	buffer = cpu_buffer->buffer;
 
  again:
+	/*
+	 * We repeat when a timestamp is encountered. It is possible
+	 * to get multiple timestamps from an interrupt entering just
+	 * as one timestamp is about to be written. The max times
+	 * that this can happen is the number of nested interrupts we
+	 * can have.  10 should be more than enough.
+	 */
+	if (unlikely(paranoid > 10)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+	paranoid++;
+
 	if (rb_per_cpu_empty(cpu_buffer))
 		return NULL;
 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -v2] ring-buffer: add paranoid checks for loops
  2008-10-31  3:16   ` [PATCH -v2] " Steven Rostedt
@ 2008-10-31  9:38     ` Ingo Molnar
  2008-10-31 13:58       ` [PATCH -v3] " Steven Rostedt
  2008-10-31 14:00       ` [PATCH -v2] " Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2008-10-31  9:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andrew Morton


* Steven Rostedt <rostedt@goodmis.org> wrote:

> +	/*
> +	 * This should normally only loop twice. But because the
> +	 * start of the reader inserts an empty page, it causes
> +	 * a case where we will loop three times. There should be no
> +	 * reason to loop four times (that I know of).
> +	 */
> +	if (unlikely(paranoid > 2)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		reader = NULL;
> +		goto out;
> +	}
> +	paranoid++;

ok, the explanations look nice now.

A small nit - the above comment suggests that looping 4 times is the 
anomaly - still the test is for paranoid > 2 ?

> +	int paranoid = 0;

another small nit: i'd suggest to rename 'paranoid' to 'nr_loops' or 
'nr_iterations' or so. It is the _condition_ that signals paranoia, 
not the variable in itself - making the current patch look a bit 
weird.

>   again:
> +	/*
> +	 * We repeat when a timestamp is encountered. It is possible
> +	 * to get multiple timestamps from an interrupt entering just
> +	 * as one timestamp is about to be written. The max times
> +	 * that this can happen is the number of nested interrupts we
> +	 * can have.  10 should be more than enough.
> +	 */
> +	if (unlikely(paranoid > 10)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;

s/10 should be more than enough/Nesting higher than 10 is clearly 
anomalous/

> +	/*
> +	 * We repeat when a timestamp is encountered. It is possible
> +	 * to get multiple timestamps from an interrupt entering just
> +	 * as one timestamp is about to be written. The max times
> +	 * that this can happen is the number of nested interrupts we
> +	 * can have.  10 should be more than enough.
> +	 */
> +	if (unlikely(paranoid > 10)) {
> +		RB_WARN_ON(cpu_buffer, 1);
> +		return NULL;

ditto.

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH -v3] ring-buffer: add paranoid checks for loops
  2008-10-31  9:38     ` Ingo Molnar
@ 2008-10-31 13:58       ` Steven Rostedt
  2008-11-03 10:10         ` Ingo Molnar
  2008-10-31 14:00       ` [PATCH -v2] " Steven Rostedt
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2008-10-31 13:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andrew Morton

[
  Changes from v2:

   Applied Ingo's comments:

     Rephrased one of the comments.

     Renamed the "paranoid" variable into "nr_loops".
]

While writing a new tracer, I had a bug where I caused the ring-buffer
to recurse in a bad way. The bug was with the tracer I was writing
and not the ring-buffer itself. But it took a long time to find the
problem.

This patch adds paranoid checks into the ring-buffer infrastructure
that will catch bugs of this nature.

Note: I put the bug back in the tracer and this patch showed the error
      nicely and prevented the lockup.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/ring_buffer.c |   56 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

Index: linux-tip.git/kernel/trace/ring_buffer.c
===================================================================
--- linux-tip.git.orig/kernel/trace/ring_buffer.c	2008-10-30 11:22:43.000000000 -0400
+++ linux-tip.git/kernel/trace/ring_buffer.c	2008-10-31 09:50:35.000000000 -0400
@@ -1022,8 +1022,23 @@ rb_reserve_next_event(struct ring_buffer
 	struct ring_buffer_event *event;
 	u64 ts, delta;
 	int commit = 0;
+	int nr_loops = 0;
 
  again:
+	/*
+	 * We allow for interrupts to reenter here and do a trace.
+	 * If one does, it will cause this original code to loop
+	 * back here. Even with heavy interrupts happening, this
+	 * should only happen a few times in a row. If this happens
+	 * 1000 times in a row, there must be either an interrupt
+	 * storm or we have something buggy.
+	 * Bail!
+	 */
+	if (unlikely(++nr_loops > 1000)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+
 	ts = ring_buffer_time_stamp(cpu_buffer->cpu);
 
 	/*
@@ -1532,10 +1547,23 @@ rb_get_reader_page(struct ring_buffer_pe
 {
 	struct buffer_page *reader = NULL;
 	unsigned long flags;
+	int nr_loops = 0;
 
 	spin_lock_irqsave(&cpu_buffer->lock, flags);
 
  again:
+	/*
+	 * This should normally only loop twice. But because the
+	 * start of the reader inserts an empty page, it causes
+	 * a case where we will loop three times. There should be no
+	 * reason to loop four times (that I know of).
+	 */
+	if (unlikely(++nr_loops > 3)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		reader = NULL;
+		goto out;
+	}
+
 	reader = cpu_buffer->reader_page;
 
 	/* If there's more to read, return this page */
@@ -1665,6 +1693,7 @@ ring_buffer_peek(struct ring_buffer *buf
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
 	struct buffer_page *reader;
+	int nr_loops = 0;
 
 	if (!cpu_isset(cpu, buffer->cpumask))
 		return NULL;
@@ -1672,6 +1701,19 @@ ring_buffer_peek(struct ring_buffer *buf
 	cpu_buffer = buffer->buffers[cpu];
 
  again:
+	/*
+	 * We repeat when a timestamp is encountered. It is possible
+	 * to get multiple timestamps from an interrupt entering just
+	 * as one timestamp is about to be written. The max times
+	 * that this can happen is the number of nested interrupts we
+	 * can have.  Nesting 10 deep of interrupts is clearly
+	 * an anomaly.
+	 */
+	if (unlikely(++nr_loops > 10)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+
 	reader = rb_get_reader_page(cpu_buffer);
 	if (!reader)
 		return NULL;
@@ -1722,6 +1764,7 @@ ring_buffer_iter_peek(struct ring_buffer
 	struct ring_buffer *buffer;
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
+	int nr_loops = 0;
 
 	if (ring_buffer_iter_empty(iter))
 		return NULL;
@@ -1730,6 +1773,19 @@ ring_buffer_iter_peek(struct ring_buffer
 	buffer = cpu_buffer->buffer;
 
  again:
+	/*
+	 * We repeat when a timestamp is encountered. It is possible
+	 * to get multiple timestamps from an interrupt entering just
+	 * as one timestamp is about to be written. The max times
+	 * that this can happen is the number of nested interrupts we
+	 * can have. Nesting 10 deep of interrupts is clearly
+	 * an anomaly.
+	 */
+	if (unlikely(++nr_loops > 10)) {
+		RB_WARN_ON(cpu_buffer, 1);
+		return NULL;
+	}
+
 	if (rb_per_cpu_empty(cpu_buffer))
 		return NULL;
 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -v2] ring-buffer: add paranoid checks for loops
  2008-10-31  9:38     ` Ingo Molnar
  2008-10-31 13:58       ` [PATCH -v3] " Steven Rostedt
@ 2008-10-31 14:00       ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2008-10-31 14:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andrew Morton


On Fri, 31 Oct 2008, Ingo Molnar wrote:
> 
> > +	/*
> > +	 * This should normally only loop twice. But because the
> > +	 * start of the reader inserts an empty page, it causes
> > +	 * a case where we will loop three times. There should be no
> > +	 * reason to loop four times (that I know of).
> > +	 */
> > +	if (unlikely(paranoid > 2)) {
> > +		RB_WARN_ON(cpu_buffer, 1);
> > +		reader = NULL;
> > +		goto out;
> > +	}
> > +	paranoid++;
> 
> ok, the explanations look nice now.
> 
> A small nit - the above comment suggests that looping 4 times is the 
> anomaly - still the test is for paranoid > 2 ?

Yes, that's because the variable started at 0. So > 2 really means the 
loop iterated more than 3.

My last patch changed it to be a bit easier to understand...

   if (unlikely(++nr_loops > 3)) {

-- Steve


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -v3] ring-buffer: add paranoid checks for loops
  2008-10-31 13:58       ` [PATCH -v3] " Steven Rostedt
@ 2008-11-03 10:10         ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2008-11-03 10:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andrew Morton


* Steven Rostedt <rostedt@goodmis.org> wrote:

> [
>   Changes from v2:
> 
>    Applied Ingo's comments:
> 
>      Rephrased one of the comments.
> 
>      Renamed the "paranoid" variable into "nr_loops".
> ]
> 
> While writing a new tracer, I had a bug where I caused the ring-buffer
> to recurse in a bad way. The bug was with the tracer I was writing
> and not the ring-buffer itself. But it took a long time to find the
> problem.
> 
> This patch adds paranoid checks into the ring-buffer infrastructure
> that will catch bugs of this nature.
> 
> Note: I put the bug back in the tracer and this patch showed the error
>       nicely and prevented the lockup.
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  kernel/trace/ring_buffer.c |   56 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)

applied to tip/tracing/urgent, thanks Steve!

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-11-03 10:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-29 22:48 [PATCH] ring-buffer: add paranoid checks for loops Steven Rostedt
2008-10-30 18:45 ` Ingo Molnar
2008-10-30 19:00   ` Steven Rostedt
2008-10-31  3:16   ` [PATCH -v2] " Steven Rostedt
2008-10-31  9:38     ` Ingo Molnar
2008-10-31 13:58       ` [PATCH -v3] " Steven Rostedt
2008-11-03 10:10         ` Ingo Molnar
2008-10-31 14:00       ` [PATCH -v2] " Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).