From: Mark Rutland <mark.rutland@arm.com>
To: linux-kernel@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Pekka Enberg <penberg@kernel.org>,
	Steve Capper <steve.capper@linaro.org>
Subject: [PATCH] mm/slub: fix lockups on PREEMPT && !SMP kernels
Date: Fri, 13 Mar 2015 15:47:12 +0000
Message-ID: <1426261632-8911-1-git-send-email-mark.rutland@arm.com>

Commit 9aabf810a67cd97e ("mm/slub: optimize alloc/free fastpath by
removing preemption on/off") introduced an occasional hang for kernels
built with CONFIG_PREEMPT && !CONFIG_SMP.

The problem is the following loop the patch introduced to
slab_alloc_node and slab_free:

do {
        tid = this_cpu_read(s->cpu_slab->tid);
        c = raw_cpu_ptr(s->cpu_slab);
} while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));

GCC 4.9 has been observed to hoist the load of c and c->tid above the
loop for !SMP kernels (as in this case raw_cpu_ptr(x) is compile-time
constant and does not force a reload). On arm64 the generated assembly
looks like:

ffffffc00016d3c4:       f9400404        ldr     x4, [x0,#8]
ffffffc00016d3c8:       f9400401        ldr     x1, [x0,#8]
ffffffc00016d3cc:       eb04003f        cmp     x1, x4
ffffffc00016d3d0:       54ffffc1        b.ne    ffffffc00016d3c8 <slab_alloc_node.constprop.82+0x30>

If the thread is preempted between the load of c->tid (into x1) and tid
(into x4), and an allocation or free occurs in another thread (bumping
the cpu_slab's tid), the thread will be stuck in the loop until
s->cpu_slab->tid wraps, which may be forever in the absence of
allocations on the same CPU.
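
As a minimal illustration of the hazard (this is not the SLUB code; the
helpers below are made up for the example), a plain load in a loop
condition may legally be performed once and the cached value reused,
whereas READ_ONCE() forces a fresh load on every iteration:

/* Hypothetical sketch: spin until *p moves away from 'old'. */
static void wait_broken(unsigned long *p, unsigned long old)
{
	/* The compiler may load *p once and spin on the cached value. */
	while (*p == old)
		cpu_relax();
}

static void wait_fixed(unsigned long *p, unsigned long old)
{
	/* READ_ONCE() is a volatile access, so *p is reloaded each time. */
	while (READ_ONCE(*p) == old)
		cpu_relax();
}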

The loop itself is somewhat pointless, as the thread can be preempted at
any point between the loop and the this_cpu_cmpxchg_double, and the
window for preemption within the loop is far smaller. Given that we
assume accessing c->tid is safe for the loop condition, and we retry
when the cmpxchg fails, we can get rid of the loop entirely and just
access c->tid via the raw_cpu_ptr for s->cpu_slab.

This patch ensures that we reload c->tid even when the compiler would
otherwise assume that it could cache the value, and removes the
redundant (and misleading) loop. Comments are updated to take these
changes into account.
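
For reference, the retry structure that catches preemption then looks
roughly like the following (a schematic, abridged sketch of the alloc
fastpath rather than the exact code; the freelist handling and slowpath
are elided, and next_object stands in for the next free object):

redo:
	c = raw_cpu_ptr(s->cpu_slab);
	tid = READ_ONCE(c->tid);

	barrier();	/* ensure tid is read before the rest of c's data */

	object = c->freelist;
	/* ... compute next_object, fall back to the slowpath, etc. ... */

	if (unlikely(!this_cpu_cmpxchg_double(
			s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			next_object, next_tid(tid))))
		/* Preempted or raced: tid has moved on, so start again. */
		goto redo;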

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steve Capper <steve.capper@linaro.org>
---
 mm/slub.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

Hi all,

I wasn't entirely sure if the read of c->tid in the loop condition was safe
given the general rules for this_cpu operations. I assume that a read on a
remote CPU is always safe, and hence we can just load tid via the raw_cpu_ptr
and leave it to the cmpxchg to detect when we get preempted. If the remote read
is not safe then as far as I can tell the original patch is broken in this
regard even in the absence of reordering by the compiler.

I've boot-tested this patch a number of times (with SMP and !SMP) on the
platform I noticed the lockup on, and everything seems stable. Do we have any
SL*B stress tests that could give this a better thrashing?

Otherwise any and all comments/review/testing would be appreciated.

Mark.

diff --git a/mm/slub.c b/mm/slub.c
index 6832c4e..2b9a75e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2437,19 +2437,13 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		return NULL;
 redo:
 	/*
-	 * Must read kmem_cache cpu data via this cpu ptr. Preemption is
-	 * enabled. We may switch back and forth between cpus while
-	 * reading from one cpu area. That does not matter as long
-	 * as we end up on the original cpu again when doing the cmpxchg.
-	 *
-	 * We should guarantee that tid and kmem_cache are retrieved on
-	 * the same cpu. It could be different if CONFIG_PREEMPT so we need
-	 * to check if it is matched or not.
+	 * Preemption is enabled. We may switch back and forth between cpus,
+	 * and other threads may access the current cpu's area. That does not
+	 * matter as long as when we reach the cmpxchg we end up on the original
+	 * cpu and tid hasn't changed. We'll retry until that happens.
 	 */
-	do {
-		tid = this_cpu_read(s->cpu_slab->tid);
-		c = raw_cpu_ptr(s->cpu_slab);
-	} while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));
+	c = raw_cpu_ptr(s->cpu_slab);
+	tid = READ_ONCE(c->tid);
 
 	/*
 	 * Irqless object alloc/free algorithm used here depends on sequence
@@ -2710,15 +2704,13 @@ static __always_inline void slab_free(struct kmem_cache *s,
 
 redo:
 	/*
-	 * Determine the currently cpus per cpu slab.
-	 * The cpu may change afterward. However that does not matter since
-	 * data is retrieved via this pointer. If we are on the same cpu
-	 * during the cmpxchg then the free will succedd.
+	 * Preemption is enabled. We may switch back and forth between cpus,
+	 * and other threads may access the current cpu's area. That does not
+	 * matter as long as when we reach the cmpxchg we end up on the original
+	 * cpu and tid hasn't changed. We'll retry until that happens.
 	 */
-	do {
-		tid = this_cpu_read(s->cpu_slab->tid);
-		c = raw_cpu_ptr(s->cpu_slab);
-	} while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));
+	c = raw_cpu_ptr(s->cpu_slab);
+	tid = READ_ONCE(c->tid);
 
 	/* Same with comment on barrier() in slab_alloc_node() */
 	barrier();
-- 
1.9.1

