LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Matthew Wilcox <willy@infradead.org>,
Christoph Hellwig <hch@lst.de>,
Dan Williams <dan.j.williams@intel.com>,
Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
linux-nvdimm@lists.01.org, stable@vger.kernel.org
Subject: Re: [PATCH 5/5] radix tree: fix multi-order iteration race
Date: Wed, 9 May 2018 14:46:11 +0200 [thread overview]
Message-ID: <20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz> (raw)
In-Reply-To: <20180503192430.7582-6-ross.zwisler@linux.intel.com>
On Thu 03-05-18 13:24:30, Ross Zwisler wrote:
> Fix a race in the multi-order iteration code which causes the kernel to hit
> a GP fault. This was first seen with a production v4.15 based kernel
> (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
> DAX entries.
>
> The race has to do with how we tear down multi-order sibling entries when
> we are removing an item from the tree. Remember for example that an order
> 2 entry looks like this:
>
> struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
>
> where 'entry' is in some slot in the struct radix_tree_node, and the three
> slots following 'entry' contain sibling pointers which point back to
> 'entry.'
>
> When we delete 'entry' from the tree, we call :
> radix_tree_delete()
> radix_tree_delete_item()
> __radix_tree_delete()
> replace_slot()
>
> replace_slot() first removes the siblings in order from the first to the
> last, then at then replaces 'entry' with NULL. This means that for a brief
> period of time we end up with one or more of the siblings removed, so:
>
> struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
>
> This causes an issue if you have a reader iterating over the slots in the
> tree via radix_tree_for_each_slot() while only under
> rcu_read_lock()/rcu_read_unlock() protection. This is a common case in
> mm/filemap.c.
>
> The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
> skip over the sibling entries in the slots, it currently does so with an
> exact match on the slot directly preceding our current slot. Normally this
> works:
> V preceding slot
> struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> ^ current slot
>
> This lets you find the first sibling, and you skip them all in order.
>
> But in the case where one of the siblings is NULL, that slot is skipped and
> then our sibling detection is interrupted:
>
> V preceding slot
> struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> ^ current slot
>
> This means that the sibling pointers aren't recognized since they point all
> the way back to 'entry', so we think that they are normal internal radix
> tree pointers. This causes us to think we need to walk down to a struct
> radix_tree_node starting at the address of 'entry'.
>
> In a real running kernel this will crash the thread with a GP fault when
> you try and dereference the slots in your broken node starting at 'entry'.
>
> We fix this race by fixing the way that skip_siblings() detects sibling
> nodes. Instead of testing against the preceding slot we instead look for
> siblings via is_sibling_entry() which compares against the position of the
> struct radix_tree_node.slots[] array. This ensures that sibling entries
> are properly identified, even if they are no longer contiguous with the
> 'entry' they point to.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
> Fixes: commit 148deab223b2 ("radix-tree: improve multiorder iterators")
> Cc: <stable@vger.kernel.org>
Looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> lib/radix-tree.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index da9e10c827df..43e0cbedc3a0 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1612,11 +1612,9 @@ static void set_iter_tags(struct radix_tree_iter *iter,
> static void __rcu **skip_siblings(struct radix_tree_node **nodep,
> void __rcu **slot, struct radix_tree_iter *iter)
> {
> - void *sib = node_to_entry(slot - 1);
> -
> while (iter->index < iter->next_index) {
> *nodep = rcu_dereference_raw(*slot);
> - if (*nodep && *nodep != sib)
> + if (*nodep && !is_sibling_entry(iter->node, *nodep))
> return slot;
> slot++;
> iter->index = __radix_tree_iter_add(iter, 1);
> @@ -1631,7 +1629,7 @@ void __rcu **__radix_tree_next_slot(void __rcu **slot,
> struct radix_tree_iter *iter, unsigned flags)
> {
> unsigned tag = flags & RADIX_TREE_ITER_TAG_MASK;
> - struct radix_tree_node *node = rcu_dereference_raw(*slot);
> + struct radix_tree_node *node;
>
> slot = skip_siblings(&node, slot, iter);
>
> --
> 2.14.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2018-05-09 12:46 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-03 19:24 [PATCH 0/5] fix radix tree " Ross Zwisler
2018-05-03 19:24 ` [PATCH 1/5] radix tree test suite: fix mapshift build target Ross Zwisler
2018-07-15 23:00 ` Matthew Wilcox
2018-07-16 16:07 ` Ross Zwisler
2018-07-16 19:52 ` Matthew Wilcox
2018-07-16 21:08 ` Ross Zwisler
2018-07-17 2:41 ` Matthew Wilcox
2018-07-21 23:45 ` Dave Chinner
2018-07-22 3:11 ` Ross Zwisler
2018-07-17 3:18 ` Matthew Wilcox
2018-07-17 17:17 ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 2/5] radix tree test suite: fix compilation issue Ross Zwisler
2018-05-03 19:24 ` [PATCH 3/5] radix tree test suite: add item_delete_rcu() Ross Zwisler
2018-05-03 19:24 ` [PATCH 4/5] radix tree test suite: multi-order iteration race Ross Zwisler
2018-05-03 19:24 ` [PATCH 5/5] radix tree: fix " Ross Zwisler
2018-05-09 12:46 ` Jan Kara [this message]
2018-05-09 15:09 ` Ross Zwisler
2018-05-08 17:44 ` [PATCH 0/5] fix radix tree " Ross Zwisler
2018-05-10 22:48 ` Andrew Morton
2018-05-10 22:54 ` Dan Williams
2018-05-10 23:12 ` Andrew Morton
2018-05-10 23:19 ` Dan Williams
2018-05-11 4:04 ` Ross Zwisler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvdimm@lists.01.org \
--cc=ross.zwisler@linux.intel.com \
--cc=stable@vger.kernel.org \
--cc=willy@infradead.org \
--subject='Re: [PATCH 5/5] radix tree: fix multi-order iteration race' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).