LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	linux-nvdimm@lists.01.org, stable@vger.kernel.org
Subject: Re: [PATCH 5/5] radix tree: fix multi-order iteration race
Date: Wed, 9 May 2018 14:46:11 +0200	[thread overview]
Message-ID: <20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz> (raw)
In-Reply-To: <20180503192430.7582-6-ross.zwisler@linux.intel.com>

On Thu 03-05-18 13:24:30, Ross Zwisler wrote:
> Fix a race in the multi-order iteration code which causes the kernel to hit
> a GP fault.  This was first seen with a production v4.15 based kernel
> (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
> DAX entries.
> 
> The race has to do with how we tear down multi-order sibling entries when
> we are removing an item from the tree.  Remember for example that an order
> 2 entry looks like this:
> 
> struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> 
> where 'entry' is in some slot in the struct radix_tree_node, and the three
> slots following 'entry' contain sibling pointers which point back to
> 'entry.'
> 
> When we delete 'entry' from the tree, we call :
>   radix_tree_delete()
>     radix_tree_delete_item()
>       __radix_tree_delete()
>         replace_slot()
> 
> replace_slot() first removes the siblings in order from the first to the
> last, then at then replaces 'entry' with NULL.  This means that for a brief
> period of time we end up with one or more of the siblings removed, so:
> 
> struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> 
> This causes an issue if you have a reader iterating over the slots in the
> tree via radix_tree_for_each_slot() while only under
> rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
> mm/filemap.c.
> 
> The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
> skip over the sibling entries in the slots, it currently does so with an
> exact match on the slot directly preceding our current slot.  Normally this
> works:
>                                     V preceding slot
> struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
>                                             ^ current slot
> 
> This lets you find the first sibling, and you skip them all in order.
> 
> But in the case where one of the siblings is NULL, that slot is skipped and
> then our sibling detection is interrupted:
> 
>                                            V preceding slot
> struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
>                                                   ^ current slot
> 
> This means that the sibling pointers aren't recognized since they point all
> the way back to 'entry', so we think that they are normal internal radix
> tree pointers.  This causes us to think we need to walk down to a struct
> radix_tree_node starting at the address of 'entry'.
> 
> In a real running kernel this will crash the thread with a GP fault when
> you try and dereference the slots in your broken node starting at 'entry'.
> 
> We fix this race by fixing the way that skip_siblings() detects sibling
> nodes.  Instead of testing against the preceding slot we instead look for
> siblings via is_sibling_entry() which compares against the position of the
> struct radix_tree_node.slots[] array.  This ensures that sibling entries
> are properly identified, even if they are no longer contiguous with the
> 'entry' they point to.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
> Fixes: commit 148deab223b2 ("radix-tree: improve multiorder iterators")
> Cc: <stable@vger.kernel.org>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  lib/radix-tree.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index da9e10c827df..43e0cbedc3a0 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1612,11 +1612,9 @@ static void set_iter_tags(struct radix_tree_iter *iter,
>  static void __rcu **skip_siblings(struct radix_tree_node **nodep,
>  			void __rcu **slot, struct radix_tree_iter *iter)
>  {
> -	void *sib = node_to_entry(slot - 1);
> -
>  	while (iter->index < iter->next_index) {
>  		*nodep = rcu_dereference_raw(*slot);
> -		if (*nodep && *nodep != sib)
> +		if (*nodep && !is_sibling_entry(iter->node, *nodep))
>  			return slot;
>  		slot++;
>  		iter->index = __radix_tree_iter_add(iter, 1);
> @@ -1631,7 +1629,7 @@ void __rcu **__radix_tree_next_slot(void __rcu **slot,
>  				struct radix_tree_iter *iter, unsigned flags)
>  {
>  	unsigned tag = flags & RADIX_TREE_ITER_TAG_MASK;
> -	struct radix_tree_node *node = rcu_dereference_raw(*slot);
> +	struct radix_tree_node *node;
>  
>  	slot = skip_siblings(&node, slot, iter);
>  
> -- 
> 2.14.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2018-05-09 12:46 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-03 19:24 [PATCH 0/5] fix radix tree " Ross Zwisler
2018-05-03 19:24 ` [PATCH 1/5] radix tree test suite: fix mapshift build target Ross Zwisler
2018-07-15 23:00   ` Matthew Wilcox
2018-07-16 16:07     ` Ross Zwisler
2018-07-16 19:52       ` Matthew Wilcox
2018-07-16 21:08         ` Ross Zwisler
2018-07-17  2:41           ` Matthew Wilcox
2018-07-21 23:45             ` Dave Chinner
2018-07-22  3:11               ` Ross Zwisler
2018-07-17  3:18       ` Matthew Wilcox
2018-07-17 17:17         ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 2/5] radix tree test suite: fix compilation issue Ross Zwisler
2018-05-03 19:24 ` [PATCH 3/5] radix tree test suite: add item_delete_rcu() Ross Zwisler
2018-05-03 19:24 ` [PATCH 4/5] radix tree test suite: multi-order iteration race Ross Zwisler
2018-05-03 19:24 ` [PATCH 5/5] radix tree: fix " Ross Zwisler
2018-05-09 12:46   ` Jan Kara [this message]
2018-05-09 15:09     ` Ross Zwisler
2018-05-08 17:44 ` [PATCH 0/5] fix radix tree " Ross Zwisler
2018-05-10 22:48   ` Andrew Morton
2018-05-10 22:54     ` Dan Williams
2018-05-10 23:12       ` Andrew Morton
2018-05-10 23:19         ` Dan Williams
2018-05-11  4:04     ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    --subject='Re: [PATCH 5/5] radix tree: fix multi-order iteration race' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).