LKML Archive on lore.kernel.org
From: Mike Kravetz <mike.kravetz@oracle.com>
To: "Longpeng (Mike)" <longpeng2@huawei.com>,
Sean Christopherson <sean.j.christopherson@intel.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, arei.gonglei@huawei.com,
weidong.huang@huawei.com, weifuqiang@huawei.com,
kvm@vger.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH] mm/hugetlb: avoid get wrong ptep caused by race
Date: Thu, 20 Feb 2020 16:22:24 -0800
Message-ID: <a82956f7-26e4-5c1c-8d5d-4b2510f6b17d@oracle.com>
In-Reply-To: <502b5e52-060b-6864-d1b7-eab2dc951aed@huawei.com>
On 2/19/20 6:30 PM, Longpeng (Mike) wrote:
> On 2020/2/20 3:33, Mike Kravetz wrote:
>> + Kirill
>> On 2/18/20 5:58 PM, Sean Christopherson wrote:
>>> On Wed, Feb 19, 2020 at 09:39:59AM +0800, Longpeng (Mike) wrote:
<snip>
>>> The race and the fix make sense. I assumed dereferencing garbage from the
>>> huge page was the issue, but I wasn't 100% that was the case, which is why
>>> I asked about alternative fixes.
>>>
>>>> We changed the code from
>>>>     if (pud_huge(*pud) || !pud_present(*pud))
>>>> to
>>>>     if (pud_huge(*pud))
>>>>         return (pte_t *)pud;
>>>>     /* busy loop for 500ms */
>>>>     if (!pud_present(*pud))
>>>>         return (pte_t *)pud;
>>>> and the panic was hit quickly.
>>>>
>>>> ARM64 already uses READ/WRITE_ONCE to access the page tables; see
>>>> commit 20a004e7 (arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables).
>>>>
>>>> The root cause is that 'if (pud_huge(*pud) || !pud_present(*pud))' reads the
>>>> pud entry twice, and *pud may change between the two reads in a race, so we
>>>> should read the pud only once. I use READ_ONCE here just to be safe, to
>>>> prevent compiler mischief.
>>>
>>> FWIW, I'd be in favor of going the READ/WRITE_ONCE() route for x86, e.g.
>>> convert everything as a follow-up patch (or patches). I'm fairly confident
>>> that KVM's usage of lookup_address_in_mm() is safe, but I wouldn't exactly
>>> bet my life on it. I'd much rather the failing scenario be that KVM uses
>>> a sub-optimal page size as opposed to exploding on a bad pointer.
>>
>> Longpeng(Mike) asked in another e-mail specifically about making similar
>> changes to lookup_address_in_mm(). Replying here as there is more context.
>>
>> I 'think' lookup_address_in_mm is safe from this issue. Why? IIUC, the
>> problem with the huge_pte_offset routine is that the pud changes from
>> pud_none() to pud_huge() in the middle of
>> 'if (pud_huge(*pud) || !pud_present(*pud))'. In the case of
>> lookup_address_in_mm, we know pud was not pud_none() as it was previously
>> checked. I am not aware of any other state transitions which could cause
>> us trouble. However, I am no expert in this area.
Bad copy/paste by me. Longpeng(Mike) was asking about lookup_address_in_pgd.
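A small exhaustive model of the two-read condition makes the transition argument concrete. It assumes, for illustration only, that a PUD in a 1 GiB hugetlb mapping only ever holds an empty or a huge entry; the enum and helper names are hypothetical, not kernel APIs. Of the four ways the two dereferences can observe the entry, only "first read empty, second read huge" gets past the check and lets the walk go on to treat the entry as a PMD table. When both predicates see the same value, neither state falls through.

```c
#include <assert.h>

/* Hypothetical, simplified states of a PUD in a 1 GiB hugetlb mapping. */
enum pud_state { PUD_NONE, PUD_HUGE };

static int state_is_huge(enum pud_state s)    { return s == PUD_HUGE; }
static int state_is_present(enum pud_state s) { return s != PUD_NONE; }

/*
 * Models 'if (pud_huge(*pud) || !pud_present(*pud)) return pud;' when the
 * two dereferences observe 'first' and 'second' respectively.
 * Returns 1 when the check is passed and the walk would continue to the
 * PMD level, i.e. treat the entry as a pointer to a PMD table.
 */
static int falls_through(enum pud_state first, enum pud_state second)
{
    assert(first <= PUD_HUGE && second <= PUD_HUGE);
    return !(state_is_huge(first) || !state_is_present(second));
}

/* Counts the (first, second) observation pairs that fall through. */
static int count_fall_through_pairs(void)
{
    int n = 0;
    for (int a = PUD_NONE; a <= PUD_HUGE; a++)
        for (int b = PUD_NONE; b <= PUD_HUGE; b++)
            n += falls_through((enum pud_state)a, (enum pud_state)b);
    return n;
}
```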
> So... I just need to fix huge_pte_offset in mm/hugetlb.c, right?
Let's start with just a fix for huge_pte_offset() as you can easily reproduce
that issue by adding a delay.
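For huge_pte_offset(), the approach discussed earlier in the thread is to load the entry once (READ_ONCE in the kernel) and evaluate both predicates on that copy. Below is a user-space sketch of the difference, not the actual patch: pud_t, the MODEL_* bits (modelled on x86's _PAGE_PRESENT and _PAGE_PSE), READ_ONCE_U64 and the between_loads hook, which injects a "concurrent" write at a fixed point so the race replays deterministically instead of needing a delay, are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of an x86-style PUD entry. */
typedef struct { uint64_t val; } pud_t;

#define MODEL_PAGE_PRESENT (1ULL << 0)  /* modelled on _PAGE_PRESENT */
#define MODEL_PAGE_PSE     (1ULL << 7)  /* modelled on _PAGE_PSE (huge) */

/* User-space stand-in for READ_ONCE(): one volatile load of the entry. */
#define READ_ONCE_U64(x) (*(const volatile uint64_t *)&(x))

static int model_pud_huge(pud_t pud)    { return (pud.val & MODEL_PAGE_PSE) != 0; }
static int model_pud_present(pud_t pud) { return (pud.val & MODEL_PAGE_PRESENT) != 0; }

/* Hypothetical hook simulating another CPU writing the entry mid-check. */
static void (*between_loads)(pud_t *) = 0;

/* Simulated fault installing a 1 GiB huge mapping into an empty PUD. */
static void fault_in_huge(pud_t *pudp)
{
    pudp->val = 0x40000000ULL | MODEL_PAGE_PSE | MODEL_PAGE_PRESENT;
}

/* Original pattern: two separate loads of *pudp. Returns 1 to stop at PUD. */
static int stop_at_pud_racy(pud_t *pudp)
{
    assert(pudp != 0);
    if (model_pud_huge(*pudp))
        return 1;
    if (between_loads)
        between_loads(pudp);            /* entry may change here */
    return !model_pud_present(*pudp);   /* second, independent load */
}

/* Snapshot pattern: one load, both predicates evaluated on that copy. */
static int stop_at_pud_once(pud_t *pudp)
{
    assert(pudp != 0);
    pud_t pud = { .val = READ_ONCE_U64(pudp->val) };
    if (between_loads)
        between_loads(pudp);            /* does not affect the snapshot */
    return model_pud_huge(pud) || !model_pud_present(pud);
}
```

With the hook installing a huge entry into an empty PUD, the two-load version reports that the walk should continue to the PMD level, while the snapshot version stops at the PUD, consistent with the single value it actually read.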
> Is it possible for the pud to change from pud_huge() to pud_none() while another
> CPU is walking the page table?
I believe it is possible. If we hole punch a hugetlbfs file, we will clear
the corresponding puds. Hence, we can go from pud_huge() to pud_none().
Unless I am missing something, that does imply we could have issues in places
such as lookup_address_in_pgd:
    pud = pud_offset(p4d, address);
    if (pud_none(*pud))
        return NULL;

    *level = PG_LEVEL_1G;
    if (pud_large(*pud) || !pud_present(*pud))
        return (pte_t *)pud;
I hope I am wrong, but it seems like pud_none(*pud) could become true after
the initial check, and before the (pud_large) check. If so, there could be
a problem (addressing exception) when the code continues and looks up the pmd.
    pmd = pmd_offset(pud, address);
    if (pmd_none(*pmd))
        return NULL;
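The same snapshot idea extends to walks like lookup_address_in_pgd(): read the PUD once, run the none/large/present checks on that copy, and derive the PMD table address from the same copy rather than letting pmd_offset() re-read *pud. The sketch below is a simplified user-space model; the entry layout, enum walk_result, the concurrent_write hook and the walk_* functions are hypothetical, and the injected write is placed after the checks and before the re-read. It only shows which value each step uses; whether the PMD table itself stays allocated is a separate question this model ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of an x86-style PUD entry. */
typedef struct { uint64_t val; } pud_t;

#define MODEL_PAGE_PRESENT (1ULL << 0)
#define MODEL_PAGE_PSE     (1ULL << 7)
#define MODEL_ADDR_MASK    (~0xFFFULL)   /* low bits hold flags */

/* User-space stand-in for READ_ONCE(): one volatile load of the entry. */
#define READ_ONCE_U64(x) (*(const volatile uint64_t *)&(x))

enum walk_result { WALK_NONE, WALK_STOP_AT_PUD, WALK_DESCEND };

/* Hypothetical hook simulating a concurrent update on another CPU. */
static void (*concurrent_write)(pud_t *) = 0;

static void concurrent_clear(pud_t *pudp) { pudp->val = 0; }

/* Mirrors the quoted walk: each check and the pmd_offset() step re-read *pudp. */
static enum walk_result walk_racy(pud_t *pudp, uint64_t *pmd_table)
{
    assert(pudp != 0 && pmd_table != 0);
    if (pudp->val == 0)
        return WALK_NONE;
    if ((pudp->val & MODEL_PAGE_PSE) || !(pudp->val & MODEL_PAGE_PRESENT))
        return WALK_STOP_AT_PUD;
    if (concurrent_write)
        concurrent_write(pudp);
    *pmd_table = pudp->val & MODEL_ADDR_MASK;  /* re-read, like pmd_offset() */
    return WALK_DESCEND;
}

/* Reads the entry once; the checks and the table address use one value. */
static enum walk_result walk_once(pud_t *pudp, uint64_t *pmd_table)
{
    assert(pudp != 0 && pmd_table != 0);
    uint64_t pud = READ_ONCE_U64(pudp->val);
    if (concurrent_write)
        concurrent_write(pudp);                /* snapshot is unaffected */
    if (pud == 0)
        return WALK_NONE;
    if ((pud & MODEL_PAGE_PSE) || !(pud & MODEL_PAGE_PRESENT))
        return WALK_STOP_AT_PUD;
    *pmd_table = pud & MODEL_ADDR_MASK;
    return WALK_DESCEND;
}
```

With the clear injected after the checks, the re-reading walk descends into a "table" at address 0, the model's counterpart of the addressing exception described above, whereas the snapshot walk descends into the table that the checked value pointed to.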
It has been mentioned before that there are many page table walks like this.
What am I missing that prevents races like this? Or, have we just been lucky?
--
Mike Kravetz
Thread overview: 16+ messages
2020-02-18 12:10 Longpeng(Mike)
2020-02-18 20:37 ` Sean Christopherson
2020-02-19 0:51 ` Mike Kravetz
2020-02-19 1:39 ` Longpeng (Mike)
2020-02-19 1:58 ` Sean Christopherson
2020-02-19 12:21 ` Longpeng (Mike)
2020-02-19 16:22 ` Sean Christopherson
2020-02-20 2:32 ` Longpeng (Mike)
2020-02-19 19:33 ` Mike Kravetz
2020-02-20 2:30 ` Longpeng (Mike)
2020-02-21 0:22 ` Mike Kravetz [this message]
2020-02-22 2:15 ` Longpeng (Mike)
2020-02-18 20:52 ` Matthew Wilcox
2020-02-19 2:09 ` Longpeng (Mike)
2020-02-19 3:49 ` Mike Kravetz
2020-02-19 12:52 ` Longpeng (Mike)