LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: linux-mm@kvack.org, christophe.leroy@c-s.fr,
	Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Vineet Gupta <vgupta@synopsys.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	linux-snps-arc@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-riscv@lists.infradead.org, x86@kernel.org,
	linux-doc@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
Date: Wed, 8 Apr 2020 12:41:51 +0530
Message-ID: <253cf5c8-e43e-5737-24e8-3eda3b6ba7b3@arm.com> (raw)
In-Reply-To: <20200407175440.41cc00a5@thinkpad>


On 04/07/2020 09:24 PM, Gerald Schaefer wrote:
> On Sun, 5 Apr 2020 17:58:14 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
> [...]
>>>
>>> Could be fixed like this (the first de-reference is a bit special,
>>> because at that point *ptep does not really point to a large (pmd) entry
>>> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()  
>>
>> There seems to be an inconsistency on s390 platform. Even though it defines
>> a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET
>> which should have forced it fallback on generic huge_ptep_get() but it does
>> not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when
>> an arch uses <asm-generic/hugetlb.h>. s390 does not use that and hence gets
>> away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
>> confusing ? But I might not have the entire context here.
> 
> Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get()
> was initially introduced because of s390, and now we don't select
> __HAVE_ARCH_HUGE_PTEP_GET...
> 
> As you realized, I guess this is because we do not use generic hugetlb.h.
> And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad
> ("hugetlb: introduce generic version of huge_ptep_get"), that was probably
> the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET.

Understood.

> 
> Nothing really wrong with that, but yes, very confusing. Maybe we could
> also select it for s390, even though it wouldn't have any functional
> impact (so far), just for less confusion. Maybe also thinking about
> using the generic hugetlb.h, not sure if the original reasons for not
> doing so would still apply. Now I only need to find the time...

Seems like something worth to explore if we could remove this confusion.

> 
>>
>>> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
>>> because we do have some special bits there in our large pmds. It seems
>>> to also work w/o that alignment, but it feels a bit wrong):  
>>
>> Sure, we can accommodate that.
>>
>>>
>>> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
>>>                                           unsigned long vaddr, pgprot_t prot)
>>>  {
>>>         struct page *page = pfn_to_page(pfn);
>>> -       pte_t pte = READ_ONCE(*ptep);
>>> +       pte_t pte;
>>>
>>> -       pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> +       pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));  
>>
>> So that keeps the existing value in 'ptep' pointer at bay and instead
>> construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at
>> least provide the seed that can be ORed with RANDOM_ORVALUE before
>> being masked with PMD_MASK. Do you see any problem ?
> 
> Yes, unfortunately. The problem is that the resulting pte is not marked
> as present. The conversion pte -> (huge) pmd, which is done in
> set_huge_pte_at() for s390, will establish an empty pmd for non-present
> ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent
> huge_ptep_get() will not result in the same original pte value. If you

Ohh.

> want to preserve and check the RANDOM_ORVALUE, it has to be a present
> pte, hence the mk_pte(_phys).

Understood and mk_pte() is also available on all platforms.

> 
>>
>> Some thing like this instead.
>>
>> pte_t pte = READ_ONCE(*ptep);
>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>
>> We cannot use mk_pte_phys() as it is defined only on some platforms
>> without any generic fallback for others.
> 
> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
> it would result in a present pte:
> 
> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));

Lets use mk_pte() here but can we do this instead

paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));

> 
> And if you also want to do some with the existing value, which seems
> to be an empty pte, then maybe just check if writing and reading that
> value with set_huge_pte_at() / huge_ptep_get() returns the same,
> i.e. initially w/o RANDOM_ORVALUE.
> 
> So, in combination, like this (BTW, why is the barrier() needed, it
> is not used for the other set_huge_pte_at() calls later?):

Ahh missed, will add them. Earlier we faced problem without it after
set_pte_at() for a test on powerpc (64) platform. Hence just added it
here to be extra careful.

> 
> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
>         struct page *page = pfn_to_page(pfn);
>         pte_t pte = READ_ONCE(*ptep);
>  
> -       pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> +       set_huge_pte_at(mm, vaddr, ptep, pte);
> +       WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> +
> +       pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>         set_huge_pte_at(mm, vaddr, ptep, pte);
>         barrier();
>         WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> 
> This would actually add a new test "write empty pte with
> set_huge_pte_at(), then verify with huge_ptep_get()", which happens
> to trigger a warning on s390 :-)

On arm64 as well which checks for pte_present() in set_huge_pte_at().
But PTE present check is not really present in each set_huge_pte_at()
implementation especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
Hence wondering if we should add this new test here which will keep
giving warnings on s390 and arm64 (at the least).

> 
> That (new) warning actually points to misbehavior on s390, we do not
> write a correct empty pmd in this case, but one that is empty and also
> marked as large. huge_ptep_get() will then not correctly recognize it
> as empty and do wrong conversion. It is also not consistent with
> huge_ptep_get_and_clear(), where we write the empty pmd w/o marking
> as large. Last but not least it would also break our pmd_protnone()
> logic (see below). Another nice finding on s390 :-)

:) 

> 
> I don't think this has any effect in practice (yet), but I will post a
> fix for that, just in case you will add / change this test.

Okay.

> 
>>
>>>         set_huge_pte_at(mm, vaddr, ptep, pte);
>>>         barrier();
>>>         WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>>         huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
>>> -       pte = READ_ONCE(*ptep);
>>> +       pte = huge_ptep_get(ptep);
>>>         WARN_ON(!huge_pte_none(pte));
>>>  
>>>         pte = mk_huge_pte(page, prot);
>>>         set_huge_pte_at(mm, vaddr, ptep, pte);
>>>         huge_ptep_set_wrprotect(mm, vaddr, ptep);
>>> -       pte = READ_ONCE(*ptep);
>>> +       pte = huge_ptep_get(ptep);
>>>         WARN_ON(huge_pte_write(pte));
>>>  
>>>         pte = mk_huge_pte(page, prot);
>>>         set_huge_pte_at(mm, vaddr, ptep, pte);
>>>         huge_ptep_get_and_clear(mm, vaddr, ptep);
>>> -       pte = READ_ONCE(*ptep);
>>> +       pte = huge_ptep_get(ptep);
>>>         WARN_ON(!huge_pte_none(pte));
>>>  
>>>         pte = mk_huge_pte(page, prot);
>>> @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test
>>>         pte = huge_pte_mkwrite(pte);
>>>         pte = huge_pte_mkdirty(pte);
>>>         huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
>>> -       pte = READ_ONCE(*ptep);
>>> +       pte = huge_ptep_get(ptep);
>>>         WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
>>>  }
>>>  #else
>>>
>>> 3) The pmd_protnone_tests() has an issue, because it passes a pmd to
>>> pmd_protnone() which has not been marked as large. We check for large
>>> pmd in the s390 implementation of pmd_protnone(), and will fail if a
>>> pmd is not large. We had similar issues before, in other helpers, where
>>> I changed the logic on s390 to not require the pmd large check, but I'm
>>> not so sure in this case. Is there a valid use case for doing
>>> pmd_protnone() on "normal" pmds? Or could this be changed like this:  
>>
>> That is a valid question. IIUC, all existing callers for pmd_protnone()
>> ensure that it is indeed a huge PMD. But even assuming otherwise should
>> not the huge PMD requirement get checked in the caller itself rather than
>> in the arch helper which is just supposed to check the existence of the
>> dedicated PTE bit(s) for this purpose. Purely from a helper perspective
>> pmd_protnone() should not really care about being large even though it
>> might never get used without one.
>>
>> Also all platforms (except s390) derive the pmd_protnone() from their
>> respective pte_protnone(). I wonder why should s390 be any different
>> unless it is absolutely necessary.
> 
> This is again because of our different page table entry layouts for
> pte/pmd and (large) pmd. The bits we check for pmd_protnone() are
> not valid for normal pmd/pte, and we would return undefined result for
> normal entries.
> 
> Of course, we could rely on nobody calling pmd_protnone() on normal
> pmds, but in this case we also use pmd_large() check in pmd_protnone()
> for indication if the pmd is present. W/o that, we would return
> true for empty pmds, that doesn't sound right. Not sure if we also
> want to rely on nobody calling pmd_protnone() on empty pmds.

That might be problematic.

> 
> Anyway, if in practice it is not correct to use pmd_protnone()
> on normal pmds, then I would suggest that your tests should also
> not do / test it. And I strongly assume that it is not correct, at
> least I cannot think of a valid case, and of course s390 would
> already be broken if there was such a case.

Okay, will use huge PMD here as you had suggested earlier.

> 
> Regards,
> Gerald
> 
> 

  reply index

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-24  5:22 Anshuman Khandual
2020-03-24  5:22 ` [PATCH V2 1/3] mm/debug: Add tests validating arch page table helpers for core features Anshuman Khandual
2020-03-24 13:29   ` Zi Yan
2020-03-26  2:18     ` Anshuman Khandual
2020-03-26 17:08       ` Zi Yan
2020-03-30  8:56   ` [mm/debug] f675f2f91d: WARNING:at_mm/debug_vm_pgtable.c:#debug_vm_pgtable kernel test robot
2020-04-05 14:49     ` Anshuman Khandual
2020-03-24  5:22 ` [PATCH V2 2/3] mm/debug: Add tests validating arch advanced page table helpers Anshuman Khandual
2020-03-27  7:41   ` [mm/debug] d157503f6f: WARNING:at_mm/debug_vm_pgtable.c:#debug_vm_pgtable kernel test robot
2020-03-24  5:22 ` [PATCH V2 3/3] Documentation/mm: Add descriptions for arch page table helpers Anshuman Khandual
2020-03-26  2:23 ` [PATCH V2 0/3] mm/debug: Add more arch page table helper tests Anshuman Khandual
2020-03-26 15:23   ` Christophe Leroy
2020-03-27  6:46     ` Anshuman Khandual
2020-03-27  7:00       ` Christophe Leroy
2020-03-29 14:21         ` Anshuman Khandual
2020-03-31 12:30 ` Gerald Schaefer
2020-04-05 12:28   ` Anshuman Khandual
2020-04-07 15:54     ` Gerald Schaefer
2020-04-08  7:11       ` Anshuman Khandual [this message]
2020-04-08 12:15         ` Gerald Schaefer
2020-04-09  1:06           ` Anshuman Khandual

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=253cf5c8-e43e-5737-24e8-3eda3b6ba7b3@arm.com \
    --to=anshuman.khandual@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=borntraeger@de.ibm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@c-s.fr \
    --cc=corbet@lwn.net \
    --cc=gerald.schaefer@de.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hpa@zytor.com \
    --cc=kirill@shutemov.name \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-snps-arc@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=paulus@samba.org \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=vgupta@synopsys.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lkml.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lkml.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lkml.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lkml.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lkml.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lkml.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lkml.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lkml.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lkml.kernel.org/lkml/8 lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lkml.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git