LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* bisected boot regression post 2.6.25-rc3.. please revert
@ 2008-03-01 18:56 Arjan van de Ven
2008-03-03 7:46 ` Ingo Molnar
2008-03-03 17:15 ` Nish Aravamudan
0 siblings, 2 replies; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-01 18:56 UTC (permalink / raw)
To: mingo, torvalds, hans.rosenfeld; +Cc: linux-kernel
Hi Linus, Ingo, Hans,
Please revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
( x86: fix pmd_bad and pud_bad to support huge pages )
since it prevents the kernel to finish booting on my (Penryn based)
laptop. The boot stops right after freeing the init memory.
Took a while to bisect (due to it touching page*.h, which forces a
full recompile), but it definitely is caused by this commit...
Please revert.
[arjan@localhost linux.trees.git]$ git-bisect good
cded932b75ab0a5f9181ee3da34a0a488d1a14fd is first bad commit
commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
Author: Hans Rosenfeld <hans.rosenfeld@amd.com>
Date: Mon Feb 18 18:10:47 2008 +0100
x86: fix pmd_bad and pud_bad to support huge pages
I recently stumbled upon a problem in the support for huge pages. If a
program using huge pages does not explicitly unmap them, they remain
mapped (and therefore, are lost) after the program exits.
I observed that the free huge page count in /proc/meminfo decreased when
running my program, and it did not increase after the program exited.
After running the program a few times, no more huge pages could be
allocated.
The reason for this seems to be that the x86 pmd_bad and pud_bad
consider pmd/pud entries having the PSE bit set invalid. I think there
is nothing wrong with this bit being set, it just indicates that the
lowest level of translation has been reached. This bit has to be (and
is) checked after the basic validity of the entry has been checked, like
in this fragment from follow_page() in mm/memory.c:
if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
goto no_page_table;
if (pmd_huge(*pmd)) {
BUG_ON(flags & FOLL_GET);
page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
goto out;
}
Note that this code currently doesn't work as intended if the pmd refers
to a huge page, the pmd_huge() check can not be reached if the page is
huge.
Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
allow for the PSE bit being set fixes this. For similar reasons,
allowing the NX bit being set is necessary, too. I have seen huge pages
having the NX bit set in their pmd entry, which would cause the same
problem.
Signed-Off-By: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
:040000 040000 1051a11e76304c0fa9229af65cbbd288a8125651
fd91df38defe41df09dde3c0955fb32a4476467a M include
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-01 18:56 bisected boot regression post 2.6.25-rc3.. please revert Arjan van de Ven
@ 2008-03-03 7:46 ` Ingo Molnar
2008-03-03 9:13 ` Ingo Molnar
2008-03-03 17:15 ` Nish Aravamudan
1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03 7:46 UTC (permalink / raw)
To: Arjan van de Ven
Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin
* Arjan van de Ven <arjan@linux.intel.com> wrote:
> Hi Linus, Ingo, Hans,
>
> Please revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd ( x86:
> fix pmd_bad and pud_bad to support huge pages ) since it prevents the
> kernel to finish booting on my (Penryn based) laptop. The boot stops
> right after freeing the init memory. Took a while to bisect (due to it
> touching page*.h, which forces a full recompile), but it definitely is
> caused by this commit...
hm, lets figure out why this patch breaks your box, ok? We obviously
have to revert it if we cannot figure it out, but lets at least try -
because the patch itself fixes a real regression and it's not obviously
wrong either. I think there might be some bug hiding somewhere that we
really want to fix instead of this revert.
Could you try to hack up a debug patch perhaps? Uninline the fucntion(s)
then add two versions (one is that breaks on your box and one is that
works on your box) of this same pmd_bad()/pud_bad() functions and do
something like this (pseudocode):
pmd_bad()
{
if (pmd_bad_working(x) != pmd_bad_broken(x))
panic_timeout++;
return pmd_bad_working(x);
}
i.e. we actually use the working function so your box should boot up
just fine - but we instrument things with the broken function as well
and detect the cases where the two values differ. It is an anomaly if
either function ever returns true instead of false.
if after bootup you have a non-zero panic_timeout then there is a
material difference somewhere. In that case try to stick a dump_stack()
into the above case as well - and if the box still boots, send us the
stackdumps. (if the box doesnt boot then perhaps the printout itself
hangs - in that case try a save_stack_trace() hack and print it out
later)
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 7:46 ` Ingo Molnar
@ 2008-03-03 9:13 ` Ingo Molnar
2008-03-03 16:41 ` Arjan van de Ven
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03 9:13 UTC (permalink / raw)
To: Arjan van de Ven
Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin
* Ingo Molnar <mingo@elte.hu> wrote:
> Could you try to hack up a debug patch perhaps? Uninline the
> fucntion(s) then add two versions (one is that breaks on your box and
> one is that works on your box) of this same pmd_bad()/pud_bad()
> functions and do something like this (pseudocode):
i.e. something like the (tested) patch below. It is not clear whether
your testcase is on 32-bit or 64-bit - so i went for the more likely
32-bit case first.
Ingo
------------------>
Subject: x86: patches/x86-debug-bad-page.patch
From: Ingo Molnar <mingo@elte.hu>
Date: Mon Mar 03 09:53:17 CET 2008
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mm/pgtable_32.c | 7 +++++++
include/asm-x86/pgtable_32.h | 6 +++++-
2 files changed, 12 insertions(+), 1 deletion(-)
Index: linux-x86.q/arch/x86/mm/pgtable_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/pgtable_32.c
+++ linux-x86.q/arch/x86/mm/pgtable_32.c
@@ -381,3 +381,10 @@ void __pmd_free_tlb(struct mmu_gather *t
}
#endif
+
+int pmd_bad(pmd_t pmd)
+{
+ WARN_ON_ONCE(pmd_bad_v1(pmd) != pmd_bad_v2(pmd));
+
+ return pmd_bad_v1(pmd);
+}
Index: linux-x86.q/include/asm-x86/pgtable_32.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/pgtable_32.h
+++ linux-x86.q/include/asm-x86/pgtable_32.h
@@ -92,7 +92,11 @@ extern unsigned long pg0[];
/* To avoid harmful races, pmd_none(x) should check only the lower when PAE */
#define pmd_none(x) (!(unsigned long)pmd_val(x))
#define pmd_present(x) (pmd_val(x) & _PAGE_PRESENT)
-#define pmd_bad(x) ((pmd_val(x) \
+
+extern int pmd_bad(pmd_t pmd);
+
+#define pmd_bad_v1(x) ((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
+#define pmd_bad_v2(x) ((pmd_val(x) \
& ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)) \
!= _KERNPG_TABLE)
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 9:13 ` Ingo Molnar
@ 2008-03-03 16:41 ` Arjan van de Ven
2008-03-03 17:40 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-03 16:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin
Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> Could you try to hack up a debug patch perhaps? Uninline the
>> fucntion(s) then add two versions (one is that breaks on your box and
>> one is that works on your box) of this same pmd_bad()/pud_bad()
>> functions and do something like this (pseudocode):
>
> i.e. something like the (tested) patch below. It is not clear whether
> your testcase is on 32-bit or 64-bit - so i went for the more likely
> 32-bit case first.
------------[ cut here ]------------
WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
[<c0424ba5>] warn_on_slowpath+0x41/0x67
[<c0408c5c>] ? native_sched_clock+0x94/0xa6
[<c043f432>] ? lock_release_holdtime+0x1a/0x115
[<c04702d4>] ? handle_mm_fault+0x297/0x7e2
[<c063eee6>] ? _spin_unlock+0x1d/0x20
[<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
[<c04851c1>] ? do_sync_read+0xab/0xe9
[<c0417223>] pmd_bad+0x44/0x53
[<c046f37f>] follow_page+0x8b/0x1f2
[<c0470aa0>] get_user_pages+0x281/0x2ef
[<c0488dec>] get_arg_page+0x2d/0x79
[<c04f63cd>] ? strnlen_user+0x2f/0x4d
[<c0488efb>] copy_strings+0xc3/0x160
[<c0488fb4>] copy_strings_kernel+0x1c/0x2b
[<c048a109>] do_execve+0x11a/0x1f0
[<c0403351>] sys_execve+0x29/0x51
[<c0404ac6>] syscall_call+0x7/0xb
[<c0407697>] ? kernel_execve+0x17/0x1c
[<c04021d1>] ? _stext+0x69/0x12c
[<c040574f>] ? kernel_thread_helper+0x7/0x10
=======================
---[ end trace 63b854b89f6f375d ]---
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-01 18:56 bisected boot regression post 2.6.25-rc3.. please revert Arjan van de Ven
2008-03-03 7:46 ` Ingo Molnar
@ 2008-03-03 17:15 ` Nish Aravamudan
1 sibling, 0 replies; 27+ messages in thread
From: Nish Aravamudan @ 2008-03-03 17:15 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: mingo, torvalds, hans.rosenfeld, linux-kernel
On 3/1/08, Arjan van de Ven <arjan@linux.intel.com> wrote:
> Hi Linus, Ingo, Hans,
>
> Please revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
> ( x86: fix pmd_bad and pud_bad to support huge pages )
> since it prevents the kernel to finish booting on my (Penryn based)
> laptop. The boot stops right after freeing the init memory.
> Took a while to bisect (due to it touching page*.h, which forces a
> full recompile), but it definitely is caused by this commit...
>
> Please revert.
>
>
>
> [arjan@localhost linux.trees.git]$ git-bisect good
> cded932b75ab0a5f9181ee3da34a0a488d1a14fd is first bad commit
> commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd
> Author: Hans Rosenfeld <hans.rosenfeld@amd.com>
> Date: Mon Feb 18 18:10:47 2008 +0100
>
> x86: fix pmd_bad and pud_bad to support huge pages
>
> I recently stumbled upon a problem in the support for huge pages. If a
> program using huge pages does not explicitly unmap them, they remain
> mapped (and therefore, are lost) after the program exits.
>
> I observed that the free huge page count in /proc/meminfo decreased when
> running my program, and it did not increase after the program exited.
> After running the program a few times, no more huge pages could be
> allocated.
>
> The reason for this seems to be that the x86 pmd_bad and pud_bad
> consider pmd/pud entries having the PSE bit set invalid. I think there
> is nothing wrong with this bit being set, it just indicates that the
> lowest level of translation has been reached. This bit has to be (and
> is) checked after the basic validity of the entry has been checked, like
> in this fragment from follow_page() in mm/memory.c:
>
> if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
> goto no_page_table;
>
> if (pmd_huge(*pmd)) {
> BUG_ON(flags & FOLL_GET);
> page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
> goto out;
> }
>
> Note that this code currently doesn't work as intended if the pmd refers
> to a huge page, the pmd_huge() check can not be reached if the page is
> huge.
>
> Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
> allow for the PSE bit being set fixes this. For similar reasons,
> allowing the NX bit being set is necessary, too. I have seen huge pages
> having the NX bit set in their pmd entry, which would cause the same
> problem.
>
> Signed-Off-By: Hans Rosenfeld <hans.rosenfeld@amd.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Hrm, not that I necessarily have the right to ask for this, but
shouldn't hugepage related patches at least be cc'd to linux-mm? it
seems this went via LKML to the x86 tree? There are a few of us that
work on hugepages that would have seen this there, but it got lost in
the noise (for me) on LKML.
Thanks,
Nish
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 16:41 ` Arjan van de Ven
@ 2008-03-03 17:40 ` Ingo Molnar
2008-03-03 17:51 ` Nish Aravamudan
2008-03-03 18:36 ` Arjan van de Ven
0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03 17:40 UTC (permalink / raw)
To: Arjan van de Ven
Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin
* Arjan van de Ven <arjan@linux.intel.com> wrote:
> ------------[ cut here ]------------
> WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
> [<c0424ba5>] warn_on_slowpath+0x41/0x67
> [<c0408c5c>] ? native_sched_clock+0x94/0xa6
> [<c043f432>] ? lock_release_holdtime+0x1a/0x115
> [<c04702d4>] ? handle_mm_fault+0x297/0x7e2
> [<c063eee6>] ? _spin_unlock+0x1d/0x20
> [<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
> [<c04851c1>] ? do_sync_read+0xab/0xe9
> [<c0417223>] pmd_bad+0x44/0x53
> [<c046f37f>] follow_page+0x8b/0x1f2
> [<c0470aa0>] get_user_pages+0x281/0x2ef
hm. I suspect some gcc related difference related to the handling of
this masking:
pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
versus:
pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)
perhaps it will work if you change it to:
pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
?
in any case, the commit has to be reverted as it clearly isnt a NOP on
your box as it was intended to be. (it should only have made a
difference in a rare hugetlbfs case)
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 17:40 ` Ingo Molnar
@ 2008-03-03 17:51 ` Nish Aravamudan
2008-03-03 17:55 ` Ingo Molnar
2008-03-03 18:36 ` Arjan van de Ven
1 sibling, 1 reply; 27+ messages in thread
From: Nish Aravamudan @ 2008-03-03 17:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Arjan van de Ven, torvalds, hans.rosenfeld, linux-kernel,
Thomas Gleixner, H. Peter Anvin
On 3/3/08, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>
> > ------------[ cut here ]------------
> > WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
> > Modules linked in:
> > Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
> > [<c0424ba5>] warn_on_slowpath+0x41/0x67
> > [<c0408c5c>] ? native_sched_clock+0x94/0xa6
> > [<c043f432>] ? lock_release_holdtime+0x1a/0x115
> > [<c04702d4>] ? handle_mm_fault+0x297/0x7e2
> > [<c063eee6>] ? _spin_unlock+0x1d/0x20
> > [<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
> > [<c04851c1>] ? do_sync_read+0xab/0xe9
> > [<c0417223>] pmd_bad+0x44/0x53
> > [<c046f37f>] follow_page+0x8b/0x1f2
> > [<c0470aa0>] get_user_pages+0x281/0x2ef
>
>
> hm. I suspect some gcc related difference related to the handling of
> this masking:
>
>
> pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
>
>
> versus:
>
>
> pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)
>
>
> perhaps it will work if you change it to:
>
>
> pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
>
> ?
>
> in any case, the commit has to be reverted as it clearly isnt a NOP on
> your box as it was intended to be. (it should only have made a
> difference in a rare hugetlbfs case)
On x86/{,_64}, _PAGE_PSE and _PAGE_PROTNONE are the same bit. Would
that have any effect here? We encountered that collision when adding
mprotect() support for hugepages.
Thanks,
Nish
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 17:51 ` Nish Aravamudan
@ 2008-03-03 17:55 ` Ingo Molnar
2008-03-03 17:58 ` H. Peter Anvin
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-03 17:55 UTC (permalink / raw)
To: Nish Aravamudan
Cc: Arjan van de Ven, torvalds, hans.rosenfeld, linux-kernel,
Thomas Gleixner, H. Peter Anvin
* Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
> > in any case, the commit has to be reverted as it clearly isnt a NOP
> > on your box as it was intended to be. (it should only have made a
> > difference in a rare hugetlbfs case)
>
> On x86/{,_64}, _PAGE_PSE and _PAGE_PROTNONE are the same bit. Would
> that have any effect here? We encountered that collision when adding
> mprotect() support for hugepages.
well, this is the PMD, and PROTNONE is a pte property.
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 17:55 ` Ingo Molnar
@ 2008-03-03 17:58 ` H. Peter Anvin
0 siblings, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2008-03-03 17:58 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nish Aravamudan, Arjan van de Ven, torvalds, hans.rosenfeld,
linux-kernel, Thomas Gleixner
Ingo Molnar wrote:
> * Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
>
>>> in any case, the commit has to be reverted as it clearly isnt a NOP
>>> on your box as it was intended to be. (it should only have made a
>>> difference in a rare hugetlbfs case)
>> On x86/{,_64}, _PAGE_PSE and _PAGE_PROTNONE are the same bit. Would
>> that have any effect here? We encountered that collision when adding
>> mprotect() support for hugepages.
>
> well, this is the PMD, and PROTNONE is a pte property.
The PSE bit in the PTE becomes the PATX bit. It shouldn't ever be set
in Linux.
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 17:40 ` Ingo Molnar
2008-03-03 17:51 ` Nish Aravamudan
@ 2008-03-03 18:36 ` Arjan van de Ven
2008-03-03 18:44 ` Linus Torvalds
2008-03-03 21:13 ` Segher Boessenkool
1 sibling, 2 replies; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-03 18:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner, H. Peter Anvin
Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>> ------------[ cut here ]------------
>> WARNING: at arch/x86/mm/pgtable_32.c:387 pmd_bad+0x44/0x53()
>> Modules linked in:
>> Pid: 1, comm: swapper Not tainted 2.6.25-rc3 #14
>> [<c0424ba5>] warn_on_slowpath+0x41/0x67
>> [<c0408c5c>] ? native_sched_clock+0x94/0xa6
>> [<c043f432>] ? lock_release_holdtime+0x1a/0x115
>> [<c04702d4>] ? handle_mm_fault+0x297/0x7e2
>> [<c063eee6>] ? _spin_unlock+0x1d/0x20
>> [<c04707f0>] ? handle_mm_fault+0x7b3/0x7e2
>> [<c04851c1>] ? do_sync_read+0xab/0xe9
>> [<c0417223>] pmd_bad+0x44/0x53
>> [<c046f37f>] follow_page+0x8b/0x1f2
>> [<c0470aa0>] get_user_pages+0x281/0x2ef
>
> hm. I suspect some gcc related difference related to the handling of
> this masking:
>
> pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
>
> versus:
>
> pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)
>
> perhaps it will work if you change it to:
>
> pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
> ?
>
> in any case, the commit has to be reverted as it clearly isnt a NOP on
> your box as it was intended to be. (it should only have made a
> difference in a rare hugetlbfs case)
interesting observation: if I turn the macros into inlines... the difference goes away.
I'm half tempted to just do it as inline period ... any objections ?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 18:36 ` Arjan van de Ven
@ 2008-03-03 18:44 ` Linus Torvalds
2008-03-03 22:00 ` Arjan van de Ven
2008-03-09 11:56 ` Ingo Molnar
2008-03-03 21:13 ` Segher Boessenkool
1 sibling, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2008-03-03 18:44 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Ingo Molnar, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
On Mon, 3 Mar 2008, Arjan van de Ven wrote:
>
> interesting observation: if I turn the macros into inlines... the difference
> goes away.
>
> I'm half tempted to just do it as inline period ... any objections ?
Yes, I object. I want to understand why it would matter. If this is a
compiler bug, it's a really rather bad one. And if it's just some stupid
bug in our pmd_bad() macro, I still want to know what the problem was.
Can you compile both ways and look at what changed at the offending site
(which is apparently "follow_page()")?
And do you have some odd compiler version?
Linus
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 18:36 ` Arjan van de Ven
2008-03-03 18:44 ` Linus Torvalds
@ 2008-03-03 21:13 ` Segher Boessenkool
2008-03-03 21:22 ` Segher Boessenkool
2008-03-04 6:58 ` Ingo Molnar
1 sibling, 2 replies; 27+ messages in thread
From: Segher Boessenkool @ 2008-03-03 21:13 UTC (permalink / raw)
To: Arjan van de Ven
Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
H. Peter Anvin, Thomas Gleixner
>> hm. I suspect some gcc related difference related to the handling of
>> this masking:
>> pmd_val(x) & ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
>> versus:
>> pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)
>> perhaps it will work if you change it to:
>> pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>> ?
>> in any case, the commit has to be reverted as it clearly isnt a NOP
>> on your box as it was intended to be. (it should only have made a
>> difference in a rare hugetlbfs case)
>
> interesting observation: if I turn the macros into inlines... the
> difference goes away.
include/asm-x86/pgtable.h has
#define _PAGE_BIT_PSE 7
#define _PAGE_PSE (_AC(1, L)<<_PAGE_BIT_PSE)
and
#define _PAGE_BIT_NX 63
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
#define _PAGE_NX (_AC(1, ULL) << _PAGE_BIT_NX)
so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
64-bit is 0x00000000ffffff7f, so in
(~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
all the high bits are lost, while the original
~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
works as intended, since the bit inversion is done on a 64-bit number.
Maybe all those pagetable bit definitions should use 64-bit (ULL or a
cast),
as it is now some dangerous detail is hidden behind the macros. Using
inline
functions for simple constants seems like overkill to me, but would
also work
of course.
Segher
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 21:13 ` Segher Boessenkool
@ 2008-03-03 21:22 ` Segher Boessenkool
2008-03-03 22:33 ` Segher Boessenkool
2008-03-04 6:58 ` Ingo Molnar
1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2008-03-03 21:22 UTC (permalink / raw)
To: Segher Boessenkool
Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
Arjan van de Ven, H. Peter Anvin, Thomas Gleixner
> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
> 64-bit is 0x00000000ffffff7f,
Actually, it is signed, so this isn't true. Comments about unsafeness
still apply.
Segher
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 18:44 ` Linus Torvalds
@ 2008-03-03 22:00 ` Arjan van de Ven
2008-03-04 1:05 ` Arjan van de Ven
2008-03-09 11:56 ` Ingo Molnar
1 sibling, 1 reply; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-03 22:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
Linus Torvalds wrote:
>
> On Mon, 3 Mar 2008, Arjan van de Ven wrote:
>> interesting observation: if I turn the macros into inlines... the difference
>> goes away.
>>
>> I'm half tempted to just do it as inline period ... any objections ?
>
> Yes, I object. I want to understand why it would matter. If this is a
> compiler bug, it's a really rather bad one. And if it's just some stupid
> bug in our pmd_bad() macro, I still want to know what the problem was.
>
> Can you compile both ways and look at what changed at the offending site
> (which is apparently "follow_page()")?
>
> And do you have some odd compiler version?
it's the F9 gcc, so yeah it's really new.
I'm staring at the disassembly and it's quite different but follow_page() is rather large;
trying to make a smaller testcase now
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 21:22 ` Segher Boessenkool
@ 2008-03-03 22:33 ` Segher Boessenkool
2008-03-03 22:55 ` H. Peter Anvin
2008-03-03 22:56 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 27+ messages in thread
From: Segher Boessenkool @ 2008-03-03 22:33 UTC (permalink / raw)
To: Segher Boessenkool
Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
Arjan van de Ven, H. Peter Anvin, Thomas Gleixner
>> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
>> 64-bit is 0x00000000ffffff7f,
>
> Actually, it is signed, so this isn't true. Comments about unsafeness
> still apply.
It turns out that PAGE_SIZE is unsigned. So this gives us for
(~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
the types UL, L, L, ULL resp.
The associativity of & is left-to-right, so this in turn becomes
UL, L, ULL
UL, ULL
ULL
and that cast from UL to ULL doesn't sign-extend.
Segher
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 22:33 ` Segher Boessenkool
@ 2008-03-03 22:55 ` H. Peter Anvin
2008-03-03 22:56 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2008-03-03 22:55 UTC (permalink / raw)
To: Segher Boessenkool
Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
Arjan van de Ven, Thomas Gleixner
Segher Boessenkool wrote:
>>> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
>>> 64-bit is 0x00000000ffffff7f,
>>
>> Actually, it is signed, so this isn't true. Comments about unsafeness
>> still apply.
>
> It turns out that PAGE_SIZE is unsigned. So this gives us for
>
> (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
> the types UL, L, L, ULL resp.
>
> The associativity of & is left-to-right, so this in turn becomes
>
> UL, L, ULL
>
> UL, ULL
>
> ULL
>
> and that cast from UL to ULL doesn't sign-extend.
>
All the masks either need to be the proper size or signed.
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 22:33 ` Segher Boessenkool
2008-03-03 22:55 ` H. Peter Anvin
@ 2008-03-03 22:56 ` Jeremy Fitzhardinge
2008-03-03 23:04 ` H. Peter Anvin
1 sibling, 1 reply; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-03 22:56 UTC (permalink / raw)
To: Segher Boessenkool
Cc: hans.rosenfeld, torvalds, linux-kernel, Ingo Molnar,
Arjan van de Ven, H. Peter Anvin, Thomas Gleixner
Segher Boessenkool wrote:
>>> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
>>> 64-bit is 0x00000000ffffff7f,
>>
>> Actually, it is signed, so this isn't true. Comments about unsafeness
>> still apply.
>
> It turns out that PAGE_SIZE is unsigned. So this gives us for
>
> (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
> the types UL, L, L, ULL resp.
>
> The associativity of & is left-to-right, so this in turn becomes
>
> UL, L, ULL
>
> UL, ULL
>
> ULL
>
> and that cast from UL to ULL doesn't sign-extend.
I went through and made a bunch of these flags signed so that they would
sign-extend properly in 32 vs 64 bit. We could make them
unconditionally 64-bit, but I'm worried that will have unforeseen
consequences too.
In this case, PAGE_MASK should probably be signed too, so the sign
extension happens properly.
Alternatively, they could be cast to pteval_t and/or phys_addr_t so that
they will have an inherent size which matches the pagetable format
(using _AT() rather than _AC()).
J
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 22:56 ` Jeremy Fitzhardinge
@ 2008-03-03 23:04 ` H. Peter Anvin
0 siblings, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2008-03-03 23:04 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Segher Boessenkool, hans.rosenfeld, torvalds, linux-kernel,
Ingo Molnar, Arjan van de Ven, Thomas Gleixner
Jeremy Fitzhardinge wrote:
>
> I went through and made a bunch of these flags signed so that they would
> sign-extend properly in 32 vs 64 bit. We could make them
> unconditionally 64-bit, but I'm worried that will have unforeseen
> consequences too.
>
> In this case, PAGE_MASK should probably be signed too, so the sign
> extension happens properly.
>
Yes, it should.
> Alternatively, they could be cast to pteval_t and/or phys_addr_t so that
> they will have an inherent size which matches the pagetable format
> (using _AT() rather than _AC()).
Either that, or we could use a custom constructor macro (PTEVAL_C()).
This is arguably The Right Thing, but the sign-extending version is OK, too.
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 22:00 ` Arjan van de Ven
@ 2008-03-04 1:05 ` Arjan van de Ven
2008-03-04 6:53 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-04 1:05 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
Arjan van de Ven wrote:
> Linus Torvalds wrote:
>>
>> On Mon, 3 Mar 2008, Arjan van de Ven wrote:
>>> interesting observation: if I turn the macros into inlines... the
>>> difference
>>> goes away.
>>>
>>> I'm half tempted to just do it as inline period ... any objections ?
>>
>> Yes, I object. I want to understand why it would matter. If this is a
>> compiler bug, it's a really rather bad one. And if it's just some
>> stupid bug in our pmd_bad() macro, I still want to know what the
>> problem was.
>>
>> Can you compile both ways and look at what changed at the offending
>> site (which is apparently "follow_page()")?
>>
>> And do you have some odd compiler version?
>
> it's the F9 gcc, so yeah it's really new.
>
> I'm staring at the disassembly and it's quite different but
> follow_page() is rather large;
> trying to make a smaller testcase now
sadly a more isolated testcase doesn't show the same behavior yet;
so it's some fun interaction ;(
more staring at the assembly for me
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-04 1:05 ` Arjan van de Ven
@ 2008-03-04 6:53 ` Ingo Molnar
2008-03-05 15:35 ` Arjan van de Ven
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-04 6:53 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Linus Torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
* Arjan van de Ven <arjan@linux.intel.com> wrote:
>> I'm staring at the disassembly and it's quite different but
>> follow_page() is rather large; trying to make a smaller testcase now
>
> sadly a more isolated testcase doesn't show the same behavior yet; so
> it's some fun interaction ;(
>
> more staring at the assembly for me
could you post the "bad" and the "good" disassembled functions, and the
specific gcc version string?
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 21:13 ` Segher Boessenkool
2008-03-03 21:22 ` Segher Boessenkool
@ 2008-03-04 6:58 ` Ingo Molnar
1 sibling, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-03-04 6:58 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Arjan van de Ven, hans.rosenfeld, torvalds, linux-kernel,
H. Peter Anvin, Thomas Gleixner
* Segher Boessenkool <segher@kernel.crashing.org> wrote:
> so (on 32-bit) ~_PAGE_PSE is ~0x80L is 0xffffff7f, which when cast to
> 64-bit is 0x00000000ffffff7f, so in
>
> (~PAGE_MASK & ~_PAGE_USER & ~_PAGE_PSE & ~_PAGE_NX)
>
> all the high bits are lost, while the original
>
> ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)
>
> works as intended, since the bit inversion is done on a 64-bit number.
but we really are interested in the low bits here (lets ignore the NX
bit for now) and the patch has been in the queue for a long time (more
than a month), so if there was a trivial mask mixup problem it would
have shown on the first day. So i suspect some gcc bug instead - and
certainly the colorful mixture of types and signs in this expression
might have surprised a new version of gcc somewhere.
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-04 6:53 ` Ingo Molnar
@ 2008-03-05 15:35 ` Arjan van de Ven
0 siblings, 0 replies; 27+ messages in thread
From: Arjan van de Ven @ 2008-03-05 15:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>>> I'm staring at the disassembly and it's quite different but
>>> follow_page() is rather large; trying to make a smaller testcase now
>> sadly a more isolated testcase doesn't show the same behavior yet; so
>> it's some fun interaction ;(
>>
>> more staring at the assembly for me
>
> could you post the "bad" and the "good" disassembled functions, and the
> specific gcc version string?
>
http://www.fenrus.org/memory.asm.macro <-- failing one
http://www.fenrus.org/memory.asm.inline <-- working one
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-03 18:44 ` Linus Torvalds
2008-03-03 22:00 ` Arjan van de Ven
@ 2008-03-09 11:56 ` Ingo Molnar
2008-03-09 17:27 ` Linus Torvalds
1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-03-09 11:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: Arjan van de Ven, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > I'm half tempted to just do it as inline period ... any objections ?
>
> Yes, I object. I want to understand why it would matter. If this is a
> compiler bug, it's a really rather bad one. And if it's just some
> stupid bug in our pmd_bad() macro, I still want to know what the
> problem was.
ok, i think i figured it out: on PAE 32-bit we dont properly sign-extend
to a 64-bit pmd value in the new pmd_bad() macro - so if any physical
RAM is above 4GB (Arjan's laptop had 4GB of RAM in it?) then we'll start
seeing those high bits. This definitely needs PAE and more than 4GB of
RAM to trigger.
The best fix is the one below (it should solve Arjan's regression with
that now-reverted patch redone), as it is the right thing to do [that
way sign auto-extend trickles over into PAGE_MASK as well].
It boots fine on a >4GB box of mine but changing the type of PAGE_SIZE
affects _everything_ so i'll keep testing it and i'd suggest to delay
this fix to shortly after -rc5 is released instead of risking -rc5 with
such a late commit. I'll send this and the re-done hugetlbfs fix
together early next week.
Ingo
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Index: linux/include/asm-x86/page.h
===================================================================
--- linux.orig/include/asm-x86/page.h
+++ linux/include/asm-x86/page.h
@@ -5,7 +5,7 @@
/* PAGE_SHIFT determines the page size */
#define PAGE_SHIFT 12
-#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
+#define PAGE_SIZE (_AC(1,L) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
#ifdef __KERNEL__
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-09 11:56 ` Ingo Molnar
@ 2008-03-09 17:27 ` Linus Torvalds
2008-03-09 18:57 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Linus Torvalds @ 2008-03-09 17:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Arjan van de Ven, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
On Sun, 9 Mar 2008, Ingo Molnar wrote:
>
> The best fix is the one below (it should solve Arjan's regression with
> that now-reverted patch redone), as it is the right thing to do [that
> way sign auto-extend trickles over into PAGE_MASK as well].
This *really* makes me worry.
I do agree that there are good reasons that "PAGE_MASK" should be signed
(since we do want the top bit to extend), but changing "PAGE_SIZE" to
signed seems to be a rather big change, considering that it's used
*everywhere*.
In particular, it's quite possibly used for things like
offset = something % PAGE_SIZE;
etc, where a signed divide is positively wrong.
But even for PAGE_MASK, we literally have code like this:
if ((size_avail & PAGE_MASK) < rg.size) {
where it so _happens_ that "size_avail" is unsigned, but what if it
wasn't? It could turn a unsigned comparison into a signed one, and
introduce any number of security bugs etc.
Sam goes even more for PAGE_SIZE. At least there are only about a thousand
users of PAGE_MASK in the kernel, PAGE_SIZE is used about six times as
many times, and just a _trivial_ grep like
git grep 'PAGE_SIZE.* [<>][= ]'
finds a lot of cases where I'm not at all sure that it's safe to change
PAGE_SIZE to a signed value.
In other words, there are lots of things like
if (x < PAGE_SIZE)
...
where we currently get a unsigned comparison, and where for all I know a
signed PAGE_SIZE means that we should use
if (x >= 0 && x < PAGE_SIZE)
instead.
In short, I refuse to apply this patch after an -rc1 release. I suspect
that I shouldn't apply something like this even *before* an -rc1, because
I think it's just a really bad idea to make these types signed even if it
were to give you magically easier sign extensions to "unsigned long long".
So I would *very* strongly instead argue:
- "unsigned long" is the native kernel type for all address manipulation,
and thus "PAGE_SIZE" and "PAGE_MASK" should continue to have that type.
- anything that uses any other type without explicitly making sure it's
safe is mis-using those macros. IOW, PAGE_MASk was *never* a type that
had anything what-so-ever to do with page table entry bits, and this is
purely a page table entry issue!
So my suggested patch would:
- make the page table code use a specific mask that it builds up itself,
and makes sure it's of the right type and has the rigth value in
whatever type "struct pte_entry" is. The fact that "pte_val()" is
larger than "unsigned long" on x86-32 is very clearly a PTE issue,
*not* an issue for PAGE_SIZE or PAGE_MASK.
Btw, just one look at your other patch should have convinced you of that
anyway. Do you really think this is a readable patch or that the result is
clean:
+#define pmd_bad_v1(x) ((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
+#define pmd_bad_v2(x) ((pmd_val(x) \
& ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)) \
!= _KERNPG_TABLE)
when the real problem is that the mask you build up here isn't safe o
pretty to begin with!
So make that whole "~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)"
expression a nice *clean* expression that is about page table entries
instead, and make *that* one be the right type and have the right bits.
And suddenly the problem just goes away.
In fact, if you look at that expression, you suddenly realize that
PAGE_MASK was *totally* the wrong value to use in the first place, whether
sign-extended o not! Notice how
PAGE_MASK | _PAGE_NX
is already a totally senseless operation if PAGE_MASK has all high bits
set!
So I think your whole argument and the patch is UTTER AND UNBELIEVABLE
CRAP!
Blaming it on PAGE_MASK was totally incorrect. It has nothing to do with
PAGE_MASK, and everything to do with the fact that the page table checking
patch was utterly failed and pure shit.
Linus
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-09 17:27 ` Linus Torvalds
@ 2008-03-09 18:57 ` Ingo Molnar
2008-03-10 2:45 ` Jeremy Fitzhardinge
2008-03-10 4:35 ` Paul Mackerras
2 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-03-09 18:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: Arjan van de Ven, hans.rosenfeld, linux-kernel, Thomas Gleixner,
H. Peter Anvin
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> So I would *very* strongly instead argue:
>
> - "unsigned long" is the native kernel type for all address manipulation,
> and thus "PAGE_SIZE" and "PAGE_MASK" should continue to have that type.
>
> - anything that uses any other type without explicitly making sure it's
> safe is mis-using those macros. IOW, PAGE_MASk was *never* a type that
> had anything what-so-ever to do with page table entry bits, and this is
> purely a page table entry issue!
>
> So my suggested patch would:
>
> - make the page table code use a specific mask that it builds up itself,
> and makes sure it's of the right type and has the rigth value in
> whatever type "struct pte_entry" is. The fact that "pte_val()" is
> larger than "unsigned long" on x86-32 is very clearly a PTE issue,
> *not* an issue for PAGE_SIZE or PAGE_MASK.
yeah, indeed my patch was sloppy, i didnt think it through - i fell for
the lure of the easy-looking 'PAGE_SIZE is small, sign-extend it' hack.
Will do it cleanly and will also clean up all the pte/address/pgprot
type mixing that currently goes on in this maze of macros.
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-09 17:27 ` Linus Torvalds
2008-03-09 18:57 ` Ingo Molnar
@ 2008-03-10 2:45 ` Jeremy Fitzhardinge
2008-03-10 4:35 ` Paul Mackerras
2 siblings, 0 replies; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-10 2:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Arjan van de Ven, hans.rosenfeld, linux-kernel,
Thomas Gleixner, H. Peter Anvin, Jan Beulich
Linus Torvalds wrote:
> So I would *very* strongly instead argue:
>
> - "unsigned long" is the native kernel type for all address manipulation,
> and thus "PAGE_SIZE" and "PAGE_MASK" should continue to have that type.
>
> - anything that uses any other type without explicitly making sure it's
> safe is mis-using those macros. IOW, PAGE_MASk was *never* a type that
> had anything what-so-ever to do with page table entry bits, and this is
> purely a page table entry issue!
>
Yes. PTE_MASK already exists for masking the address bits out of a
pte. It should have the type pteval_t, which will deal with the PAE
case cleanly. It should also only have the middle pfn bits set, so that
_PAGE_NX is also not included.
It's currently defined in terms of PHYSICAL_PAGE_MASK, which 1) is only
used to define PTE_MASK, and 2) defined (wrongly) in terms of PAGE_MASK
(though __PHYSICAL_MASK is correctly defined).
J
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: bisected boot regression post 2.6.25-rc3.. please revert
2008-03-09 17:27 ` Linus Torvalds
2008-03-09 18:57 ` Ingo Molnar
2008-03-10 2:45 ` Jeremy Fitzhardinge
@ 2008-03-10 4:35 ` Paul Mackerras
2 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2008-03-10 4:35 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Arjan van de Ven, hans.rosenfeld, linux-kernel,
Thomas Gleixner, H. Peter Anvin
Linus Torvalds writes:
> But even for PAGE_MASK, we literally have code like this:
>
> if ((size_avail & PAGE_MASK) < rg.size) {
>
> where it so _happens_ that "size_avail" is unsigned, but what if it
> wasn't? It could turn a unsigned comparison into a signed one, and
> introduce any number of security bugs etc.
We have had PAGE_MASK being signed on powerpc for ages, so if you do
find any such bugs, please let me know. :) I'm not aware of any at
present, though.
The expression you quoted will be ok as long as either size_avail or
rg.size is unsigned, as far as I can see.
Our PAGE_SIZE is unsigned long.
Paul.
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2008-03-10 4:35 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-01 18:56 bisected boot regression post 2.6.25-rc3.. please revert Arjan van de Ven
2008-03-03 7:46 ` Ingo Molnar
2008-03-03 9:13 ` Ingo Molnar
2008-03-03 16:41 ` Arjan van de Ven
2008-03-03 17:40 ` Ingo Molnar
2008-03-03 17:51 ` Nish Aravamudan
2008-03-03 17:55 ` Ingo Molnar
2008-03-03 17:58 ` H. Peter Anvin
2008-03-03 18:36 ` Arjan van de Ven
2008-03-03 18:44 ` Linus Torvalds
2008-03-03 22:00 ` Arjan van de Ven
2008-03-04 1:05 ` Arjan van de Ven
2008-03-04 6:53 ` Ingo Molnar
2008-03-05 15:35 ` Arjan van de Ven
2008-03-09 11:56 ` Ingo Molnar
2008-03-09 17:27 ` Linus Torvalds
2008-03-09 18:57 ` Ingo Molnar
2008-03-10 2:45 ` Jeremy Fitzhardinge
2008-03-10 4:35 ` Paul Mackerras
2008-03-03 21:13 ` Segher Boessenkool
2008-03-03 21:22 ` Segher Boessenkool
2008-03-03 22:33 ` Segher Boessenkool
2008-03-03 22:55 ` H. Peter Anvin
2008-03-03 22:56 ` Jeremy Fitzhardinge
2008-03-03 23:04 ` H. Peter Anvin
2008-03-04 6:58 ` Ingo Molnar
2008-03-03 17:15 ` Nish Aravamudan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).