LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* preempt bug in set_pmd_pfn?
@ 2008-03-04 21:13 Jeremy Fitzhardinge
  2008-03-04 21:28 ` Ingo Molnar
  2008-03-05  0:06 ` Andi Kleen
  0 siblings, 2 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-04 21:13 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Andi Kleen; +Cc: Linux Kernel Mailing List

I think set_pmd_pfn, which is only called by __set_fixmap, might have a 
preempt bug in it.

It can be executed with preemption enabled, but what if it gets preempted

	set_pmd(pmd, pfn_pmd(pfn, flags));
	/*
	 * It's enough to flush this one mapping.
	 * (PGE mappings get flushed as well)
	 */
>here<
	__flush_tlb_one(vaddr);
}

?


Won't this leave a stale tlb on the old processor?

I noticed this because the Xen tlb flushing code effectively has a 
smp_processor_id(), which provokes a warning when preemption is 
enabled.  It seems to me that it never makes sense to be doing a tlb 
flush unless you know which processor you're actually running on...

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-04 21:28 ` Ingo Molnar
@ 2008-03-04 21:27   ` Jeremy Fitzhardinge
  2008-03-05  6:48     ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-04 21:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> I think set_pmd_pfn, which is only called by __set_fixmap, might have 
>> a preempt bug in it.
>>     
>
> yes, and we had similar preemption bugs in the past. I guess most places 
> are either infrequent or have some natural atomicity anyway. Wanna send 
> a patch?
>   

Sure.  Should it just disable preemption, or take a lock?  It calls 
set_pte_at without holding any pte locks; that seems to be relatively 
common.  Is it OK when you're operating on init_mm?

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-04 21:13 preempt bug in set_pmd_pfn? Jeremy Fitzhardinge
@ 2008-03-04 21:28 ` Ingo Molnar
  2008-03-04 21:27   ` Jeremy Fitzhardinge
  2008-03-05  0:06 ` Andi Kleen
  1 sibling, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2008-03-04 21:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> I think set_pmd_pfn, which is only called by __set_fixmap, might have 
> a preempt bug in it.

yes, and we had similar preemption bugs in the past. I guess most places 
are either infrequent or have some natural atomicity anyway. Wanna send 
a patch?

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-04 21:13 preempt bug in set_pmd_pfn? Jeremy Fitzhardinge
  2008-03-04 21:28 ` Ingo Molnar
@ 2008-03-05  0:06 ` Andi Kleen
  2008-03-05  0:07   ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2008-03-05  0:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List


> Won't this leave a stale tlb on the old processor?

__set_fixmap should be only used in early boot, so it's always
on CPU 0

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05  0:06 ` Andi Kleen
@ 2008-03-05  0:07   ` Jeremy Fitzhardinge
  2008-03-05  0:16     ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-05  0:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List

Andi Kleen wrote:
>> Won't this leave a stale tlb on the old processor?
>>     
>
> __set_fixmap should be only used in early boot, so it's always
> on CPU 0

vdso32-setup.c:map_compat_vdso() uses it to create the compat vdso 
mapping, which typically happens on the first execve(), and could happen 
if you turn compat vdso off and on during runtime.  (It follows the call 
with flush_tlb_all(), so there's no risk of stray tlb entries in this case.)

But that answers my broader question about how __set_fixmap can get away 
with just flushing on the current cpu; I'll add a preempt disable 
bracket to keep everyone happy.

Any opinions on set_pte_at(&init_mm, ...) without holding any locks?  
All ok?

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05  0:07   ` Jeremy Fitzhardinge
@ 2008-03-05  0:16     ` Andi Kleen
  2008-03-05  0:19       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2008-03-05  0:16 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List

On Wednesday 05 March 2008 01:07:11 Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> >> Won't this leave a stale tlb on the old processor?
> >>     
> >
> > __set_fixmap should be only used in early boot, so it's always
> > on CPU 0
> 
> vdso32-setup.c:map_compat_vdso() uses it to create the compat vdso 
> mapping, which typically happens on the first execve(),

First execve for 32bit binaries?

Anyways __set_fixmap is __init and at the first  execve (unless you have initramfs) 
init.text should be already freed.

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05  0:16     ` Andi Kleen
@ 2008-03-05  0:19       ` Jeremy Fitzhardinge
  2008-03-05  1:28         ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-05  0:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List

Andi Kleen wrote:
> On Wednesday 05 March 2008 01:07:11 Jeremy Fitzhardinge wrote:
>   
>> Andi Kleen wrote:
>>     
>>>> Won't this leave a stale tlb on the old processor?
>>>>     
>>>>         
>>> __set_fixmap should be only used in early boot, so it's always
>>> on CPU 0
>>>       
>> vdso32-setup.c:map_compat_vdso() uses it to create the compat vdso 
>> mapping, which typically happens on the first execve(),
>>     
>
> First execve for 32bit binaries?
>   

Yes, 32-bit kernel.

> Anyways __set_fixmap is __init and at the first  execve (unless you have initramfs) 
> init.text should be already freed.

No, its not __init.  If it were I don't think there would have been much 
booting going on for the last few months.

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05  0:19       ` Jeremy Fitzhardinge
@ 2008-03-05  1:28         ` Andi Kleen
  0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2008-03-05  1:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, H. Peter Anvin, Linux Kernel Mailing List

On Wednesday 05 March 2008 01:19:36 Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > On Wednesday 05 March 2008 01:07:11 Jeremy Fitzhardinge wrote:
> >   
> >> Andi Kleen wrote:
> >>     
> >>>> Won't this leave a stale tlb on the old processor?
> >>>>     
> >>>>         
> >>> __set_fixmap should be only used in early boot, so it's always
> >>> on CPU 0
> >>>       
> >> vdso32-setup.c:map_compat_vdso() uses it to create the compat vdso 
> >> mapping, which typically happens on the first execve(),
> >>     
> >
> > First execve for 32bit binaries?
> >   
> 
> Yes, 32-bit kernel.

Ok I'm talking about 64bit. If you ask me something about 32bit please
always mention it especially.

> > Anyways __set_fixmap is __init and at the first  execve (unless you have initramfs) 
> > init.text should be already freed.
> 
> No, its not __init.  If it were I don't think there would have been much 
> booting going on for the last few months.

For 64bit my original statement was correct.

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-04 21:27   ` Jeremy Fitzhardinge
@ 2008-03-05  6:48     ` Ingo Molnar
  2008-03-05 14:29       ` Hugh Dickins
  2008-03-05 16:45       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 17+ messages in thread
From: Ingo Molnar @ 2008-03-05  6:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>
>>   
>>> I think set_pmd_pfn, which is only called by __set_fixmap, might have a 
>>> preempt bug in it.
>>>     
>>
>> yes, and we had similar preemption bugs in the past. I guess most places 
>> are either infrequent or have some natural atomicity anyway. Wanna send a 
>> patch?
>
> Sure.  Should it just disable preemption, or take a lock?  It calls 
> set_pte_at without holding any pte locks; that seems to be relatively 
> common.  Is it OK when you're operating on init_mm?

no, it's not OK to modify the kernel pagetable without locking - taking 
the pgd_lock should do the trick. Could you send the stacktrace that 
shows the place that is preemptible?

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05  6:48     ` Ingo Molnar
@ 2008-03-05 14:29       ` Hugh Dickins
  2008-03-05 16:48         ` Jeremy Fitzhardinge
  2008-03-05 16:45       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2008-03-05 14:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, H. Peter Anvin, Andi Kleen,
	Linux Kernel Mailing List

On Wed, 5 Mar 2008, Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > Ingo Molnar wrote:
> >> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >>   
> >>> I think set_pmd_pfn, which is only called by __set_fixmap, might have a 
> >>> preempt bug in it.
> >>>     
> >>
> >> yes, and we had similar preemption bugs in the past. I guess most places 
> >> are either infrequent or have some natural atomicity anyway. Wanna send a 
> >> patch?
> >
> > Sure.  Should it just disable preemption, or take a lock?  It calls 
> > set_pte_at without holding any pte locks; that seems to be relatively 
> > common.  Is it OK when you're operating on init_mm?
> 
> no, it's not OK to modify the kernel pagetable without locking - taking 
> the pgd_lock should do the trick. Could you send the stacktrace that 
> shows the place that is preemptible?

Please, Ingo, could you give an example of where such additional locking
is actually necessary?

I ask because I went over those places when splitting the page_table_lock
for userspace in 2.6.15.  Some things took init_mm.page_table_lock and
some things didn't, and I concluded that actually none of them needed it.

With the userspace pagetables, we need to guard against racing threads
and vmscan/rmap.  But with the kernel pagetables, we'd already be in
serious trouble if two cpus could be modifying the same pte at the
same time - there needs to be other serialization already e.g. vmalloc
has its own locking for parcelling out areas to different uses, so down
at the page table level there should be no conflict.

Allocation of new page tables, yes, that needs locking, and does use
the page_table_lock for kernel space just as for user space.

That was all two years ago, I may have been wrong then, or a lot may
have changed since.  But I've heard of a grand total of 0 problems
from not having such locking.

And on the original topic of flush TLB without preemption disabled:
again I'm not sure there's a bug there, but it's less clear.  Aren't
some of those __flush_tlb_ones just unnecessary, we're simply filling
a previously empty slot?  And if there's a guarantee that preemption
will itself involve a TLB flush (maybe there's no such guarantee for
these kernel entries, it's quite a different case from the userspace
one, and you'll be worrying about the global bit), if, then it'd be
okay to __flush_tlb_one without disabling preemption.

Hugh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05  6:48     ` Ingo Molnar
  2008-03-05 14:29       ` Hugh Dickins
@ 2008-03-05 16:45       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-05 16:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> Ingo Molnar wrote:
>>     
>>> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>>
>>>   
>>>       
>>>> I think set_pmd_pfn, which is only called by __set_fixmap, might have a 
>>>> preempt bug in it.
>>>>     
>>>>         
>>> yes, and we had similar preemption bugs in the past. I guess most places 
>>> are either infrequent or have some natural atomicity anyway. Wanna send a 
>>> patch?
>>>       
>> Sure.  Should it just disable preemption, or take a lock?  It calls 
>> set_pte_at without holding any pte locks; that seems to be relatively 
>> common.  Is it OK when you're operating on init_mm?
>>     
>
> no, it's not OK to modify the kernel pagetable without locking - taking 
> the pgd_lock should do the trick. Could you send the stacktrace that 
> shows the place that is preemptible?
So far I've noticed two places:

1. __set_fixmap to set up the vdso compat mapping (set_pte_at and tlb 
flush):

BUG: using smp_processor_id() in preemptible [00000000] code: init/1
caller is paravirt_get_lazy_mode+0xe/0x1b
Pid: 1, comm: init Not tainted 2.6.25-rc3-x86-latest.git #196
 [<c022be65>] debug_smp_processor_id+0x99/0xb0
 [<c011aa88>] paravirt_get_lazy_mode+0xe/0x1b
 [<c0105092>] xen_set_pte_at+0x2e/0xc0
 [<c011d322>] __set_fixmap+0x14a/0x176
 [<c011e4bb>] arch_setup_additional_pages+0x83/0x11d
 [<c01934a3>] load_elf_binary+0xad8/0x113a
 [<c016d410>] ? vfs_read+0xef/0x106
 [<c01700dd>] search_binary_handler+0xb8/0x19f
 [<c01929cb>] ? load_elf_binary+0x0/0x113a
 [<c01703ab>] ? prepare_binprm+0xc3/0xcb
 [<c0192375>] load_script+0x179/0x18c
 [<c0159caf>] ? get_user_pages+0x31d/0x397
 [<c016fe47>] ? get_arg_page+0x2d/0x80
 [<c01700dd>] search_binary_handler+0xb8/0x19f
 [<c01921fc>] ? load_script+0x0/0x18c
 [<c0171302>] do_execve+0x121/0x16a
 [<c01067d9>] sys_execve+0x29/0x52
 [<c0108286>] syscall_call+0x7/0xb
 [<c017007b>] ? search_binary_handler+0x56/0x19f
 [<c010af2f>] ? kernel_execve+0x17/0x1c
 [<c010217f>] ? _stext+0x17/0x19
 [<c01021d6>] ? init_post+0x55/0xbb
 [<c01035e7>] ? xen_irq_disable+0x21/0x23
 [<c010828f>] ? syscall_exit+0x5/0x1d
 [<c0108ee7>] ? kernel_thread_helper+0x7/0x10
 =======================
BUG: using smp_processor_id() in preemptible [00000000] code: init/1
caller is xen_flush_tlb_single+0x11/0x89
Pid: 1, comm: init Not tainted 2.6.25-rc3-x86-latest.git #196
 [<c022be65>] debug_smp_processor_id+0x99/0xb0
 [<c0103c0b>] xen_flush_tlb_single+0x11/0x89
 [<c011d33f>] __set_fixmap+0x167/0x176
 [<c011e4bb>] arch_setup_additional_pages+0x83/0x11d
 [<c01934a3>] load_elf_binary+0xad8/0x113a
 [<c016d410>] ? vfs_read+0xef/0x106
 [<c01700dd>] search_binary_handler+0xb8/0x19f
 [<c01929cb>] ? load_elf_binary+0x0/0x113a
 [<c01703ab>] ? prepare_binprm+0xc3/0xcb
 [<c0192375>] load_script+0x179/0x18c
 [<c0159caf>] ? get_user_pages+0x31d/0x397
 [<c016fe47>] ? get_arg_page+0x2d/0x80
 [<c01700dd>] search_binary_handler+0xb8/0x19f
 [<c01921fc>] ? load_script+0x0/0x18c
 [<c0171302>] do_execve+0x121/0x16a
 [<c01067d9>] sys_execve+0x29/0x52
 [<c0108286>] syscall_call+0x7/0xb
 [<c017007b>] ? search_binary_handler+0x56/0x19f
 [<c010af2f>] ? kernel_execve+0x17/0x1c
 [<c010217f>] ? _stext+0x17/0x19
 [<c01021d6>] ? init_post+0x55/0xbb
 [<c01035e7>] ? xen_irq_disable+0x21/0x23
 [<c010828f>] ? syscall_exit+0x5/0x1d
 [<c0108ee7>] ? kernel_thread_helper+0x7/0x10
 =======================


2. and vmalloc:

BUG: using smp_processor_id() in preemptible [00000000] code: multipath.stati/1981
caller is paravirt_get_lazy_mode+0xe/0x1b
Pid: 1981, comm: multipath.stati Not tainted 2.6.25-rc3-x86-latest.git #196
 [<c022be65>] debug_smp_processor_id+0x99/0xb0
 [<c011aa88>] paravirt_get_lazy_mode+0xe/0x1b
 [<c0105092>] xen_set_pte_at+0x2e/0xc0
 [<c015f736>] map_vm_area+0x1fa/0x255
 [<c015fc83>] __vmalloc_area_node+0xdb/0xfa
 [<c015fceb>] __vmalloc_node+0x49/0x58
 [<c015fd26>] __vmalloc+0x10/0x12
 [<c015fdca>] vmalloc+0x19/0x1b
 [<c038d3b4>] dm_ctl_ioctl+0x155/0x248
 [<c038c56b>] ? list_versions+0x0/0x79
 [<c0103c00>] ? xen_flush_tlb_single+0x6/0x89
 [<c038d25f>] ? dm_ctl_ioctl+0x0/0x248
 [<c01767be>] vfs_ioctl+0x22/0x67
 [<c0176a54>] do_vfs_ioctl+0x251/0x268
 [<c015b45f>] ? remove_vma+0x34/0x3a
 [<c015bdc8>] ? do_munmap+0x17d/0x197
 [<c0176a97>] sys_ioctl+0x2c/0x45
 [<c0108286>] syscall_call+0x7/0xb
 =======================

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05 14:29       ` Hugh Dickins
@ 2008-03-05 16:48         ` Jeremy Fitzhardinge
  2008-03-05 17:38           ` Hugh Dickins
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-05 16:48 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List

Hugh Dickins wrote:
> Please, Ingo, could you give an example of where such additional locking
> is actually necessary?
>
> I ask because I went over those places when splitting the page_table_lock
> for userspace in 2.6.15.  Some things took init_mm.page_table_lock and
> some things didn't, and I concluded that actually none of them needed it.
>
> With the userspace pagetables, we need to guard against racing threads
> and vmscan/rmap.  But with the kernel pagetables, we'd already be in
> serious trouble if two cpus could be modifying the same pte at the
> same time - there needs to be other serialization already e.g. vmalloc
> has its own locking for parcelling out areas to different uses, so down
> at the page table level there should be no conflict.
>
> Allocation of new page tables, yes, that needs locking, and does use
> the page_table_lock for kernel space just as for user space.
>
> That was all two years ago, I may have been wrong then, or a lot may
> have changed since.  But I've heard of a grand total of 0 problems
> from not having such locking.
>   

It's not obvious to me what problems might arise, but the Xen set_pte 
operations do currently rely on preemption being inhibited.  At worst I 
can add preempt_disable/enable calls to them.

> And on the original topic of flush TLB without preemption disabled:
> again I'm not sure there's a bug there, but it's less clear.  Aren't
> some of those __flush_tlb_ones just unnecessary, we're simply filling
> a previously empty slot?  And if there's a guarantee that preemption
> will itself involve a TLB flush (maybe there's no such guarantee for
> these kernel entries, it's quite a different case from the userspace
> one, and you'll be worrying about the global bit), if, then it'd be
> okay to __flush_tlb_one without disabling preemption.

If a thread goes from processor A -> B -> A, where A is first preempted 
between a pagetable update and a tlb flush, then the second time the 
thread runs on A may run with a stale tlb (if in the meantime A has 
either been idle or only running kernel threads).

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05 16:48         ` Jeremy Fitzhardinge
@ 2008-03-05 17:38           ` Hugh Dickins
  2008-03-05 19:18             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2008-03-05 17:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List

On Wed, 5 Mar 2008, Jeremy Fitzhardinge wrote:
> 
> If a thread goes from processor A -> B -> A, where A is first preempted
> between a pagetable update and a tlb flush, then the second time the thread
> runs on A may run with a stale tlb (if in the meantime A has either been idle
> or only running kernel threads).

Right, thanks, because __flush_tlb_one opts out of the full active_mm
checking which goes on for userspace mms (which covers, for example,
the case of preemption before dup_mmap's flush_tlb_mm).

But is there actually a case where there's a problem?  So far as I can see,
set_pmd_pfn is there solely for discontig_32's __init remap_numa_kva; and
set_pte_pfn is there solely for set_fixmap, which operates on a per-cpu
area of pagetable, which would already be in bigger trouble if preemption
to another cpu were possible.

Hugh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05 17:38           ` Hugh Dickins
@ 2008-03-05 19:18             ` Jeremy Fitzhardinge
  2008-03-05 20:40               ` Hugh Dickins
  2008-03-06 12:52               ` Ingo Molnar
  0 siblings, 2 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-05 19:18 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List

Hugh Dickins wrote:
> On Wed, 5 Mar 2008, Jeremy Fitzhardinge wrote:
>   
>> If a thread goes from processor A -> B -> A, where A is first preempted
>> between a pagetable update and a tlb flush, then the second time the thread
>> runs on A may run with a stale tlb (if in the meantime A has either been idle
>> or only running kernel threads).
>>     
>
> Right, thanks, because __flush_tlb_one opts out of the full active_mm
> checking which goes on for userspace mms (which covers, for example,
> the case of preemption before dup_mmap's flush_tlb_mm).
>
> But is there actually a case where there's a problem?  So far as I can see,
> set_pmd_pfn is there solely for discontig_32's __init remap_numa_kva; and
> set_pte_pfn is there solely for set_fixmap, which operates on a per-cpu
> area of pagetable, which would already be in bigger trouble if preemption
> to another cpu were possible.

Yes, I meant set_pte_pfn; set_pmd_pfn is a typo.

Fixmap slots are global, not percpu; you may be thinking of kmap_atomic, 
which reserves percpu fixmap slots for its use.  Most uses of set_fixmap 
are early in boot, where preemption (or other CPUs) isn't a factor.  The 
exception is mapping the compat vdso global mapping.   However, that is 
special-cased anyway, since the set_fixmap is followed by an explicit 
all-cpu tlb flush.

It seems to me that the correct fix is to just make __set_fixmap disable 
preemption for its duration; it probably doesn't make much difference 
for the native case, and it makes Xen happy.

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05 19:18             ` Jeremy Fitzhardinge
@ 2008-03-05 20:40               ` Hugh Dickins
  2008-03-06 12:52               ` Ingo Molnar
  1 sibling, 0 replies; 17+ messages in thread
From: Hugh Dickins @ 2008-03-05 20:40 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List

On Wed, 5 Mar 2008, Jeremy Fitzhardinge wrote:
> 
> Fixmap slots are global, not percpu; you may be thinking of kmap_atomic, which
> reserves percpu fixmap slots for its use.

Ah yes, of course you're right, I was misremembering: thanks.

> Most uses of set_fixmap are early
> in boot, where preemption (or other CPUs) isn't a factor.  The exception is
> mapping the compat vdso global mapping.   However, that is special-cased
> anyway, since the set_fixmap is followed by an explicit all-cpu tlb flush.

Mmm, yes, I hadn't noticed that one: rather a weird case.

(If root had a mind to do so, she could be flipping compat_vdso
back and forth on all cpus concurrently; but I don't see an actual
problem of getting a worse outcome than she deserves for doing so -
the pages involved never get freed.)

> It seems to me that the correct fix is to just make __set_fixmap disable
> preemption for its duration; it probably doesn't make much difference for the
> native case, and it makes Xen happy.

I'd have thought it better done just in that one peculiar case,
with a comment that it's to keep Xen happy; but I'd better back
out now and leave it to you and Ingo.

Hugh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-05 19:18             ` Jeremy Fitzhardinge
  2008-03-05 20:40               ` Hugh Dickins
@ 2008-03-06 12:52               ` Ingo Molnar
  2008-03-06 18:19                 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2008-03-06 12:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Hugh Dickins, H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> It seems to me that the correct fix is to just make __set_fixmap 
> disable preemption for its duration; it probably doesn't make much 
> difference for the native case, and it makes Xen happy.

actually, i think the correct approach is to remove the TLB flushing and 
perhaps to check that the old pte is not present. Do we ever _change_ 
mappings via __set_fixmap()? I think we only ever install them.

but if we ever change them somewhere then the correct approach is to do 
a flush_tlb_all(). It's not just about preemption but about the fact 
that we modified the kernel address space and we must propagate that to 
all CPUs.

the vmalloc() backtrace you sent - how did set_pte_pfn() get into that 
codepath - vmalloc shouldnt be using __set_fixmap().

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: preempt bug in set_pmd_pfn?
  2008-03-06 12:52               ` Ingo Molnar
@ 2008-03-06 18:19                 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-06 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, H. Peter Anvin, Andi Kleen, Linux Kernel Mailing List

Ingo Molnar wrote:
> actually, i think the correct approach is to remove the TLB flushing and 
> perhaps to check that the old pte is not present. Do we ever _change_ 
> mappings via __set_fixmap()? I think we only ever install them.
>   

Yes, I think that's the case.  clear_fixmap() exists for clearing out an 
existing mapping, but its only used to clear out the WP test mapping and 
in early_iounmap (if called late).  I couldn't see any instances of 
replacing a mapping.

> but if we ever change them somewhere then the correct approach is to do 
> a flush_tlb_all(). It's not just about preemption but about the fact 
> that we modified the kernel address space and we must propagate that to 
> all CPUs.
>   

Yes, I was wondering about that.  If __set_fixmap is only used at boot 
time, then a global flush isn't necessary, but if its deemed a 
general-purpose API in a normal running kernel, it needs to deal with 
cross-cpu flushes.

64-bit set_fixmap is __init only, and I'd be OK with that.  The only 
non-__init use in the 32-bit kernel is the compat vdso mapping, and that 
could easily be done by other means (though it would effectively become 
an opencoded set_fixmap, so perhaps that's not a good idea...).

> the vmalloc() backtrace you sent - how did set_pte_pfn() get into that 
> codepath - vmalloc shouldnt be using __set_fixmap().
>   

No, that's set_pte_at(), which is the real issue in both cases.  
__set_fixmap calls both set_pte_at and flush_tlb_one, which is why it 
gets two backtrackes.

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-03-06 18:25 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-04 21:13 preempt bug in set_pmd_pfn? Jeremy Fitzhardinge
2008-03-04 21:28 ` Ingo Molnar
2008-03-04 21:27   ` Jeremy Fitzhardinge
2008-03-05  6:48     ` Ingo Molnar
2008-03-05 14:29       ` Hugh Dickins
2008-03-05 16:48         ` Jeremy Fitzhardinge
2008-03-05 17:38           ` Hugh Dickins
2008-03-05 19:18             ` Jeremy Fitzhardinge
2008-03-05 20:40               ` Hugh Dickins
2008-03-06 12:52               ` Ingo Molnar
2008-03-06 18:19                 ` Jeremy Fitzhardinge
2008-03-05 16:45       ` Jeremy Fitzhardinge
2008-03-05  0:06 ` Andi Kleen
2008-03-05  0:07   ` Jeremy Fitzhardinge
2008-03-05  0:16     ` Andi Kleen
2008-03-05  0:19       ` Jeremy Fitzhardinge
2008-03-05  1:28         ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).