LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: proxy_pda was Re: What was in the x86 merge for .20
@ 2007-01-15 20:41 Paweł Sikora
0 siblings, 0 replies; 6+ messages in thread
From: Paweł Sikora @ 2007-01-15 20:41 UTC (permalink / raw)
To: linux-kernel
Hi,
I've reviewed the thread and can propose a solution.
Let's see e.g. the dev.s ( from fuse.ko ). Currently with gcc-4.2 we get:
fuse_req_init_context:
movl $_proxy_pda+8, %edx #, tmp62
#APP
movl %gs:8,%ecx #, ret__
#NO_APP
movl 344(%ecx), %ecx # <variable>.fsuid, <variable>.fsuid
movl %ecx, 60(%eax) # <variable>.fsuid, <variable>.in.h.uid
#APP
movl %gs:8,%ecx #, ret__
#NO_APP
movl 360(%ecx), %ecx # <variable>.fsgid, <variable>.fsgid
movl %ecx, 64(%eax) # <variable>.fsgid, <variable>.in.h.gid
#APP
movl %gs:8,%edx #, ret__
#NO_APP
movl 164(%edx), %edx # <variable>.pid, <variable>.pid
movl %edx, 68(%eax) # <variable>.pid, <variable>.in.h.pid
ret
In this scenario gcc is explictly blocked by -fno-strict-aliasing
and massive %gs:8 reloads are present. If you fix aliasing violations
in kernel then you could use -fstrict-aliasing to get what you want.
fuse_req_init_context:
#APP
movl %gs:8,%ecx #, ret__
#NO_APP
movl 344(%ecx), %edx # <variable>.fsuid, <variable>.fsuid
movl %edx, 60(%eax) # <variable>.fsuid, <variable>.in.h.uid
movl 360(%ecx), %edx # <variable>.fsgid, <variable>.fsgid
movl %edx, 64(%eax) # <variable>.fsgid, <variable>.in.h.gid
movl 164(%ecx), %edx # <variable>.pid, <variable>.pid
movl %edx, 68(%eax) # <variable>.pid, <variable>.in.h.pid
ret
BR,
Pawel.
^ permalink raw reply [flat|nested] 6+ messages in thread
* What was in the x86 merge for .20
@ 2006-12-08 3:01 Andi Kleen
2006-12-08 12:04 ` Arkadiusz Miskiewicz
0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2006-12-08 3:01 UTC (permalink / raw)
To: linux-kernel, discuss
[The merge already made it to Linus' tree. Sorry for sending this message
late]
Most of this is for both i386 and x86-64, unless when noted
These are just some high lights. As usual there are more
smaller optimizations, cleanups etc
- paravirt support for i386: the basic hooks for replacing all
non virtualizable instructions on x86 are in. This currently
only runs on native hardware, but will allow to link in
modules for paravirtualized Xen/Vmware/lhype.
There are some limitations like no SMP support yet.
- Support for a Processor Data Area (PDA) on i386. This makes
the code more similar to x86-64 and will allow some other
optimizations in the future.
- Relocatable kernel support for i386. This allows to load
a single kernel binary on multiple addresses. This is useful
to use kdump kernels without having to maintain separate
binaries.
- Sleazy FPU feature also supported now on i386 -- this will
give a small improvement to FPU intensive programs because
they have to do less lazy FPU exceptions.
- When a spinlock lockup occurs print backtraces of all CPUs. This
makes debugging deadlocks easier
- x86-64 now also spins on spinlocks with interrupts enabled when
possible.
- Various dwarf2 unwinder improvements.
In particular better debugging support for figuring out what's wrong
and the unwinder should be less likely to crash now when it finds
invalid unwinding data.
- Use more efficient cache flushing when cache attributes are changed
- Allow compiling kernel for core2. To be really useful this
will require gcc support to compile for core2 which isn't ready yet.
- Various fixes to the MTRR code
- Some preparatory infrastructure for perfmon
- Improve TSC setup heuristics on Core2 and AMD K8
- Don't try to synchronize TSCs on boot anymore. Instead just
checks if they are synchronized or not and disable TSC use
when unsynchronized.
- More fixes to the idle notifier
- Various other bug fixes and cleanups
-Andi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: What was in the x86 merge for .20
2006-12-08 3:01 Andi Kleen
@ 2006-12-08 12:04 ` Arkadiusz Miskiewicz
2006-12-08 20:35 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 6+ messages in thread
From: Arkadiusz Miskiewicz @ 2006-12-08 12:04 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
On Friday 08 December 2006 04:01, Andi Kleen wrote:
> - Support for a Processor Data Area (PDA) on i386. This makes
> the code more similar to x86-64 and will allow some other
> optimizations in the future.
LD .tmp_vmlinux1
arch/i386/kernel/built-in.o: In function `math_emulate':
(.text+0x3809): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `smp_apic_timer_interrupt':
(.text+0xe140): undefined reference to `_proxy_pda'
kernel/built-in.o: In function `sys_set_tid_address':
(.text+0x370b): undefined reference to `_proxy_pda'
kernel/built-in.o: In function `switch_uid':
(.text+0xcc6c): undefined reference to `_proxy_pda'
mm/built-in.o: In function `sys_munlock':
(.text+0xcaf1): undefined reference to `_proxy_pda'
mm/built-in.o:(.text+0xcc11): more undefined references to `_proxy_pda' follow
make: *** [.tmp_vmlinux1] Błąd 1
Something related (git tree fetched 1-2h ago) ?
> -Andi
--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: What was in the x86 merge for .20
2006-12-08 12:04 ` Arkadiusz Miskiewicz
@ 2006-12-08 20:35 ` Jeremy Fitzhardinge
2006-12-08 21:06 ` proxy_pda was " Andi Kleen
0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2006-12-08 20:35 UTC (permalink / raw)
To: Arkadiusz Miskiewicz; +Cc: Andi Kleen, linux-kernel
Arkadiusz Miskiewicz wrote:
> LD .tmp_vmlinux1
> arch/i386/kernel/built-in.o: In function `math_emulate':
> (.text+0x3809): undefined reference to `_proxy_pda'
>
Hm, in theory nothing should ever generate a reference to _proxy_pda.
What compiler are you using?
J
^ permalink raw reply [flat|nested] 6+ messages in thread
* proxy_pda was Re: What was in the x86 merge for .20
2006-12-08 20:35 ` Jeremy Fitzhardinge
@ 2006-12-08 21:06 ` Andi Kleen
2006-12-08 21:09 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2006-12-08 21:06 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Arkadiusz Miskiewicz, linux-kernel
On Friday 08 December 2006 21:35, Jeremy Fitzhardinge wrote:
> Arkadiusz Miskiewicz wrote:
> > LD .tmp_vmlinux1
> > arch/i386/kernel/built-in.o: In function `math_emulate':
> > (.text+0x3809): undefined reference to `_proxy_pda'
> >
>
> Hm, in theory nothing should ever generate a reference to _proxy_pda.
> What compiler are you using?
Looking at Arkadiusz' output file it looks like gcc 4.2 decided to CSE the
address :/
movl $_proxy_pda+8, %edx #, tmp65
Very sad, but legitimate.
The only workaround I can think of would be to define it as a symbol
(or away in vmlinux.lds.S). Or do away with the idea of proxy_pda
again.
-Andi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: proxy_pda was Re: What was in the x86 merge for .20
2006-12-08 21:06 ` proxy_pda was " Andi Kleen
@ 2006-12-08 21:09 ` Jeremy Fitzhardinge
2006-12-08 21:22 ` Andi Kleen
0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2006-12-08 21:09 UTC (permalink / raw)
To: Andi Kleen; +Cc: Arkadiusz Miskiewicz, linux-kernel
Andi Kleen wrote:
> Looking at Arkadiusz' output file it looks like gcc 4.2 decided to CSE the
> address :/
>
> movl $_proxy_pda+8, %edx #, tmp65
>
> Very sad, but legitimate.
>
Yes, that was my conclusion too. Though in this case the code could be
cleaned up by cutting down on the number of uses of "current" - but
that's hardly a general fix.
> The only workaround I can think of would be to define it as a symbol
> (or away in vmlinux.lds.S)
Yes. I was thinking about setting it in vmlinux.lds to some obviously
bad address so that any access will be highly visible.
> . Or do away with the idea of proxy_pda
> again.
>
That would be very sad indeed.
J
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: proxy_pda was Re: What was in the x86 merge for .20
2006-12-08 21:09 ` Jeremy Fitzhardinge
@ 2006-12-08 21:22 ` Andi Kleen
2006-12-08 21:34 ` Jeremy Fitzhardinge
2006-12-08 22:04 ` Arkadiusz Miskiewicz
0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2006-12-08 21:22 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Arkadiusz Miskiewicz, linux-kernel
On Friday 08 December 2006 22:09, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > Looking at Arkadiusz' output file it looks like gcc 4.2 decided to CSE the
> > address :/
> >
> > movl $_proxy_pda+8, %edx #, tmp65
> >
> > Very sad, but legitimate.
> >
>
> Yes, that was my conclusion too. Though in this case the code could be
> cleaned up by cutting down on the number of uses of "current" - but
> that's hardly a general fix.
>
> > The only workaround I can think of would be to define it as a symbol
> > (or away in vmlinux.lds.S)
>
> Yes. I was thinking about setting it in vmlinux.lds to some obviously
> bad address so that any access will be highly visible.
>
> > . Or do away with the idea of proxy_pda
> > again.
> >
>
> That would be very sad indeed.
The trouble is when it's CSEd it actually causes worse code because
a register is tied up. That might not be worth the advantage of having it?
Hmm, maybe marking it volatile would help? Arkadiusz, does the following patch
help?
-Andi
Work around gcc 4.2 CSEing the proxy_pda
proxy_pda doesn't exactly exist -- we just use it to describe side effects
of the PDA manipulation to gcc. But when gcc 4.2 CSEs the address
it generates references and worse code and the link breaks.
Let's cast it to volatile and see if it that helps.
TBD same for x86-64
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux/include/asm-i386/pda.h
===================================================================
--- linux.orig/include/asm-i386/pda.h
+++ linux/include/asm-i386/pda.h
@@ -40,19 +40,19 @@ extern struct i386_pda _proxy_pda;
switch (sizeof(_proxy_pda.field)) { \
case 1: \
asm(op "b %1,%%gs:%c2" \
- : "+m" (_proxy_pda.field) \
+ : "+m" (*(volatile T__ *)&_proxy_pda.field) \
:"ri" ((T__)val), \
"i"(pda_offset(field))); \
break; \
case 2: \
asm(op "w %1,%%gs:%c2" \
- : "+m" (_proxy_pda.field) \
+ : "+m" (*(volatile T__ *)&_proxy_pda.field) \
:"ri" ((T__)val), \
"i"(pda_offset(field))); \
break; \
case 4: \
asm(op "l %1,%%gs:%c2" \
- : "+m" (_proxy_pda.field) \
+ : "+m" (*(volatile T__ *)&_proxy_pda.field) \
:"ri" ((T__)val), \
"i"(pda_offset(field))); \
break; \
@@ -63,24 +63,25 @@ extern struct i386_pda _proxy_pda;
#define pda_from_op(op,field) \
({ \
typeof(_proxy_pda.field) ret__; \
+ typedef typeof(_proxy_pda.field) T__; \
switch (sizeof(_proxy_pda.field)) { \
case 1: \
asm(op "b %%gs:%c1,%0" \
: "=r" (ret__) \
: "i" (pda_offset(field)), \
- "m" (_proxy_pda.field)); \
+ "m" (*(volatile T__ *)&_proxy_pda.field)); \
break; \
case 2: \
asm(op "w %%gs:%c1,%0" \
: "=r" (ret__) \
: "i" (pda_offset(field)), \
- "m" (_proxy_pda.field)); \
+ "m" (*(volatile T__ *)&_proxy_pda.field)); \
break; \
case 4: \
asm(op "l %%gs:%c1,%0" \
: "=r" (ret__) \
: "i" (pda_offset(field)), \
- "m" (_proxy_pda.field)); \
+ "m" (*(volatile T__ *)&_proxy_pda.field)); \
break; \
default: __bad_pda_field(); \
} \
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: proxy_pda was Re: What was in the x86 merge for .20
2006-12-08 21:22 ` Andi Kleen
@ 2006-12-08 21:34 ` Jeremy Fitzhardinge
2006-12-08 22:04 ` Arkadiusz Miskiewicz
1 sibling, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2006-12-08 21:34 UTC (permalink / raw)
To: Andi Kleen; +Cc: Arkadiusz Miskiewicz, linux-kernel
Andi Kleen wrote:
> The trouble is when it's CSEd it actually causes worse code because
> a register is tied up. That might not be worth the advantage of having it?
>
I think so, definitely; without proxy_pda you need to make it asm
volatile+mem clobber, which completely eliminates all optimisation
opportunities; in general the proxy_pda allows gcc to CSE and reorder
pda accesses. I guess in this case the memory writes inhibited the
overall CSE of current, so its just making do by CSEing the address.
> Hmm, maybe marking it volatile would help? Arkadiusz, does the following patch
> help?
>
Might work. But doesn't this make the pointed-at proxy_pda volatile,
not the proxy_pda pointer itself? Should it be something like (volatile
__T * volatile)?
J
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: proxy_pda was Re: What was in the x86 merge for .20
2006-12-08 21:22 ` Andi Kleen
2006-12-08 21:34 ` Jeremy Fitzhardinge
@ 2006-12-08 22:04 ` Arkadiusz Miskiewicz
1 sibling, 0 replies; 6+ messages in thread
From: Arkadiusz Miskiewicz @ 2006-12-08 22:04 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jeremy Fitzhardinge, linux-kernel
On Friday 08 December 2006 22:22, Andi Kleen wrote:
> The trouble is when it's CSEd it actually causes worse code because
> a register is tied up. That might not be worth the advantage of having it?
>
> Hmm, maybe marking it volatile would help? Arkadiusz, does the following
> patch help?
Unfortunately - no.
LD .tmp_vmlinux1
arch/i386/kernel/built-in.o: In function `dump_stack':
(.text+0x36a7): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `math_emulate':
(.text+0x3910): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `smp_error_interrupt':
(.text+0xe093): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `smp_apic_timer_interrupt':
(.text+0xe16d): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `smp_apic_timer_interrupt':
(.text+0xe184): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o:(.init.text+0x268a): more undefined references to
`_proxy_pda' follow
make: *** [.tmp_vmlinux1] Błąd 1
.i, .S offlist.
> -Andi
--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-01-15 21:00 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-15 20:41 proxy_pda was Re: What was in the x86 merge for .20 Paweł Sikora
-- strict thread matches above, loose matches on Subject: below --
2006-12-08 3:01 Andi Kleen
2006-12-08 12:04 ` Arkadiusz Miskiewicz
2006-12-08 20:35 ` Jeremy Fitzhardinge
2006-12-08 21:06 ` proxy_pda was " Andi Kleen
2006-12-08 21:09 ` Jeremy Fitzhardinge
2006-12-08 21:22 ` Andi Kleen
2006-12-08 21:34 ` Jeremy Fitzhardinge
2006-12-08 22:04 ` Arkadiusz Miskiewicz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).