LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: proxy_pda was Re: What was in the x86 merge for .20
@ 2007-01-15 20:41 Paweł Sikora
  0 siblings, 0 replies; 6+ messages in thread
From: Paweł Sikora @ 2007-01-15 20:41 UTC (permalink / raw)
  To: linux-kernel

Hi,

I've reviewed the thread and can propose a solution.
Let's see e.g. the dev.s ( from fuse.ko ). Currently with gcc-4.2 we get:

fuse_req_init_context:
        movl    $_proxy_pda+8, %edx     #, tmp62
#APP
        movl %gs:8,%ecx #, ret__
#NO_APP
        movl    344(%ecx), %ecx # <variable>.fsuid, <variable>.fsuid
        movl    %ecx, 60(%eax)  # <variable>.fsuid, <variable>.in.h.uid
#APP
        movl %gs:8,%ecx #, ret__
#NO_APP
        movl    360(%ecx), %ecx # <variable>.fsgid, <variable>.fsgid
        movl    %ecx, 64(%eax)  # <variable>.fsgid, <variable>.in.h.gid
#APP
        movl %gs:8,%edx #, ret__
#NO_APP
        movl    164(%edx), %edx # <variable>.pid, <variable>.pid
        movl    %edx, 68(%eax)  # <variable>.pid, <variable>.in.h.pid
        ret

In this scenario gcc is explictly blocked by -fno-strict-aliasing
and massive %gs:8 reloads are present. If you fix aliasing violations
in kernel then you could use -fstrict-aliasing to get what you want.

fuse_req_init_context:
#APP
        movl %gs:8,%ecx #, ret__
#NO_APP
        movl    344(%ecx), %edx # <variable>.fsuid, <variable>.fsuid
        movl    %edx, 60(%eax)  # <variable>.fsuid, <variable>.in.h.uid
        movl    360(%ecx), %edx # <variable>.fsgid, <variable>.fsgid
        movl    %edx, 64(%eax)  # <variable>.fsgid, <variable>.in.h.gid
        movl    164(%ecx), %edx # <variable>.pid, <variable>.pid
        movl    %edx, 68(%eax)  # <variable>.pid, <variable>.in.h.pid
        ret

BR,
Pawel.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: proxy_pda was Re: What was in the x86 merge for .20
  2006-12-08 21:22         ` Andi Kleen
  2006-12-08 21:34           ` Jeremy Fitzhardinge
@ 2006-12-08 22:04           ` Arkadiusz Miskiewicz
  1 sibling, 0 replies; 6+ messages in thread
From: Arkadiusz Miskiewicz @ 2006-12-08 22:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeremy Fitzhardinge, linux-kernel

On Friday 08 December 2006 22:22, Andi Kleen wrote:

> The trouble is when it's CSEd it actually causes worse code because
> a register is tied up. That might not be worth the advantage of having it?
>
> Hmm, maybe marking it volatile would help? Arkadiusz, does the following
> patch help?

Unfortunately - no.

  LD      .tmp_vmlinux1
arch/i386/kernel/built-in.o: In function `dump_stack':
(.text+0x36a7): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `math_emulate':
(.text+0x3910): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `smp_error_interrupt':
(.text+0xe093): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `smp_apic_timer_interrupt':
(.text+0xe16d): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o: In function `smp_apic_timer_interrupt':
(.text+0xe184): undefined reference to `_proxy_pda'
arch/i386/kernel/built-in.o:(.init.text+0x268a): more undefined references to 
`_proxy_pda' follow
make: *** [.tmp_vmlinux1] Błąd 1

.i, .S offlist.

> -Andi


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: proxy_pda was Re: What was in the x86 merge for .20
  2006-12-08 21:22         ` Andi Kleen
@ 2006-12-08 21:34           ` Jeremy Fitzhardinge
  2006-12-08 22:04           ` Arkadiusz Miskiewicz
  1 sibling, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2006-12-08 21:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Arkadiusz Miskiewicz, linux-kernel

Andi Kleen wrote:
> The trouble is when it's CSEd it actually causes worse code because
> a register is tied up. That might not be worth the advantage of having it?
>   

I think so, definitely; without proxy_pda you need to make it asm
volatile+mem clobber, which completely eliminates all optimisation
opportunities; in general the proxy_pda allows gcc to CSE and reorder
pda accesses.  I guess in this case the memory writes inhibited the
overall CSE of current, so its just making do by CSEing the address.

> Hmm, maybe marking it volatile would help? Arkadiusz, does the following patch
> help?
>   

Might work.  But doesn't this make the pointed-at proxy_pda volatile,
not the proxy_pda pointer itself?  Should it be something like (volatile
__T * volatile)?

    J

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: proxy_pda was Re: What was in the x86 merge for .20
  2006-12-08 21:09       ` Jeremy Fitzhardinge
@ 2006-12-08 21:22         ` Andi Kleen
  2006-12-08 21:34           ` Jeremy Fitzhardinge
  2006-12-08 22:04           ` Arkadiusz Miskiewicz
  0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2006-12-08 21:22 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Arkadiusz Miskiewicz, linux-kernel

On Friday 08 December 2006 22:09, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > Looking at Arkadiusz' output file it looks like gcc 4.2 decided to CSE the
> > address :/
> >
> > 	movl	$_proxy_pda+8, %edx	#, tmp65
> >
> > Very sad, but legitimate.
> >   
> 
> Yes, that was my conclusion too.  Though in this case the code could be
> cleaned up by cutting down on the number of uses of "current" - but
> that's hardly a general fix.
> 
> > The only workaround I can think of would be to define it as a symbol
> > (or away in vmlinux.lds.S)
> 
> Yes.  I was thinking about setting it in vmlinux.lds to some obviously
> bad address so that any access will be highly visible.
> 
> > . Or do away with the idea of proxy_pda
> > again.
> >   
> 
> That would be very sad indeed.

The trouble is when it's CSEd it actually causes worse code because
a register is tied up. That might not be worth the advantage of having it?

Hmm, maybe marking it volatile would help? Arkadiusz, does the following patch
help?

-Andi

Work around gcc 4.2 CSEing the proxy_pda

proxy_pda doesn't exactly exist -- we just use it to describe side effects
of the PDA manipulation to gcc. But when gcc 4.2 CSEs the address
it generates references and worse code and the link breaks.

Let's cast it to volatile and see if it that helps.

TBD same for x86-64
Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-i386/pda.h
===================================================================
--- linux.orig/include/asm-i386/pda.h
+++ linux/include/asm-i386/pda.h
@@ -40,19 +40,19 @@ extern struct i386_pda _proxy_pda;
 		switch (sizeof(_proxy_pda.field)) {			\
 		case 1:							\
 			asm(op "b %1,%%gs:%c2"				\
-			    : "+m" (_proxy_pda.field)			\
+			    : "+m" (*(volatile T__ *)&_proxy_pda.field)			\
 			    :"ri" ((T__)val),				\
 			     "i"(pda_offset(field)));			\
 			break;						\
 		case 2:							\
 			asm(op "w %1,%%gs:%c2"				\
-			    : "+m" (_proxy_pda.field)			\
+			    : "+m" (*(volatile T__ *)&_proxy_pda.field)			\
 			    :"ri" ((T__)val),				\
 			     "i"(pda_offset(field)));			\
 			break;						\
 		case 4:							\
 			asm(op "l %1,%%gs:%c2"				\
-			    : "+m" (_proxy_pda.field)			\
+			    : "+m" (*(volatile T__ *)&_proxy_pda.field)			\
 			    :"ri" ((T__)val),				\
 			     "i"(pda_offset(field)));			\
 			break;						\
@@ -63,24 +63,25 @@ extern struct i386_pda _proxy_pda;
 #define pda_from_op(op,field)						\
 	({								\
 		typeof(_proxy_pda.field) ret__;				\
+		typedef typeof(_proxy_pda.field) T__;			\
 		switch (sizeof(_proxy_pda.field)) {			\
 		case 1:							\
 			asm(op "b %%gs:%c1,%0"				\
 			    : "=r" (ret__)				\
 			    : "i" (pda_offset(field)),			\
-			      "m" (_proxy_pda.field));			\
+			      "m" (*(volatile T__ *)&_proxy_pda.field));			\
 			break;						\
 		case 2:							\
 			asm(op "w %%gs:%c1,%0"				\
 			    : "=r" (ret__)				\
 			    : "i" (pda_offset(field)),			\
-			      "m" (_proxy_pda.field));			\
+			      "m" (*(volatile T__ *)&_proxy_pda.field));			\
 			break;						\
 		case 4:							\
 			asm(op "l %%gs:%c1,%0"				\
 			    : "=r" (ret__)				\
 			    : "i" (pda_offset(field)),			\
-			      "m" (_proxy_pda.field));			\
+			      "m" (*(volatile T__ *)&_proxy_pda.field));			\
 			break;						\
 		default: __bad_pda_field();				\
 		}							\

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: proxy_pda was Re: What was in the x86 merge for .20
  2006-12-08 21:06     ` proxy_pda was " Andi Kleen
@ 2006-12-08 21:09       ` Jeremy Fitzhardinge
  2006-12-08 21:22         ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2006-12-08 21:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Arkadiusz Miskiewicz, linux-kernel

Andi Kleen wrote:
> Looking at Arkadiusz' output file it looks like gcc 4.2 decided to CSE the
> address :/
>
> 	movl	$_proxy_pda+8, %edx	#, tmp65
>
> Very sad, but legitimate.
>   

Yes, that was my conclusion too.  Though in this case the code could be
cleaned up by cutting down on the number of uses of "current" - but
that's hardly a general fix.

> The only workaround I can think of would be to define it as a symbol
> (or away in vmlinux.lds.S)

Yes.  I was thinking about setting it in vmlinux.lds to some obviously
bad address so that any access will be highly visible.

> . Or do away with the idea of proxy_pda
> again.
>   

That would be very sad indeed.

    J

^ permalink raw reply	[flat|nested] 6+ messages in thread

* proxy_pda was Re: What was in the x86 merge for .20
  2006-12-08 20:35   ` Jeremy Fitzhardinge
@ 2006-12-08 21:06     ` Andi Kleen
  2006-12-08 21:09       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2006-12-08 21:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Arkadiusz Miskiewicz, linux-kernel

On Friday 08 December 2006 21:35, Jeremy Fitzhardinge wrote:
> Arkadiusz Miskiewicz wrote:
> >   LD      .tmp_vmlinux1
> > arch/i386/kernel/built-in.o: In function `math_emulate':
> > (.text+0x3809): undefined reference to `_proxy_pda'
> >   
> 
> Hm, in theory nothing should ever generate a reference to _proxy_pda. 
> What compiler are you using?

Looking at Arkadiusz' output file it looks like gcc 4.2 decided to CSE the
address :/

	movl	$_proxy_pda+8, %edx	#, tmp65

Very sad, but legitimate.

The only workaround I can think of would be to define it as a symbol
(or away in vmlinux.lds.S). Or do away with the idea of proxy_pda
again.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-01-15 21:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-15 20:41 proxy_pda was Re: What was in the x86 merge for .20 Paweł Sikora
  -- strict thread matches above, loose matches on Subject: below --
2006-12-08  3:01 Andi Kleen
2006-12-08 12:04 ` Arkadiusz Miskiewicz
2006-12-08 20:35   ` Jeremy Fitzhardinge
2006-12-08 21:06     ` proxy_pda was " Andi Kleen
2006-12-08 21:09       ` Jeremy Fitzhardinge
2006-12-08 21:22         ` Andi Kleen
2006-12-08 21:34           ` Jeremy Fitzhardinge
2006-12-08 22:04           ` Arkadiusz Miskiewicz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).