* What was in the x86 merge for .20
@ 2006-12-08  3:01 Andi Kleen
  2006-12-08 10:08 ` Andrew Morton
  2006-12-08 12:04 ` Arkadiusz Miskiewicz
From: Andi Kleen @ 2006-12-08  3:01 UTC (permalink / raw)
  To: linux-kernel, discuss

[The merge already made it to Linus' tree. Sorry for sending this message

Most of this applies to both i386 and x86-64, unless noted otherwise.

These are just some highlights. As usual there are more
smaller optimizations, cleanups, etc.

- paravirt support for i386: the basic hooks for replacing all
non-virtualizable instructions on x86 are in. This currently
only runs on native hardware, but will allow linking in
modules for paravirtualized Xen/VMware/lhype.
There are some limitations, like no SMP support yet.
- Support for a Processor Data Area (PDA) on i386. This makes
the code more similar to x86-64 and will allow some other
optimizations in the future. 
- Relocatable kernel support for i386. This allows loading
a single kernel binary at multiple addresses. This is useful
to use kdump kernels without having to maintain separate
- Sleazy FPU feature is now also supported on i386 -- this will
give a small improvement to FPU-intensive programs because
they take fewer lazy FPU exceptions.
- When a spinlock lockup occurs, print backtraces of all CPUs. This
makes debugging deadlocks easier.
- x86-64 now also spins on spinlocks with interrupts enabled when
- Various dwarf2 unwinder improvements.
In particular, there is better debugging support for figuring out what's
wrong, and the unwinder should now be less likely to crash when it finds
invalid unwinding data.
- Use more efficient cache flushing when cache attributes are changed
- Allow compiling the kernel for Core2. To be really useful this
will require gcc support for Core2, which isn't ready yet.
- Various fixes to the MTRR code
- Some preparatory infrastructure for perfmon
- Improve TSC setup heuristics on Core2 and AMD K8
- Don't try to synchronize TSCs on boot anymore. Instead just
check whether they are synchronized and disable TSC use
when they are not.
- More fixes to the idle notifier
- Various other bug fixes and cleanups


* Re: proxy_pda was Re: What was in the x86 merge for .20
@ 2007-01-15 20:41 Paweł Sikora
From: Paweł Sikora @ 2007-01-15 20:41 UTC (permalink / raw)
  To: linux-kernel


I've reviewed the thread and can propose a solution.
Consider e.g. dev.s (from fuse.ko). Currently with gcc-4.2 we get:

        movl    $_proxy_pda+8, %edx     #, tmp62
        movl %gs:8,%ecx #, ret__
        movl    344(%ecx), %ecx # <variable>.fsuid, <variable>.fsuid
        movl    %ecx, 60(%eax)  # <variable>.fsuid, <variable>.in.h.uid
        movl %gs:8,%ecx #, ret__
        movl    360(%ecx), %ecx # <variable>.fsgid, <variable>.fsgid
        movl    %ecx, 64(%eax)  # <variable>.fsgid, <variable>.in.h.gid
        movl %gs:8,%edx #, ret__
        movl    164(%edx), %edx # <variable>.pid, <variable>.pid
        movl    %edx, 68(%eax)  # <variable>.pid, <variable>

In this scenario gcc is explicitly blocked by -fno-strict-aliasing,
so the many %gs:8 reloads are present. If you fix the aliasing violations
in the kernel, then you could use -fstrict-aliasing to get what you want:

        movl %gs:8,%ecx #, ret__
        movl    344(%ecx), %edx # <variable>.fsuid, <variable>.fsuid
        movl    %edx, 60(%eax)  # <variable>.fsuid, <variable>.in.h.uid
        movl    360(%ecx), %edx # <variable>.fsgid, <variable>.fsgid
        movl    %edx, 64(%eax)  # <variable>.fsgid, <variable>.in.h.gid
        movl    164(%ecx), %edx # <variable>.pid, <variable>.pid
        movl    %edx, 68(%eax)  # <variable>.pid, <variable>
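
The same effect can be seen outside the kernel with a reduced C analog.
This is only a sketch: current_task, struct task, struct req and fill_req
are hypothetical stand-ins, and the kernel reads the PDA through an asm
statement rather than a plain global. With -fno-strict-aliasing gcc has to
assume that each store through r may overwrite current_task, so it
typically reloads the pointer after every store. With -fstrict-aliasing a
store to an unsigned int member is not allowed to alias a pointer object,
so the pointer can stay in a register.

```c
/* Hypothetical stand-ins for task_struct and the fuse request header. */
struct task {
        unsigned int fsuid;
        unsigned int fsgid;
        unsigned int pid;
};

struct req {
        unsigned int uid;
        unsigned int gid;
        unsigned int pid;
};

static struct task init_task = { 1000, 100, 42 };

/* Global pointer playing the role of the %gs:8 (pcurrent) slot. */
struct task *current_task = &init_task;

/*
 * Three loads through current_task, each followed by a store through r.
 * Whether current_task is loaded once or three times depends on whether
 * the compiler may assume the stores cannot modify the pointer itself.
 */
void fill_req(struct req *r)
{
        r->uid = current_task->fsuid;
        r->gid = current_task->fsgid;
        r->pid = current_task->pid;
}
```

Comparing the output of "gcc -O2 -S" with "gcc -O2 -fno-strict-aliasing -S"
on this file should show the difference in the number of loads of
current_task, though the exact code depends on the gcc version and target.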


