LKML Archive on lore.kernel.org
* [x86.git#mm] stack protector fixes, vmsplice exploit
From: Ingo Molnar @ 2008-02-14 17:00 UTC
  To: pageexec
  Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin


update: latest x86.git#mm has a pretty much working stack-protector 
feature - you can pick it up for testing via:

   http://people.redhat.com/mingo/x86.git/README

as pageexec@freemail.hu already indicated in his analysis and patch, 
there were multiple bugs hitting us here. The number and scope of these 
bugs show structural problems in how security in this area was 
approached, so with these changes we try to go deeper than just 
minimally fixing the feature. We've got 15 changes so far in and around 
this area:

  x86: fix execve with -fstack-protect
  x86: exclude vsyscall files from stackprotect
  x86: fix stackprotect Makefile rule
  x86: fix stackprotector canary updates during context switches
  panic: print more informative messages on stackprotect failure
  panic: print out stacktrace if DEBUG_BUGVERBOSE
  x86: enable stack-protector by default
  x86: setup stack canary for the idle threads
  x86: fix canary of the boot CPU's idle task
  stackprotector: include files
  stackprotector: add boot_init_stack_canary()
  x86: fix the stackprotector canary of the boot CPU
  x86: stackprotector: mix TSC to the boot canary
  x86: test the presence of the stackprotector
  x86: streamline stackprotector

but we've not completed this work yet. We'll push the independent bits 
to Linus ASAP.

	Ingo


* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: pageexec @ 2008-02-14 17:16 UTC
  To: Ingo Molnar
  Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 18:00, Ingo Molnar wrote:

some comments:

>   x86: fix execve with -fstack-protect

the commit comment says:

> This hack was true up to the point the stackprotector added
> another word to the stack frame. Shifting all the addresses
> by 8 bytes, crashing and burning any exec attempt.

actually, that's not the problem here: because the canary is in the
local variable area, it doesn't affect the passed arguments. what
happens here is that gcc treats the argument area as owned by the
callee, not the caller, and is allowed to do certain tricks. for ssp
it will make a copy of the struct passed by value into the local
variable area and pass *its* address down, and it won't copy it back
into the original instance stored in the argument area.

so once sys_execve returns, the pt_regs passed by value hasn't changed
at all and its default content will cause a nice double fault (FWIW,
this part took me the longest to debug, being down with a cold didn't
help either ;).
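
[ to make the failure mode concrete, a minimal user-space sketch - plain
  C never promises that writes to a by-value parameter reach the
  caller's instance; the old code relied on gcc happening to modify the
  argument-area copy in place, and ssp's local copy broke exactly that.
  'kernel_entry' and 'struct regs' below are made-up stand-ins for
  sys_execve and pt_regs: ]

#include <stdio.h>

struct regs { long ip, sp; };		/* stand-in for pt_regs */

/* stand-in for sys_execve(struct pt_regs regs): under ssp, gcc works on
 * a local copy of 'r', so the write below never reaches the caller's
 * instance (in portable C it never does anyway) */
static void kernel_entry(struct regs r)
{
	r.ip = 0x1000;
}

/* the robust variant: modify the real instance through a pointer */
static void kernel_entry_fixed(struct regs *r)
{
	r->ip = 0x1000;
}

int main(void)
{
	struct regs r = { 0, 0 };

	kernel_entry(r);
	printf("by value:   ip=%#lx\n", r.ip);	/* still 0 */

	kernel_entry_fixed(&r);
	printf("by pointer: ip=%#lx\n", r.ip);	/* 0x1000 */
	return 0;
}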

>   x86: fix stackprotector canary updates during context switches

the commit says:

> Note: this means that we call __switch_to() [and its sub-functions]
> still with the old canary, but that is not a problem, both the previous
> and the next task has a high-quality canary. The only (mostly academic)
> disadvantage is that the canary of one task may leak onto the stack of
> another task, increasing the risk of information leaks, were an attacker
> able to read the stack of specific tasks (but not that of others).

the best practical defense against leaking the canary is to change its
value on every syscall but it has some performance impact in microbenchmarks.

>   x86: stackprotector: mix TSC to the boot canary

this should probably be implemented in a general get_random_long()
function...

>   x86: test the presence of the stackprotector
>   x86: streamline stackprotector

the config comment says -fstack-protector whereas you're really enabling
the stronger -fstack-protector-all feature (without the latter the ssp
test function sample_stackprotector_fn wouldn't even be instrumented ;-).



* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: pageexec @ 2008-02-14 18:55 UTC
  To: Ingo Molnar
  Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 20:00, Ingo Molnar wrote:

> > the best practical defense against leaking the canary is to change its 
> > value on every syscall but it has some performance impact in 
> > microbenchmarks.
> 
> yeah, that's not really practical (especially as it would deplete our 
> entropy pool pretty fast, which could in some circumstances introduce a 
> higher risk to the system than the risk of a canary leaking).

you don't necessarily have to use the heavy-handed ip id code as it
is used now; random32 is plenty good here.

> I think we can avoid the leakage across tasks by being careful during 
> context-switch time: never calling with the old canary still in the PDA. 
> I think this should be fairly easy as we'd just have to load the new 
> pdaptr in the switch_to() assembly code.

i don't think you have to worry about cross-task leaking at all: a
hypothetical exploit is happy to learn its own canary, and that's
actually easier than learning some other task's canary by virtue
of bugs that leak uninitialized struct padding and stuff...

really, the best defense is to reduce the useful lifetime of any
leaked canary, and you can't get better than syscall granularity
without disproportional effort and impact elsewhere (and i'm sure
some would find even this disproportional ;).

> TODO: perhaps all vsyscall functions need to move into separate .o 
> files. (probably academic as the functions affected by that tend to be 
> very simple and do not deal with anything overflowable.)

yeah, i wasn't suggesting it for its security value, more like for
'full coverage'. if such a separation isn't useful otherwise (no idea
whether separating the .vsyscall* code from the related kernel code
would be confusing for the reader, for example), i'd say this isn't
important.



* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: Ingo Molnar @ 2008-02-14 19:00 UTC
  To: pageexec
  Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> the commit comment says:
> 
> > This hack was true up to the point the stackprotector added another 
> > word to the stack frame. Shifting all the addresses by 8 bytes, 
> > crashing and burning any exec attempt.
> 
> actually, that's not the problem here: because the canary is in the 
> local variable area, it doesn't affect the passed arguments. what 
> happens here is that gcc treats the argument area as owned by the 
> callee, not the caller, and is allowed to do certain tricks. for ssp it 
> will make a copy of the struct passed by value into the local variable 
> area and pass *its* address down, and it won't copy it back into the 
> original instance stored in the argument area.
> 
> so once sys_execve returns, the pt_regs passed by value hasn't changed 
> at all and its default content will cause a nice double fault (FWIW, 
> this part took me the longest to debug, being down with a cold didn't 
> help either ;).

heh, indeed :-) I saw the double fault and was happy that you showed how 
to fix it (it did not seem obvious to me _at all_, and i've seen my fair 
share of fixes) and i quickly lured myself into having understood it :-/ 

I've now updated the commit message with your (correct) description.

> > Note: this means that we call __switch_to() [and its sub-functions] 
> > still with the old canary, but that is not a problem, both the 
> > previous and the next task has a high-quality canary. The only 
> > (mostly academic) disadvantage is that the canary of one task may 
> > leak onto the stack of another task, increasing the risk of 
> > information leaks, were an attacker able to read the stack of 
> > specific tasks (but not that of others).
> 
> the best practical defense against leaking the canary is to change its 
> value on every syscall but it has some performance impact in 
> microbenchmarks.

yeah, that's not really practical (especially as it would deplete our 
entropy pool pretty fast, which could in some circumstances introduce a 
higher risk to the system than the risk of a canary leaking).

I think we can avoid the leakage across tasks by being careful during 
context-switch time: never calling with the old canary still in the PDA. 
I think this should be fairly easy as we'd just have to load the new 
pdaptr in the switch_to() assembly code.

[ the only (even more academic) possibility here would be an NMI hitting
  during those two instructions that update RSP and load the new pdaptr,
  leaving the old canary on the stack. But NMIs go to separate stack
  frames anyway on 64-bit, so this isn't really a practical worry IMO. ]

Hm?

> >   x86: stackprotector: mix TSC to the boot canary
> 
> this should probably be implemented in a general get_random_long() 
> function...

yeah ... this is the second time i could have used it recently.

> >   x86: test the presence of the stackprotector
> >   x86: streamline stackprotector
> 
> the config comment says -fstack-protector whereas you're really 
> enabling the stronger -fstack-protector-all feature (without the latter 
> the ssp test function sample_stackprotector_fn wouldn't even be 
> instrumented ;-).

yeah, indeed. I fixed the comments and flipped the commits around :-)

TODO: perhaps all vsyscall functions need to move into separate .o 
files. (probably academic as the functions affected by that tend to be 
very simple and do not deal with anything overflowable.)

	Ingo


* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: Ingo Molnar @ 2008-02-14 20:25 UTC
  To: pageexec
  Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> really, the best defense is to reduce the useful lifetime of any 
> leaked canary, and you can't get better than syscall granularity 
> without disproportional effort and impact elsewhere (and i'm sure some 
> would find even this disproportional ;).

hm, i think per syscall canaries are really expensive.

The per function call overhead from stackprotector is already pretty 
serious IMO, but at least that's something that GCC _could_ be doing 
(much) smarter (why doesn't it jne forward out to __stack_chk_fail, 
instead of generating 4 instructions, one of them a default-mispredicted 
branch instruction??), so that overhead could in theory be something 
like 4 fall-through instructions per function, instead of the current 6.

Also, even with -fstack-protector-all, gcc could detect the _provably_ 
correct and overflow-safe functions and skip their annotation 
altogether. And those are the ones that hurt most: the small and 
straightforward (and also, most of the time, overflow-free) ones.

The heuristics for -fstack-protector were really botched: as you noted, 
they happened to mark vmsplice as "safe". So either we do the full thing 
or nothing - anything in between is just Russian roulette and a false 
sense of security.

But per syscall canary randomization is something that would be a drag 
on our null syscall latency forever. I'm really unsure about it. It will 
also be driven in the wrong direction: people would try to "optimize" 
the per syscall random number generation, which is a constant erosive 
force on its strength, and there's no strong counter-force. (and no 
clear boundary either: 'random32 should be plenty enough' is too vague 
as a limit.)

So let's perhaps also analyze the security effects of not having per 
system call canaries and make the best out of it: the main threat is 
information leaks via padding and similar initialization mistakes [which 
have an almost zero accidental discovery rate], right?

So could we perhaps somehow turn kernel-space information leaks into 
functional, and hence detectable, failures? Wouldn't kmemcheck, for 
example, notice them once they are copied to user-space? [ If you 
haven't seen kmemcheck yet i think you'll love it, it's a neat trick 
similar to the pte noexec trick you did in the PaX kernel. It's 
similarly disgusting as well ;-) ]

> > TODO: perhaps all vsyscall functions need to move into separate .o 
> > files. (probably academic as the functions affected by that tend to 
> > be very simple and do not deal with anything overflowable.)
> 
> yeah, i wasn't suggesting it for its security value, more like for 
> 'full coverage'. if such a separation isn't useful otherwise (no idea 
> whether separating the .vsyscall* code from the related kernel code 
> would be confusing for the reader, for example), i'd say this isn't 
> important.

perhaps it would be more useful to have some "no stack protector" gcc 
attribute. We could then stick that into the __vsyscall definition and 
it's all done and self-maintaining from that point on. (because a 
missing __vsyscall annotation results in a runtime failure.) (I'd fully 
expect GCC to not have provided such an attribute.)

but yes, i agree in general that it's easier to maintain something if 
it's full coverage; that way one does not have to keep in mind (and 
cannot accidentally forget ;-) the exceptions.

	Ingo


* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: Arjan van de Ven @ 2008-02-14 21:00 UTC
  To: Ingo Molnar
  Cc: pageexec, Sam Ravnborg, linux-kernel, torvalds, Thomas Gleixner,
	H. Peter Anvin

On Thu, 14 Feb 2008 21:25:35 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > really, the best defense is to reduce the useful lifetime of any 
> > leaked canary, and you can't get better than syscall granularity 
> > without disproportional effort and impact elsewhere (and i'm sure
> > some would find even this disproportional ;).
> 
> hm, i think per syscall canaries are really expensive.


it's not that bad. Assuming you use a PRNG that you re-seed
periodically, it's
* go to the next random number with the PRNG
* write to PDA and task struct

give or take 10 cycles total if you squeeze it hard, 20 if you don't.
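
[ a minimal sketch of that sequence, assuming the stack_canary task
  field and the x86-64 write_pda() helper of this era - illustrative,
  not a tested patch: ]

/* on kernel entry: advance the PRNG and store the new canary in both
 * the task struct and the PDA slot that ssp reads via %gs */
static inline void refresh_stack_canary(struct task_struct *tsk)
{
	unsigned long canary = random32();	/* fast PRNG, re-seeded elsewhere */

	tsk->stack_canary = canary;
	write_pda(stack_canary, canary);
}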


* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: pageexec @ 2008-02-14 21:12 UTC
  To: Ingo Molnar
  Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 21:25, Ingo Molnar wrote:
> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
> 
> > really, the best defense is to reduce the useful lifetime of any 
> > leaked canary, and you can't get better than syscall granularity 
> > without disproportional effort and impact elsewhere (and i'm sure some 
> > would find even this disproportional ;).
> 
> hm, i think per syscall canaries are really expensive.

it depends on how it's implemented and on what counts as expensive. i'd say
offering this as a config option would make sense for those who want
to turn ssp from a debugging feature (which it is at the moment) into
a security one (i know i will do it once i port the kernel stack
randomization to amd64 in PaX, it's the same code path).

speaking of that, for years PaX has had a feature to randomize the kernel
stack ptr on each kernel->user transition based on rdtsc (using only a few
bits of it obviously). its performance impact is really hard to measure
on anything (i mean, macrobenchmarks) and frankly, people blindly enable
it without having second thoughts about its impact because it just doesn't
matter.

as Arjan said, this ssp canary randomization could be done cheaply from
a PRNG, or heck, even a simple ror of it by a random amount derived from
rdtsc will raise the bar enough (and the PRNG state is subject to info
leaking itself whereas rdtsc isn't ;).
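
[ a sketch of that cheap perturbation - rotating the per-task canary by
  a few tsc-derived bits on each kernel->user transition; rdtscll() as
  in kernels of this era, the rest is illustrative and deliberately not
  crypto-strength: ]

/* make a leaked canary go stale: rotate it by an rdtsc-derived amount */
static inline unsigned long perturb_canary(unsigned long canary)
{
	unsigned long tsc;

	rdtscll(tsc);
	tsc &= 63;
	/* rotate left by 'tsc' bits; the "(-n) & 63" form is safe for 0 */
	return (canary << tsc) | (canary >> ((-tsc) & 63));
}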

in short, i suggest that this per-syscall canary be offered at least as
a config option, using something cheap and not particularly strong (in
crypto terms), just to raise the bar enough that no attacker would
accept the risk of panic'ing the box should the opportunity arise.

> The per function call overhead from stackprotector is already pretty 
> serious IMO,

it's basically a mov/mov/.../mov/xor/jz on the normal path - a few
cycles, except for the forward jz, but that should be easy to fix
in gcc.

with that said, this whole ssp business should have been changed into
return address verification a long time ago (i.e., use the saved return
address as the canary itself); that would both detect stack overflows
and prevent ret2libc style attacks for practical purposes.
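
[ roughly: key the check to the return address instead of a random
  value. a conceptual user-space sketch via gcc builtins - real
  implementations would live in the compiler-emitted prologue/epilogue,
  and an optimizer may well merge the two builtin reads, so this only
  conveys the idea: ]

#include <stdlib.h>

extern void bar(char *);

#define CANARY_KEY 0x5aa5c33c96693cc5UL	/* per-boot secret in real life */

void foo(void)
{
	/* "prologue": remember a keyed copy of our return address */
	unsigned long keyed =
		(unsigned long)__builtin_return_address(0) ^ CANARY_KEY;
	char buf[64];

	bar(buf);

	/* "epilogue": if the overflow (or a ret2libc-style write) changed
	 * the return slot, the comparison fails hard */
	if (((unsigned long)__builtin_return_address(0) ^ CANARY_KEY) != keyed)
		abort();
}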

> But per syscall canary randomization is something that would be a drag 
> on our null syscall latency forever. I'm really unsure about it. It will 
> also be driven in the wrong direction: people would try to "optimize" 
> the per syscall random number generation, which is a constant erosive 
> force on its strength, and there's no strong counter-force. (and no 
> clear boundary either: 'random32 should be plenty enough' is too vague 
> as a limit.)

i think you're getting lost in the details a bit here. the purpose of
changing the canary is to create a guaranteed minimum risk for an
attacker abusing a leaked canary. obviously a constant (per task) canary
means 0 risk for an attacker, so we can only get better than that. how
much better we want to get is mainly limited by what is considered cheap
implementation-wise and 'good enough' from a risk assessment point of
view.

say, if you were to flip a random bit in the canary then a leaked canary
would give an attacker a 50% chance to succeed and a 50% chance to panic
the box. even that is already high (for an attacker), but you see what
i'm getting at when you increase the number of randomly changed bits in
the canary.

in practice there's no need to change all 32/64 bits - that's for crypto
nuts only; for real-life attackers there's no difference between a 99%
and a 99.99999% chance of panic'ing a box (or even a box farm: one alarm
raised is an alarm that will draw attention, regardless of how many
other boxes panic'ed). that's why i keep saying that as simple a thing
as rdtsc/xor/ror/etc is 'plenty good'. i know it's not good for a
cryptanalyst in a generic setting, but this is a special situation (if
you fail to guess, you don't get to try infinitely - you fail hard) and
we can relax the crypto requirements.
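
[ the arithmetic spelled out: with k randomized bits, replaying a leaked
  canary succeeds with probability 2^-k per attempt. a toy computation
  (build with -lm): ]

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* chance that a replayed stale canary panics the box, for k
	 * randomly changed bits per syscall */
	for (int k = 1; k <= 22; k += 7)
		printf("k=%2d -> %.7f%% panic chance\n",
		       k, 100.0 * (1.0 - pow(2.0, -k)));
	return 0;
}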

> So let's perhaps also analyze the security effects of not having per 
> system call canaries and make the best out of it: the main threat is 
> information leaks via padding and similar initialization mistakes [which 
> have an almost zero accidental discovery rate], right?

that was only an example of a bug class that could provide/leak such
info, but obviously programmers are a lot more creative in accomplishing
the same.

the problem with kmemcheck is that it's a debugging aid; it obviously
cannot be used in production. also, it's not comprehensive when you have
bugs that leak memory like the do_brk (or mremap, i forget) one did: a
valid userland pte kept pointing to a freed and later reused page. i'm
not saying that one would use such a bug for canary leaking when it can
be abused a lot better in other ways; it's just to say that there are
many more bug classes in the kernel than in userland.



* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: pageexec @ 2008-02-14 21:43 UTC
  To: Ingo Molnar, Jakub Jelinek
  Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 17:35, Jakub Jelinek wrote:

> Where do you see a mispredicted branch?

try it with -Os (gentoo gcc 4.2.2):

0000000000000000 <foo>:
   0:   48 83 ec 58             sub    $0x58,%rsp
   4:   65 48 8b 04 25 28 00 00 00      mov    %gs:0x28,%rax
   d:   48 89 44 24 48          mov    %rax,0x48(%rsp)
  12:   31 c0                   xor    %eax,%eax
  14:   48 89 e7                mov    %rsp,%rdi
   17:   e8 00 00 00 00          callq  1c <foo+0x1c>
                        18: R_X86_64_PC32       bar+0xfffffffffffffffc
  1c:   48 8b 54 24 48          mov    0x48(%rsp),%rdx
  21:   65 48 33 14 25 28 00 00 00      xor    %gs:0x28,%rdx
  2a:   b8 06 00 00 00          mov    $0x6,%eax
  2f:   74 05                   je     36 <foo+0x36>
   31:   e8 00 00 00 00          callq  36 <foo+0x36>
                        32: R_X86_64_PC32       __stack_chk_fail+0xfffffffffffffffc
  36:   48 83 c4 58             add    $0x58,%rsp
  3a:   c3                      retq



* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: Jakub Jelinek @ 2008-02-14 22:35 UTC
  To: Ingo Molnar
  Cc: pageexec, Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin

On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> The per function call overhead from stackprotector is already pretty 
> serious IMO, but at least that's something that GCC _could_ be doing 
> (much) smarter (why doesn't it jne forward out to __stack_chk_fail, 
> instead of generating 4 instructions, one of them a default-mispredicted 
> branch instruction??), so that overhead could in theory be something 
> like 4 fall-through instructions per function, instead of the current 6.

Where do you see a mispredicted branch?
int foo (void)
{
  char buf[64];
  bar (buf);
  return 6;
}

-O2 -fstack-protector -m64:
        subq    $88, %rsp
        movq    %fs:40, %rax
        movq    %rax, 72(%rsp)
        xorl    %eax, %eax
        movq    %rsp, %rdi
        call    bar
        movq    72(%rsp), %rdx
        xorq    %fs:40, %rdx
        movl    $6, %eax
        jne     .L5
        addq    $88, %rsp
        ret
.L5:
        .p2align 4,,6
        .p2align 3
        call    __stack_chk_fail
-O2 -fstack-protector -m32:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $88, %esp
        movl    %gs:20, %eax
        movl    %eax, -4(%ebp)
        xorl    %eax, %eax
        leal    -68(%ebp), %eax
        movl    %eax, (%esp)
        call    bar
        movl    $6, %eax
        movl    -4(%ebp), %edx
        xorl    %gs:20, %edx
        jne     .L5
        leave
        ret
.L5:
        .p2align 4,,7
        .p2align 3
        call    __stack_chk_fail
-O2 -fstack-protector -m64 -mcmodel=kernel:
        subq    $88, %rsp
        movq    %gs:40, %rax
        movq    %rax, 72(%rsp)
        xorl    %eax, %eax
        movq    %rsp, %rdi
        call    bar
        movq    72(%rsp), %rdx
        xorq    %gs:40, %rdx
        movl    $6, %eax
        jne     .L5
        addq    $88, %rsp
        ret
.L5:
        .p2align 4,,6
        .p2align 3
        call    __stack_chk_fail

both with gcc 4.1.x and 4.3.0.
BTW, you can use -fstack-protector --param=ssp-buffer-size=4
etc. to tweak the buffer size that triggers stack protection; the
default is 8, but e.g. the whole of Fedora is compiled with 4.
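
[ so, to illustrate: a hypothetical function like this one gets a canary
  under --param=ssp-buffer-size=4 (the Fedora setting) but not under the
  default threshold of 8: ]

extern void bar(char *);

void small(void)
{
	char buf[4];	/* under the default 8-byte threshold: no canary */

	bar(buf);
}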

	Jakub


* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: Ingo Molnar @ 2008-02-14 23:16 UTC
  To: Jakub Jelinek
  Cc: pageexec, Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds,
	Thomas Gleixner, H. Peter Anvin


* Jakub Jelinek <jakub@redhat.com> wrote:

> On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> > The per function call overhead from stackprotector is already pretty 
> > serious IMO, but at least that's something that GCC _could_ be doing 
> > (much) smarter (why doesn't it jne forward out to __stack_chk_fail, 
> > instead of generating 4 instructions, one of them a default-mispredicted 
> > branch instruction??), so that overhead could in theory be something 
> > like 4 fall-through instructions per function, instead of the current 6.
> 
> Where do you see a mispredicted branch?

ah!

> int foo (void)
> {
>   char buf[64];
>   bar (buf);
>   return 6;
> }
> 
> -O2 -fstack-protector -m64:
>         subq    $88, %rsp
>         movq    %fs:40, %rax
>         movq    %rax, 72(%rsp)
>         xorl    %eax, %eax
>         movq    %rsp, %rdi
>         call    bar
>         movq    72(%rsp), %rdx
>         xorq    %fs:40, %rdx
>         movl    $6, %eax
>         jne     .L5
>         addq    $88, %rsp
>         ret
> .L5:
>         .p2align 4,,6
>         .p2align 3
>         call    __stack_chk_fail

i got this:

	.file	""
	.text
.globl foo
	.type	foo, @function
foo:
.LFB2:
	pushq	%rbp
.LCFI0:
	movq	%rsp, %rbp
.LCFI1:
	subq	$208, %rsp
.LCFI2:
	movq	__stack_chk_guard(%rip), %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	$3, %eax
	movq	-8(%rbp), %rdx
	xorq	__stack_chk_guard(%rip), %rdx
	je	.L3
	call	__stack_chk_fail
.L3:
	leave
	ret

but that's F8's gcc 4.1, and not the kernel mode code generator either. 

the code you cited looks far better - that's good news!

one optimization would be to do a 'jne' straight into __stack_chk_fail() 
- it's not like we ever want to return. [and it's obvious from the 
existing stack frame which function was the failing one.] That way we'd 
save about 3 bytes per function. We don't want to return to the original 
function, so for the kernel it would be OK.

another potential optimization would be to turn this:

>         subq    $88, %rsp
>         movq    %fs:40, %rax
>         movq    %rax, 72(%rsp)

into:

	pushq	%fs:40
	subq	$80, %rsp

or am i missing something? (is there perhaps an address generation 
dependency between the pushq and the subq? Or would the canary end up at 
the wrong position?)

> both with gcc 4.1.x and 4.3.0. BTW, you can use -fstack-protector 
> --param=ssp-buffer-size=4 etc. to tweak the buffer size that triggers 
> stack protection; the default is 8, but e.g. the whole of Fedora is 
> compiled with 4.

ok. is -fstack-protector-all basically equivalent to 
--param=ssp-buffer-size=0 ? I'm wondering whether it would be easy for 
gcc to completely skip stackprotector code on functions that have no 
buffers, even under -fstack-protector-all. (perhaps it already does?)

	Ingo


* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
From: Ingo Molnar @ 2008-02-14 23:19 UTC
  To: pageexec
  Cc: Jakub Jelinek, Sam Ravnborg, Arjan van de Ven, linux-kernel,
	torvalds, Thomas Gleixner, H. Peter Anvin


* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 14 Feb 2008 at 17:35, Jakub Jelinek wrote:
> 
> > Where do you see a mispredicted branch?
> 
> try it with -Os (gentoo gcc 4.2.2):
> 
> 0000000000000000 <foo>:
>    0:   48 83 ec 58             sub    $0x58,%rsp
>    4:   65 48 8b 04 25 28 00 00 00      mov    %gs:0x28,%rax
>    d:   48 89 44 24 48          mov    %rax,0x48(%rsp)
>   12:   31 c0                   xor    %eax,%eax
>   14:   48 89 e7                mov    %rsp,%rdi
>    17:   e8 00 00 00 00          callq  1c <foo+0x1c>
>                         18: R_X86_64_PC32       bar+0xfffffffffffffffc
>   1c:   48 8b 54 24 48          mov    0x48(%rsp),%rdx
>   21:   65 48 33 14 25 28 00 00 00      xor    %gs:0x28,%rdx
>   2a:   b8 06 00 00 00          mov    $0x6,%eax
>   2f:   74 05                   je     36 <foo+0x36>
>    31:   e8 00 00 00 00          callq  36 <foo+0x36>
>                         32: R_X86_64_PC32       __stack_chk_fail+0xfffffffffffffffc
>   36:   48 83 c4 58             add    $0x58,%rsp
>   3a:   c3                      retq

ah, that's what my kernel uses too - and CONFIG_CC_OPTIMIZE_FOR_SIZE=y 
is the default for Fedora too.

hm, especially those big addressing mode mov's and xor's look rather 
nasty.

	Ingo

