LKML Archive on lore.kernel.org
* [x86.git#mm] stack protector fixes, vmsplice exploit
@ 2008-02-14 17:00 Ingo Molnar
  2008-02-14 17:16 ` pageexec
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2008-02-14 17:00 UTC (permalink / raw)
To: pageexec
Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

update: the latest x86.git#mm has a pretty much working stack-protector
feature - you can pick it up for testing via:

   http://people.redhat.com/mingo/x86.git/README

as pageexec@freemail.hu has already indicated in his analysis and
patch, there were multiple bugs hitting us here. The amount and scope
of these problems show structural problems in how security in this
area was approached, so with these changes we try to go deeper than
just minimally fixing the feature. We've got 15 changes so far in and
around this area:

   x86: fix execve with -fstack-protect
   x86: exclude vsyscall files from stackprotect
   x86: fix stackprotect Makefile rule
   x86: fix stackprotector canary updates during context switches
   panic: print more informative messages on stackprotect failure
   panic: print out stacktrace if DEBUG_BUGVERBOSE
   x86: enable stack-protector by default
   x86: setup stack canary for the idle threads
   x86: fix canary of the boot CPU's idle task
   stackprotector: include files
   stackprotector: add boot_init_stack_canary()
   x86: fix the stackprotector canary of the boot CPU
   x86: stackprotector: mix TSC to the boot canary
   x86: test the presence of the stackprotector
   x86: streamline stackprotector

but we've not completed this work yet. We'll push the independent bits
to Linus ASAP.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 17:00 [x86.git#mm] stack protector fixes, vmsplice exploit Ingo Molnar
@ 2008-02-14 17:16 ` pageexec
  2008-02-14 19:00   ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread

From: pageexec @ 2008-02-14 17:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 18:00, Ingo Molnar wrote:

some comments:

> x86: fix execve with -fstack-protect

the commit comment says:

> This hack was true up to the point the stackprotector added
> another word to the stack frame. Shifting all the addresses
> by 8 bytes, crashing and burning any exec attempt.

actually, that's not the problem here because the canary is in the
local variable area, it doesn't affect the passed arguments. what
happens here is that gcc treats the argument area as owned by the
callee, not the caller, and is allowed to do certain tricks. for ssp
it will make a copy of the struct passed by value into the local
variable area and pass *its* address down, and it won't copy it back
into the original instance stored in the argument area.

so once sys_execve returns, the pt_regs passed by value hasn't changed
at all and its default content will cause a nice double fault (FWIW,
this part took me the longest to debug, being down with a cold didn't
help either ;).

> x86: fix stackprotector canary updates during context switches

the commit says:

> Note: this means that we call __switch_to() [and its sub-functions]
> still with the old canary, but that is not a problem, both the previous
> and the next task has a high-quality canary. The only (mostly academic)
> disadvantage is that the canary of one task may leak onto the stack of
> another task, increasing the risk of information leaks, were an attacker
> able to read the stack of specific tasks (but not that of others).

the best practical defense against leaking the canary is to change its
value on every syscall, but that has some performance impact in
microbenchmarks.

> x86: stackprotector: mix TSC to the boot canary

this should probably be implemented in a general get_random_long()
function...

> x86: test the presence of the stackprotector
> x86: streamline stackprotector

the config comment says -fstack-protector whereas you're really
enabling the stronger -fstack-protector-all feature (without the
latter the ssp test function sample_stackprotector_fn wouldn't even be
instrumented ;-).
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 17:16 ` pageexec
@ 2008-02-14 19:00   ` Ingo Molnar
  2008-02-14 18:55     ` pageexec
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2008-02-14 19:00 UTC (permalink / raw)
To: pageexec
Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> the commit comment says:
>
> > This hack was true up to the point the stackprotector added another
> > word to the stack frame. Shifting all the addresses by 8 bytes,
> > crashing and burning any exec attempt.
>
> actually, that's not the problem here because the canary is in the
> local variable area, it doesn't affect the passed arguments. what
> happens here is that gcc treats the argument area as owned by the
> callee, not the caller, and is allowed to do certain tricks. for ssp it
> will make a copy of the struct passed by value into the local variable
> area and pass *its* address down, and it won't copy it back into the
> original instance stored in the argument area.
>
> so once sys_execve returns, the pt_regs passed by value hasn't changed
> at all and its default content will cause a nice double fault (FWIW,
> this part took me the longest to debug, being down with a cold didn't
> help either ;).

heh, indeed :-) I saw the double fault and was happy that you showed
how to fix it (it did not seem obvious to me _at all_, and i've seen my
fair share of fixes) and i quickly lured myself into believing i had
understood it :-/ I've now updated the commit message with your
(correct) description.

> > Note: this means that we call __switch_to() [and its sub-functions]
> > still with the old canary, but that is not a problem, both the
> > previous and the next task has a high-quality canary. The only
> > (mostly academic) disadvantage is that the canary of one task may
> > leak onto the stack of another task, increasing the risk of
> > information leaks, were an attacker able to read the stack of
> > specific tasks (but not that of others).
>
> the best practical defense against leaking the canary is to change its
> value on every syscall, but that has some performance impact in
> microbenchmarks.

yeah, that's not really practical (especially as it would deplete our
entropy pool pretty fast, which could in some circumstances introduce a
higher risk to the system than the risk of a canary leaking).

I think we can avoid the leakage across tasks by being careful during
context-switch time: never calling with the old canary still in the
PDA. I think this should be fairly easy as we'd just have to load the
new pda pointer in the switch_to() assembly code.

[ the only (even more academic) possibility here would be an NMI
  hitting during those two instructions that update RSP and load the
  new pda pointer, leaving the old canary on the stack. But NMIs go to
  separate stack frames anyway on 64-bit so this isn't really a
  practical worry IMO. ]

Hm?

> > x86: stackprotector: mix TSC to the boot canary
>
> this should probably be implemented in a general get_random_long()
> function...

yeah ... this is the second time recently that i could have used it.

> > x86: test the presence of the stackprotector
> > x86: streamline stackprotector
>
> the config comment says -fstack-protector whereas you're really
> enabling the stronger -fstack-protector-all feature (without the latter
> the ssp test function sample_stackprotector_fn wouldn't even be
> instrumented ;-).

yeah, indeed. I fixed the comments and flipped the commits around :-)

TODO: perhaps all vsyscall functions need to move into separate .o
files. (probably academic as the functions affected by that tend to be
very simple and do not deal with anything overflowable.)

	Ingo
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 19:00 ` Ingo Molnar
@ 2008-02-14 18:55   ` pageexec
  2008-02-14 20:25     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread

From: pageexec @ 2008-02-14 18:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 20:00, Ingo Molnar wrote:

> > the best practical defense against leaking the canary is to change its
> > value on every syscall, but that has some performance impact in
> > microbenchmarks.
>
> yeah, that's not really practical (especially as it would deplete our
> entropy pool pretty fast, which could in some circumstances introduce a
> higher risk to the system than the risk of a canary leaking).

you don't necessarily have to use the heavy-handed ip id code as it is
used now, random32 is plenty good here.

> I think we can avoid the leakage across tasks by being careful during
> context-switch time: never calling with the old canary still in the PDA.
> I think this should be fairly easy as we'd just have to load the new
> pda pointer in the switch_to() assembly code.

i don't think you have to worry about cross-task leaking at all, a
hypothetical exploit is happy to learn its own canary, and that's
actually easier than learning some other task's canary by virtue of
bugs that leak uninitialized struct padding and such...

really, the best defense is to reduce the useful lifetime of any leaked
canary, and you can't get better than syscall granularity without
disproportionate effort and impact elsewhere (and i'm sure some would
find even this disproportionate ;).

> TODO: perhaps all vsyscall functions need to move into separate .o
> files. (probably academic as the functions affected by that tend to be
> very simple and do not deal with anything overflowable.)

yeah, i wasn't suggesting it for its security value, more like for
'full coverage'. if such a separation isn't useful otherwise (no idea
whether not having the .vsyscall* code along with related kernel code
would be confusing for the reader, for example), i'd say this isn't
important.
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 18:55 ` pageexec
@ 2008-02-14 20:25   ` Ingo Molnar
  2008-02-14 21:00     ` Arjan van de Ven
                        ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread

From: Ingo Molnar @ 2008-02-14 20:25 UTC (permalink / raw)
To: pageexec
Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> really, the best defense is to reduce the useful lifetime of any
> leaked canary, and you can't get better than syscall granularity
> without disproportionate effort and impact elsewhere (and i'm sure some
> would find even this disproportionate ;).

hm, i think per-syscall canaries are really expensive. The per function
call overhead from stackprotector is already pretty serious IMO, but at
least that's something that GCC _could_ be doing (much) smarter (why
doesn't it jne forward out to __check_stk_failure, instead of
generating 4 instructions, one of them a default-mispredicted branch
instruction??), so that overhead could in theory be something like 4
fall-through instructions per function, instead of the current 6.

Also, even with -fstack-protector-all, gcc could detect the _provably_
correct and overflow-safe functions and skip their annotation
altogether. And those are the ones that hurt most: the small and
straightforward (and also, most of the time, overflow-free) ones.

The heuristics for -fstack-protector were really botched; as you noted,
they happened to mark vmsplice as "safe". So either we do the full
thing or nothing - anything in between is just Russian roulette and a
false sense of security.

But per-syscall canary randomization is something that would be a drag
on our null syscall latency forever. I'm really unsure about it. It
will also be driven in the wrong direction: people would try to
"optimize" the per-syscall random number generation, which is a
constant erosive force on its strength, and there's no strong
counter-force. (and no clear boundary either - 'random32 should be
plenty enough' is too vague as a limit.)

So let's perhaps also analyze the security effects of not having
per-syscall canaries and make the best out of it: the main threat is
information leaks via padding and similar initialization mistakes
[which have an almost zero accidental discovery rate], right?

So could we perhaps somehow turn kernel-space information leaks into
functional, and hence detectable, failures? Wouldn't for example
kmemcheck notice them, once they are copied to user-space?

[ If you haven't seen kmemcheck yet i think you'll love it, it's a
  similarly neat trick as the pte noexec trick you did in the PaX
  kernel. It's similarly disgusting as well ;-) ]

> > TODO: perhaps all vsyscall functions need to move into separate .o
> > files. (probably academic as the functions affected by that tend to
> > be very simple and do not deal with anything overflowable.)
>
> yeah, i wasn't suggesting it for its security value, more like for
> 'full coverage'. if such a separation isn't useful otherwise (no idea
> whether not having the .vsyscall* code along with related kernel code
> would be confusing for the reader, for example), i'd say this isn't
> important.

perhaps it would be more useful to have some "no stack protector" gcc
attribute. We could then stick that into the __vsyscall definition and
it's all done and self-maintaining from that point on. (because a
missing __vsyscall annotation results in a runtime failure.) (I'd fully
expect GCC not to have provided such an attribute.)

but yes, i agree in general that it's easier to maintain something if
it's full coverage; that way one does not have to keep in mind (and
cannot accidentally forget ;-) the exceptions.

	Ingo
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 20:25 ` Ingo Molnar
@ 2008-02-14 21:00   ` Arjan van de Ven
  2008-02-14 21:12   ` pageexec
  2008-02-14 22:35   ` Jakub Jelinek
  2 siblings, 0 replies; 11+ messages in thread

From: Arjan van de Ven @ 2008-02-14 21:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: pageexec, Sam Ravnborg, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

On Thu, 14 Feb 2008 21:25:35 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
>
> > really, the best defense is to reduce the useful lifetime of any
> > leaked canary, and you can't get better than syscall granularity
> > without disproportionate effort and impact elsewhere (and i'm sure
> > some would find even this disproportionate ;).
>
> hm, i think per-syscall canaries are really expensive.

it's not that bad. Assuming you use a PRNG that you re-seed
periodically, it's

 * go to the next random number with the PRNG
 * write it to the PDA and the task struct

give or take 10 cycles total if you squeeze it hard, 20 if you don't.
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 20:25 ` Ingo Molnar
  2008-02-14 21:00   ` Arjan van de Ven
@ 2008-02-14 21:12   ` pageexec
  2008-02-14 22:35   ` Jakub Jelinek
  2 siblings, 0 replies; 11+ messages in thread

From: pageexec @ 2008-02-14 21:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 21:25, Ingo Molnar wrote:

> * pageexec@freemail.hu <pageexec@freemail.hu> wrote:
>
> > really, the best defense is to reduce the useful lifetime of any
> > leaked canary, and you can't get better than syscall granularity
> > without disproportionate effort and impact elsewhere (and i'm sure some
> > would find even this disproportionate ;).
>
> hm, i think per-syscall canaries are really expensive.

it depends on how it's implemented and on what counts as expensive. i'd
say offering this as a config option would make sense for those who
want to turn ssp from a debugging feature (which it is at the moment)
into a security one (i know i will do it once i port the kernel stack
randomization to amd64 in PaX, it's the same code path).

speaking of that, for years PaX has had a feature to randomize the
kernel stack pointer on each kernel->user transition based on rdtsc
(using only a few bits of it, obviously). its performance impact is
really hard to measure on anything (i mean, macrobenchmarks) and
frankly, people blindly enable it without having second thoughts about
its impact because it just doesn't matter.

as Arjan said, this ssp canary randomization could be done cheaply from
a PRNG, or heck, even just a simple ror by a random amount from rdtsc
will raise the bar enough (and the PRNG state is subject to info
leaking itself whereas rdtsc isn't ;).

in short, i suggest that this per-syscall canary be offered as a config
option at least, and use something cheap and not particularly strong
(in crypto terms) just to raise the bar enough that no attacker would
accept the risk of panic'ing the box should the opportunity arise.

> The per function call overhead from stackprotector is already pretty
> serious IMO,

it's a mov/mov/.../mov/xor/jz on the normal path - basically a few
cycles, except for the forward jz, but that should be easy to fix in
gcc.

with that said, this whole ssp business should have been changed into
return address verification a long time ago (i.e., use the saved
return address as the canary itself); that would detect both stack
overflows and prevent ret2libc style attacks for practical purposes.

> But per-syscall canary randomization is something that would be a drag
> on our null syscall latency forever. I'm really unsure about it. It
> will also be driven in the wrong direction: people would try to
> "optimize" the per-syscall random number generation, which is a
> constant erosive force on its strength, and there's no strong
> counter-force. (and no clear boundary either - 'random32 should be
> plenty enough' is too vague as a limit.)

i think you're getting lost in the details a bit here. the purpose of
changing the canary is to create a guaranteed minimum risk for an
attacker who abuses a leaked canary. obviously a constant (per task)
canary means 0 risk for an attacker, so we can only get better than
that. how much better we want to get is mainly limited by what is
considered cheap implementation-wise and 'good enough' from a risk
assessment point of view.

say, if you were to flip a random bit in the canary then a leaked
canary would give an attacker a 50% chance to succeed and a 50% chance
to panic the box. even that is already high (for an attacker), and you
see where i'm getting at when you increase the number of randomly
changed bits in the canary.

in practice, there's no need to change all 32/64 bits, that's for
crypto nuts only: for real life attackers there's no difference between
99% and 99.99999% chances of panic'ing a box (or even a box farm - one
alarm raised is an alarm that will draw attention regardless of how
many other boxes panic'ed). that's why i keep saying that as simple a
thing as using rdtsc/xor/ror/etc is 'plenty good'. i know it's not good
for a cryptanalyst in a generic setting, but this is a special
situation (if you fail to guess you don't get to try infinitely, you
fail hard) and we can relax the crypto requirements.

> So let's perhaps also analyze the security effects of not having
> per-syscall canaries and make the best out of it: the main threat is
> information leaks via padding and similar initialization mistakes
> [which have an almost zero accidental discovery rate], right?

that was only an example of a bug class that could provide/leak such
info; obviously programmers are a lot more creative in accomplishing
the same. the problem with kmemcheck is that it's a debugging aid, it
obviously cannot be used in production. also, it's not comprehensive
when you have bugs that leak memory like the do_brk (or mremap, i
forget) one did: a valid userland pte kept pointing to a freed and
later reused page. i'm not saying that one would use such a bug for
canary leaking when it can be abused other ways a lot better, it's
just to say that there are many more bug classes in the kernel than in
userland.
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 20:25 ` Ingo Molnar
  2008-02-14 21:00   ` Arjan van de Ven
  2008-02-14 21:12   ` pageexec
@ 2008-02-14 22:35   ` Jakub Jelinek
  2008-02-14 21:43     ` pageexec
  2008-02-14 23:16     ` Ingo Molnar
  2 siblings, 2 replies; 11+ messages in thread

From: Jakub Jelinek @ 2008-02-14 22:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: pageexec, Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> The per function call overhead from stackprotector is already pretty
> serious IMO, but at least that's something that GCC _could_ be doing
> (much) smarter (why doesn't it jne forward out to __check_stk_failure,
> instead of generating 4 instructions, one of them a default-mispredicted
> branch instruction??), so that overhead could in theory be something
> like 4 fall-through instructions per function, instead of the current 6.

Where do you see a mispredicted branch?

int foo (void)
{
  char buf[64];
  bar (buf);
  return 6;
}

-O2 -fstack-protector -m64:
	subq	$88, %rsp
	movq	%fs:40, %rax
	movq	%rax, 72(%rsp)
	xorl	%eax, %eax
	movq	%rsp, %rdi
	call	bar
	movq	72(%rsp), %rdx
	xorq	%fs:40, %rdx
	movl	$6, %eax
	jne	.L5
	addq	$88, %rsp
	ret
.L5:
	.p2align 4,,6
	.p2align 3
	call	__stack_chk_fail

-O2 -fstack-protector -m32:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$88, %esp
	movl	%gs:20, %eax
	movl	%eax, -4(%ebp)
	xorl	%eax, %eax
	leal	-68(%ebp), %eax
	movl	%eax, (%esp)
	call	bar
	movl	$6, %eax
	movl	-4(%ebp), %edx
	xorl	%gs:20, %edx
	jne	.L5
	leave
	ret
.L5:
	.p2align 4,,7
	.p2align 3
	call	__stack_chk_fail

-O2 -fstack-protector -m64 -mcmodel=kernel:
	subq	$88, %rsp
	movq	%gs:40, %rax
	movq	%rax, 72(%rsp)
	xorl	%eax, %eax
	movq	%rsp, %rdi
	call	bar
	movq	72(%rsp), %rdx
	xorq	%gs:40, %rdx
	movl	$6, %eax
	jne	.L5
	addq	$88, %rsp
	ret
.L5:
	.p2align 4,,6
	.p2align 3
	call	__stack_chk_fail

both with gcc 4.1.x and 4.3.0.

BTW, you can use -fstack-protector --param=ssp-buffer-size=4 etc. to
tweak the size of buffers that trigger stack protection; the default
is 8, but e.g. the whole of Fedora is compiled with 4.

	Jakub
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 22:35 ` Jakub Jelinek
@ 2008-02-14 21:43   ` pageexec
  2008-02-14 23:19     ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread

From: pageexec @ 2008-02-14 21:43 UTC (permalink / raw)
To: Ingo Molnar, Jakub Jelinek
Cc: Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

On 14 Feb 2008 at 17:35, Jakub Jelinek wrote:

> Where do you see a mispredicted branch?

try it with -Os (gentoo gcc 4.2.2):

0000000000000000 <foo>:
   0:	48 83 ec 58          	sub    $0x58,%rsp
   4:	65 48 8b 04 25 28 00 00 00	mov    %gs:0x28,%rax
   d:	48 89 44 24 48       	mov    %rax,0x48(%rsp)
  12:	31 c0                	xor    %eax,%eax
  14:	48 89 e7             	mov    %rsp,%rdi
  17:	e8 00 00 00 00       	callq  1c <foo+0x1c>
			18: R_X86_64_PC32	bar+0xfffffffffffffffc
  1c:	48 8b 54 24 48       	mov    0x48(%rsp),%rdx
  21:	65 48 33 14 25 28 00 00 00	xor    %gs:0x28,%rdx
  2a:	b8 06 00 00 00       	mov    $0x6,%eax
  2f:	74 05                	je     36 <foo+0x36>
  31:	e8 00 00 00 00       	callq  36 <foo+0x36>
			32: R_X86_64_PC32	__stack_chk_fail+0xfffffffffffffffc
  36:	48 83 c4 58          	add    $0x58,%rsp
  3a:	c3                   	retq
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 21:43 ` pageexec
@ 2008-02-14 23:19   ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread

From: Ingo Molnar @ 2008-02-14 23:19 UTC (permalink / raw)
To: pageexec
Cc: Jakub Jelinek, Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

* pageexec@freemail.hu <pageexec@freemail.hu> wrote:

> On 14 Feb 2008 at 17:35, Jakub Jelinek wrote:
>
> > Where do you see a mispredicted branch?
>
> try it with -Os (gentoo gcc 4.2.2):
>
> 0000000000000000 <foo>:
>    0:	48 83 ec 58          	sub    $0x58,%rsp
>    4:	65 48 8b 04 25 28 00 00 00	mov    %gs:0x28,%rax
>    d:	48 89 44 24 48       	mov    %rax,0x48(%rsp)
>   12:	31 c0                	xor    %eax,%eax
>   14:	48 89 e7             	mov    %rsp,%rdi
>   17:	e8 00 00 00 00       	callq  1c <foo+0x1c>
> 			18: R_X86_64_PC32	bar+0xfffffffffffffffc
>   1c:	48 8b 54 24 48       	mov    0x48(%rsp),%rdx
>   21:	65 48 33 14 25 28 00 00 00	xor    %gs:0x28,%rdx
>   2a:	b8 06 00 00 00       	mov    $0x6,%eax
>   2f:	74 05                	je     36 <foo+0x36>
>   31:	e8 00 00 00 00       	callq  36 <foo+0x36>
> 			32: R_X86_64_PC32	__stack_chk_fail+0xfffffffffffffffc
>   36:	48 83 c4 58          	add    $0x58,%rsp
>   3a:	c3                   	retq

ah, that's what my kernel uses too - and CONFIG_CC_OPTIMIZE_FOR_SIZE=y
is the default for Fedora too. hm, especially those big addressing mode
mov's and xor's look rather nasty.

	Ingo
* Re: [x86.git#mm] stack protector fixes, vmsplice exploit
  2008-02-14 22:35 ` Jakub Jelinek
  2008-02-14 21:43   ` pageexec
@ 2008-02-14 23:16   ` Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread

From: Ingo Molnar @ 2008-02-14 23:16 UTC (permalink / raw)
To: Jakub Jelinek
Cc: pageexec, Sam Ravnborg, Arjan van de Ven, linux-kernel, torvalds, Thomas Gleixner, H. Peter Anvin

* Jakub Jelinek <jakub@redhat.com> wrote:

> On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> > The per function call overhead from stackprotector is already pretty
> > serious IMO, but at least that's something that GCC _could_ be doing
> > (much) smarter (why doesn't it jne forward out to __check_stk_failure,
> > instead of generating 4 instructions, one of them a default-mispredicted
> > branch instruction??), so that overhead could in theory be something
> > like 4 fall-through instructions per function, instead of the current 6.
>
> Where do you see a mispredicted branch?

ah!

> int foo (void)
> {
>   char buf[64];
>   bar (buf);
>   return 6;
> }
>
> -O2 -fstack-protector -m64:
> 	subq	$88, %rsp
> 	movq	%fs:40, %rax
> 	movq	%rax, 72(%rsp)
> 	xorl	%eax, %eax
> 	movq	%rsp, %rdi
> 	call	bar
> 	movq	72(%rsp), %rdx
> 	xorq	%fs:40, %rdx
> 	movl	$6, %eax
> 	jne	.L5
> 	addq	$88, %rsp
> 	ret
> .L5:
> 	.p2align 4,,6
> 	.p2align 3
> 	call	__stack_chk_fail

i got this:

	.file	""
	.text
.globl foo
	.type	foo, @function
foo:
.LFB2:
	pushq	%rbp
.LCFI0:
	movq	%rsp, %rbp
.LCFI1:
	subq	$208, %rsp
.LCFI2:
	movq	__stack_chk_guard(%rip), %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	$3, %eax
	movq	-8(%rbp), %rdx
	xorq	__stack_chk_guard(%rip), %rdx
	je	.L3
	call	__stack_chk_fail
.L3:
	leave
	ret

but that's F8's gcc 4.1, and not with the kernel mode code generator
either. the code you cited looks far better - that's good news!

one optimization would be to do a 'jne' straight into
__stack_chk_fail() - it's not like we ever want to return. [and it's
obvious from the existing stack frame which the failing function was]
That way we'd have about 3 bytes less per function? We don't want to
return to the original function, so for the kernel it would be OK.

another potential optimization would be to change this:

> 	subq	$88, %rsp
> 	movq	%fs:40, %rax
> 	movq	%rax, 72(%rsp)

into:

	pushq	%fs:40
	subq	$80, %rsp

or am i missing something? (is there perhaps an address generation
dependency between the pushq and the subq? Or would the canary end up
at the wrong position?)

> both with gcc 4.1.x and 4.3.0. BTW, you can use -fstack-protector
> --param=ssp-buffer-size=4 etc. to tweak the size of buffers that trigger
> stack protection; the default is 8, but e.g. the whole of Fedora is
> compiled with 4.

ok. is -fstack-protector-all basically equivalent to
--param=ssp-buffer-size=0 ? I'm wondering whether it would be easy for
gcc to completely skip stackprotector code on functions that have no
buffers, even under -fstack-protector-all. (perhaps it already does?)

	Ingo
end of thread, other threads: [~2008-02-14 23:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
2008-02-14 17:00 [x86.git#mm] stack protector fixes, vmsplice exploit Ingo Molnar
2008-02-14 17:16 ` pageexec
2008-02-14 19:00   ` Ingo Molnar
2008-02-14 18:55     ` pageexec
2008-02-14 20:25       ` Ingo Molnar
2008-02-14 21:00         ` Arjan van de Ven
2008-02-14 21:12         ` pageexec
2008-02-14 22:35         ` Jakub Jelinek
2008-02-14 21:43           ` pageexec
2008-02-14 23:19             ` Ingo Molnar
2008-02-14 23:16           ` Ingo Molnar