LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
@ 2008-02-20 13:57 Arne Georg Gleditsch
  2008-02-20 18:08 ` Thomas Gleixner
  2008-02-21 15:58 ` Andi Kleen
  0 siblings, 2 replies; 10+ messages in thread
From: Arne Georg Gleditsch @ 2008-02-20 13:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: John Stultz, Ingo Molnar, Thomas Gleixner, Andi Kleen, Roman Zippel

Hi,

I'm looking at 2.6.25-rc2.  vsyscall_sysctl_change contains code to NOP
out the actual system call instructions of the vsyscall page when
vsyscall64 is enabled.  This seems to interact badly with the fallback
code in do_vgettimeofday which tries to call gettimeofday if the
configured clock source does not support vread.  (In effect,
gettimeofday() becomes a nop and time() always returns 0.  Not very
useful.)

Is there a good reason to keep this?  Aren't the instructions in
question avoided (or invoked) according to the vsyscall64 flag by the
surrounding logic anyway?

-- 
								Arne.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-20 13:57 arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls Arne Georg Gleditsch
@ 2008-02-20 18:08 ` Thomas Gleixner
  2008-02-21 15:58 ` Andi Kleen
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2008-02-20 18:08 UTC (permalink / raw)
  To: Arne Georg Gleditsch; +Cc: LKML, Roman Zippel, Ingo Molnar, H. Peter Anvin

Arne,

On Wed, 20 Feb 2008, Arne Georg Gleditsch wrote:
> I'm looking at 2.6.25-rc2.  vsyscall_sysctl_change contains code to NOP
> out the actual system call instructions of the vsyscall page when
> vsyscall64 is enabled.  This seems to interact badly with the fallback
> code in do_vgettimeofday which tries to call gettimeofday if the
> configured clock source does not support vread.  (In effect,
> gettimeofday() becomes a nop and time() always returns 0.  Not very
> useful.)

Indeed.
 
> Is there a good reason to keep this?  Aren't the instructions in
> question avoided (or invoked) according to the vsyscall64 flag by the
> surrounding logic anyway?

The initial intent of this was to make it harder for malicious code to
attack via the vsysall area.

But you are right, the code is indeed fundamentally unsafe in various
aspects:

1) the patching code runs without synchronizing other CPUs

2) it inserts NOPs even if there is no clock source which provides
vread

3) when the clock source changes to one without vread we run in
exactly the same problem as in #2

The correct solution is randomizing that area, but that's definitely
not an ad hoc fix.

Thanks for pointing this out. I'm looking into fixing this ASAP.

       tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-20 13:57 arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls Arne Georg Gleditsch
  2008-02-20 18:08 ` Thomas Gleixner
@ 2008-02-21 15:58 ` Andi Kleen
  2008-02-21 19:45   ` Thomas Gleixner
  2008-02-22 10:38   ` Arne Georg Gleditsch
  1 sibling, 2 replies; 10+ messages in thread
From: Andi Kleen @ 2008-02-21 15:58 UTC (permalink / raw)
  To: Arne Georg Gleditsch
  Cc: linux-kernel, John Stultz, Ingo Molnar, Thomas Gleixner, Roman Zippel

On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> Hi,
> 
> I'm looking at 2.6.25-rc2.  vsyscall_sysctl_change contains code to NOP
> out the actual system call instructions of the vsyscall page when
> vsyscall64 is enabled.  This seems to interact badly with the fallback
> code in do_vgettimeofday which tries to call gettimeofday if the
> configured clock source does not support vread.  (In effect,
> gettimeofday() becomes a nop and time() always returns 0.  Not very
> useful.)
> 
> Is there a good reason to keep this?  Aren't the instructions in
> question avoided (or invoked) according to the vsyscall64 flag by the
> surrounding logic anyway?

Yes they are.  But a system call sequence at a known fixed address
is potentially useful to exploits. That is why it is nop'ed out when
it is not needed.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-21 15:58 ` Andi Kleen
@ 2008-02-21 19:45   ` Thomas Gleixner
  2008-02-21 20:02     ` john stultz
  2008-02-22 12:49     ` Andi Kleen
  2008-02-22 10:38   ` Arne Georg Gleditsch
  1 sibling, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2008-02-21 19:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arne Georg Gleditsch, linux-kernel, John Stultz, Ingo Molnar,
	Roman Zippel

On Thu, 21 Feb 2008, Andi Kleen wrote:

> On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> > Hi,
> > 
> > I'm looking at 2.6.25-rc2.  vsyscall_sysctl_change contains code to NOP
> > out the actual system call instructions of the vsyscall page when
> > vsyscall64 is enabled.  This seems to interact badly with the fallback
> > code in do_vgettimeofday which tries to call gettimeofday if the
> > configured clock source does not support vread.  (In effect,
> > gettimeofday() becomes a nop and time() always returns 0.  Not very
> > useful.)
> > 
> > Is there a good reason to keep this?  Aren't the instructions in
> > question avoided (or invoked) according to the vsyscall64 flag by the
> > surrounding logic anyway?
> 
> Yes they are.  But a system call sequence at a known fixed address
> is potentially useful to exploits. That is why it is nop'ed out when
> it is not needed.

That's a nice intent, but the reality is that this code is broken as
hell:

1) the patching code runs without synchronizing other CPUs

2) it inserts NOPs even if there is no clock source which provides
vread

3) when the clock source changes to one without vread we run in
exactly the same problem as in #2

4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
the syscall is not patched out contrary to your claim.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-21 19:45   ` Thomas Gleixner
@ 2008-02-21 20:02     ` john stultz
  2008-02-21 20:47       ` Thomas Gleixner
  2008-02-22 12:49     ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: john stultz @ 2008-02-21 20:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Arne Georg Gleditsch, linux-kernel, Ingo Molnar,
	Roman Zippel


On Thu, 2008-02-21 at 20:45 +0100, Thomas Gleixner wrote:
> On Thu, 21 Feb 2008, Andi Kleen wrote:
> 
> > On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> > > Hi,
> > > 
> > > I'm looking at 2.6.25-rc2.  vsyscall_sysctl_change contains code to NOP
> > > out the actual system call instructions of the vsyscall page when
> > > vsyscall64 is enabled.  This seems to interact badly with the fallback
> > > code in do_vgettimeofday which tries to call gettimeofday if the
> > > configured clock source does not support vread.  (In effect,
> > > gettimeofday() becomes a nop and time() always returns 0.  Not very
> > > useful.)
> > > 
> > > Is there a good reason to keep this?  Aren't the instructions in
> > > question avoided (or invoked) according to the vsyscall64 flag by the
> > > surrounding logic anyway?
> > 
> > Yes they are.  But a system call sequence at a known fixed address
> > is potentially useful to exploits. That is why it is nop'ed out when
> > it is not needed.
> 
> That's a nice intent, but the reality is that this code is broken as
> hell:
> 
> 1) the patching code runs without synchronizing other CPUs
> 
> 2) it inserts NOPs even if there is no clock source which provides
> vread
> 
> 3) when the clock source changes to one without vread we run in
> exactly the same problem as in #2

I've not looked deeply here, but it seems to resolve #2 and #3, we need
some way for glibc to know if it should jump to the vsyscall or make the
syscall itself.

If it always jumps to the vsyscall address, it seems we have to have
some way of falling back to syscall inside the vsyscall.

That or we need to do the NOP/un-NOP part in the update_vsyscall code
dependent on if the current clocksource supports vread, instead of on
the /proc entry access.

-john




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-21 20:02     ` john stultz
@ 2008-02-21 20:47       ` Thomas Gleixner
  2008-02-21 20:59         ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2008-02-21 20:47 UTC (permalink / raw)
  To: john stultz
  Cc: Andi Kleen, Arne Georg Gleditsch, LKML, Ingo Molnar,
	Roman Zippel, H. Peter Anvin, Arjan van de Ven

On Thu, 21 Feb 2008, john stultz wrote:
> > > Yes they are.  But a system call sequence at a known fixed address
> > > is potentially useful to exploits. That is why it is nop'ed out when
> > > it is not needed.
> > 
> > That's a nice intent, but the reality is that this code is broken as
> > hell:
> > 
> > 1) the patching code runs without synchronizing other CPUs
> > 
> > 2) it inserts NOPs even if there is no clock source which provides
> > vread
> > 
> > 3) when the clock source changes to one without vread we run in
> > exactly the same problem as in #2
> 
> I've not looked deeply here, but it seems to resolve #2 and #3, we need
> some way for glibc to know if it should jump to the vsyscall or make the
> syscall itself.

Does glibc have an existing fallback, when we return -ENOSYS (or
whatever) from the vsyscall path ? If not, we can not do this.

> If it always jumps to the vsyscall address, it seems we have to have
> some way of falling back to syscall inside the vsyscall.
> 
> That or we need to do the NOP/un-NOP part in the update_vsyscall code
> dependent on if the current clocksource supports vread, instead of on
> the /proc entry access.

That won't fly. We need to sychronize the CPUs when we patch the code,
which is not possible from update_wall_time with interrupts disabled.

I fear we have to kill the broken code for .25 and implement vdso
randomizing to fix that. Another short term solution to prevent
attacks via the syscall instruction would be to sanity check syscalls
when the call originates from the vsyscall address range, but that
might be frowned upon as it adds two assembler instructions into the
syscall hotpath :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-21 20:47       ` Thomas Gleixner
@ 2008-02-21 20:59         ` Thomas Gleixner
  2008-02-22 12:55           ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2008-02-21 20:59 UTC (permalink / raw)
  To: john stultz
  Cc: Andi Kleen, Arne Georg Gleditsch, LKML, Ingo Molnar,
	Roman Zippel, H. Peter Anvin, Arjan van de Ven

On Thu, 21 Feb 2008, Thomas Gleixner wrote:
> > That or we need to do the NOP/un-NOP part in the update_vsyscall code
> > dependent on if the current clocksource supports vread, instead of on
> > the /proc entry access.
> 
> That won't fly. We need to sychronize the CPUs when we patch the code,
> which is not possible from update_wall_time with interrupts disabled.

Also this is utterly stupid as we keep the syscall in cases where we
do not have vread anyway, so we keep the attack point open for a lot
of existing machines due to TSC wreckage and HPET unavailability.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-21 15:58 ` Andi Kleen
  2008-02-21 19:45   ` Thomas Gleixner
@ 2008-02-22 10:38   ` Arne Georg Gleditsch
  1 sibling, 0 replies; 10+ messages in thread
From: Arne Georg Gleditsch @ 2008-02-22 10:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, John Stultz, Ingo Molnar, Thomas Gleixner, Roman Zippel

Andi Kleen <ak@muc.de> writes:
> On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
>> Hi,
>> 
>> I'm looking at 2.6.25-rc2.  vsyscall_sysctl_change contains code to NOP
>> out the actual system call instructions of the vsyscall page when
>> vsyscall64 is enabled.  This seems to interact badly with the fallback
>> code in do_vgettimeofday which tries to call gettimeofday if the
>> configured clock source does not support vread.  (In effect,
>> gettimeofday() becomes a nop and time() always returns 0.  Not very
>> useful.)
>> 
>> Is there a good reason to keep this?  Aren't the instructions in
>> question avoided (or invoked) according to the vsyscall64 flag by the
>> surrounding logic anyway?
>
> Yes they are.  But a system call sequence at a known fixed address
> is potentially useful to exploits. That is why it is nop'ed out when
> it is not needed.

Reasonable enough, as long as it can be determined to be not needed.
Still, isn't the __vsyscall_gtod_data structure part of the same page?
Wouldn't that give you access to any 2-byte opcode you want every 64k
seconds?  You'd need to time your attack, of course, but that could be
done prior to actually launching the exploit...

-- 
								Arne.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-21 19:45   ` Thomas Gleixner
  2008-02-21 20:02     ` john stultz
@ 2008-02-22 12:49     ` Andi Kleen
  1 sibling, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2008-02-22 12:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Arne Georg Gleditsch, linux-kernel, John Stultz, Ingo Molnar,
	Roman Zippel

On Thu, Feb 21, 2008 at 08:45:53PM +0100, Thomas Gleixner wrote:
> On Thu, 21 Feb 2008, Andi Kleen wrote:
> 
> > On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> > > Hi,
> > > 
> > > I'm looking at 2.6.25-rc2.  vsyscall_sysctl_change contains code to NOP
> > > out the actual system call instructions of the vsyscall page when
> > > vsyscall64 is enabled.  This seems to interact badly with the fallback
> > > code in do_vgettimeofday which tries to call gettimeofday if the
> > > configured clock source does not support vread.  (In effect,
> > > gettimeofday() becomes a nop and time() always returns 0.  Not very
> > > useful.)
> > > 
> > > Is there a good reason to keep this?  Aren't the instructions in
> > > question avoided (or invoked) according to the vsyscall64 flag by the
> > > surrounding logic anyway?
> > 
> > Yes they are.  But a system call sequence at a known fixed address
> > is potentially useful to exploits. That is why it is nop'ed out when
> > it is not needed.
> 
> That's a nice intent, but the reality is that this code is broken as
> hell:

Well it worked when I wrote it, but it's quite possible it didn't survive
the clocksource conversion completely.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls
  2008-02-21 20:59         ` Thomas Gleixner
@ 2008-02-22 12:55           ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2008-02-22 12:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: john stultz, Arne Georg Gleditsch, LKML, Ingo Molnar,
	Roman Zippel, H. Peter Anvin, Arjan van de Ven

On Thu, Feb 21, 2008 at 09:59:24PM +0100, Thomas Gleixner wrote:
> On Thu, 21 Feb 2008, Thomas Gleixner wrote:
> > > That or we need to do the NOP/un-NOP part in the update_vsyscall code
> > > dependent on if the current clocksource supports vread, instead of on
> > > the /proc entry access.
> > 
> > That won't fly. We need to sychronize the CPUs when we patch the code,
> > which is not possible from update_wall_time with interrupts disabled.
> 
> Also this is utterly stupid as we keep the syscall in cases where we
> do not have vread anyway, so we keep the attack point open for a lot
> of existing machines due to TSC wreckage and HPET unavailability.

Yes that is true, but I didn't find a cheap way around it without
breaking binary compatibility.

I found an expensive way (essentially just putting
a trampoline to a variable mapped vsyscall page on the old static
address), but since it would have added quite a lot of complexity
(vsyscall is inside the kernel mapping and would need to be rewritten
at context switch) and memory overhead in page tables and one more
page per process I didn't do that.

I also considered doing boot time randomization only (which
would avoid a lot of these problems), but it didn't seem worth it.

Also one must say I don't consider it a big security improvement on most
systems.  That is because most programs are not PIC, but linked at a fixed
address and usually if you grep the larger binary for the few bytes
needed for a system call you'll find it at some known offset as 
part of another instruction sequence.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-02-22 12:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-20 13:57 arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls Arne Georg Gleditsch
2008-02-20 18:08 ` Thomas Gleixner
2008-02-21 15:58 ` Andi Kleen
2008-02-21 19:45   ` Thomas Gleixner
2008-02-21 20:02     ` john stultz
2008-02-21 20:47       ` Thomas Gleixner
2008-02-21 20:59         ` Thomas Gleixner
2008-02-22 12:55           ` Andi Kleen
2008-02-22 12:49     ` Andi Kleen
2008-02-22 10:38   ` Arne Georg Gleditsch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).