LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/2] ARM: set thread_info->syscall just before sys_* execution
@ 2015-01-11 14:32 Roman Pen
  2015-01-11 14:32 ` [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall Roman Pen
  2015-01-11 14:32 ` [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter Roman Pen
  0 siblings, 2 replies; 28+ messages in thread
From: Roman Pen @ 2015-01-11 14:32 UTC (permalink / raw)
  Cc: Roman Pen, Russell King, Christoffer Dall, Stefano Stabellini,
	Sekhar Nori, Kees Cook, Andy Lutomirski, Eric Paris, Will Deacon,
	linux-arm-kernel, linux-kernel, Marc Zyngier, Catalin Marinas,
	stable

Hello.

It turned out to be that on ARM 'syscall_get_nr' call and
corresponding userspace proc access '/proc/*/syscall' always
return 0 instead of correct syscall number:

 # cat /proc/*/syscall
 0 0xffffffff 0x0 0x0 0x0 0x0 0x0 0xbea33cc0 0xb6f32f2c
 0 0x5 0x16e99a8 0x0 0x0 0x0 0xbeec03b4 0xbeec02a0 0xb6cc85e0
 0 0x3 0xbeee5d44 0xbeee5d40 0xbeee5d40 0x0 0x0 0xbeee5d3c 0xb6ef40ac
 0 0xffffffff 0xbed757f8 0x2 0x0 0x2 0xbed757f8 0xbed757e0 0xb6e4af2c
 [snip]

Where first digit should be not 0, but correct syscall number.

The first patch fixes this and the second one does minor tweaks.

Roman Pen (2):
  ARM: entry-common: fix forgotten set of thread_info->syscall
  ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter

 arch/arm/kernel/asm-offsets.c  | 1 +
 arch/arm/kernel/entry-common.S | 2 +-
 arch/arm/kernel/ptrace.c       | 6 ++++--
 3 files changed, 6 insertions(+), 3 deletions(-)

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@vger.kernel.org

-- 
2.1.3


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-11 14:32 [PATCH 0/2] ARM: set thread_info->syscall just before sys_* execution Roman Pen
@ 2015-01-11 14:32 ` Roman Pen
  2015-01-12 18:39   ` Will Deacon
  2015-01-11 14:32 ` [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter Roman Pen
  1 sibling, 1 reply; 28+ messages in thread
From: Roman Pen @ 2015-01-11 14:32 UTC (permalink / raw)
  Cc: Roman Pen, Russell King, Marc Zyngier, Catalin Marinas,
	Christoffer Dall, Stefano Stabellini, Sekhar Nori,
	linux-arm-kernel, linux-kernel, stable

thread_info->syscall is used only for ptrace, but syscall number
is also used by syscall_get_nr and returned to userspace by the
following proc file access:

 $ cat /proc/self/syscall
 0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
 ^
The first number is the syscall number, currently it is zero.
Patch fixes this:

 $ cat /proc/self/syscall
 3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
 ^
Right, read syscall

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 arch/arm/kernel/asm-offsets.c  | 1 +
 arch/arm/kernel/entry-common.S | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 2d2d608..6911bad 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -70,6 +70,7 @@ int main(void)
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
   DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
+  DEFINE(TI_SYSCALL,		offsetof(struct thread_info, syscall));
   DEFINE(TI_USED_CP,		offsetof(struct thread_info, used_cp));
   DEFINE(TI_TP_VALUE,		offsetof(struct thread_info, tp_value));
   DEFINE(TI_FPSTATE,		offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index f8ccc21..89452ff 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -189,6 +189,7 @@ ENTRY(vector_swi)
 #endif
 
 local_restart:
+	str scno, [tsk, #TI_SYSCALL]		@ set syscall number
 	ldr	r10, [tsk, #TI_FLAGS]		@ check for syscall tracing
 	stmdb	sp!, {r4, r5}			@ push fifth and sixth args
 
-- 
2.1.3


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter
  2015-01-11 14:32 [PATCH 0/2] ARM: set thread_info->syscall just before sys_* execution Roman Pen
  2015-01-11 14:32 ` [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall Roman Pen
@ 2015-01-11 14:32 ` Roman Pen
  2015-01-13 20:08   ` Kees Cook
  1 sibling, 1 reply; 28+ messages in thread
From: Roman Pen @ 2015-01-11 14:32 UTC (permalink / raw)
  Cc: Roman Pen, Russell King, Christoffer Dall, Stefano Stabellini,
	Sekhar Nori, Kees Cook, Andy Lutomirski, Eric Paris, Will Deacon,
	linux-arm-kernel, linux-kernel

In previous patch current_thread_info()->syscall is set with
corresponding syscall number prior to further calls, thus there
is no any need to pass 'scno'.

Also, add explicit comment why do we have to reread 'scno' local
variable.

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm/kernel/entry-common.S | 1 -
 arch/arm/kernel/ptrace.c       | 6 ++++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 89452ff..3d12eb5 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -228,7 +228,6 @@ ENDPROC(vector_swi)
 	 * context switches, and waiting for our parent to respond.
 	 */
 __sys_trace:
-	mov	r1, scno
 	add	r0, sp, #S_OFF
 	bl	syscall_trace_enter
 
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index ef9119f..1238787 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -928,9 +928,9 @@ static void tracehook_report_syscall(struct pt_regs *regs,
 	regs->ARM_ip = ip;
 }
 
-asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
+asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
-	current_thread_info()->syscall = scno;
+	int scno = current_thread_info()->syscall;
 
 	/* Do the secure computing check first; failures should be fast. */
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
@@ -944,6 +944,8 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
 
+	/* Syscall can be aborted (-1 can be set) or even changed
+	 * by the tracer and subsequent PTRACE_SET_SYSCALL request */
 	scno = current_thread_info()->syscall;
 
 	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
-- 
2.1.3


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-11 14:32 ` [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall Roman Pen
@ 2015-01-12 18:39   ` Will Deacon
  2015-01-13  8:35     ` Roman Peniaev
  0 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2015-01-12 18:39 UTC (permalink / raw)
  To: Roman Pen
  Cc: Russell King, Stefano Stabellini, Marc Zyngier, Catalin Marinas,
	Sekhar Nori, linux-kernel, linux-arm-kernel, stable,
	Christoffer Dall

On Sun, Jan 11, 2015 at 02:32:30PM +0000, Roman Pen wrote:
> thread_info->syscall is used only for ptrace, but syscall number
> is also used by syscall_get_nr and returned to userspace by the
> following proc file access:
> 
>  $ cat /proc/self/syscall
>  0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
>  ^
> The first number is the syscall number, currently it is zero.
> Patch fixes this:
> 
>  $ cat /proc/self/syscall
>  3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
>  ^
> Right, read syscall

Yes, it seems that despite requiring CONFIG_HAVE_ARCH_TRACEHOOK,
the /proc code requires syscall_get_nr to work regardless of
TIF_SYSCALL_TRACE.

> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Sekhar Nori <nsekhar@ti.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  arch/arm/kernel/asm-offsets.c  | 1 +
>  arch/arm/kernel/entry-common.S | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 2d2d608..6911bad 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -70,6 +70,7 @@ int main(void)
>    DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
>    DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
>    DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
> +  DEFINE(TI_SYSCALL,		offsetof(struct thread_info, syscall));
>    DEFINE(TI_USED_CP,		offsetof(struct thread_info, used_cp));
>    DEFINE(TI_TP_VALUE,		offsetof(struct thread_info, tp_value));
>    DEFINE(TI_FPSTATE,		offsetof(struct thread_info, fpstate));
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index f8ccc21..89452ff 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -189,6 +189,7 @@ ENTRY(vector_swi)
>  #endif
>  
>  local_restart:
> +	str scno, [tsk, #TI_SYSCALL]		@ set syscall number
>  	ldr	r10, [tsk, #TI_FLAGS]		@ check for syscall tracing
>  	stmdb	sp!, {r4, r5}			@ push fifth and sixth args

Do we definitely want to update scno on syscall restarting?

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-12 18:39   ` Will Deacon
@ 2015-01-13  8:35     ` Roman Peniaev
  2015-01-14  2:23       ` Roman Peniaev
  2015-01-14 20:51       ` Kees Cook
  0 siblings, 2 replies; 28+ messages in thread
From: Roman Peniaev @ 2015-01-13  8:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King, Stefano Stabellini, Marc Zyngier, Catalin Marinas,
	Sekhar Nori, linux-kernel, linux-arm-kernel, stable,
	Christoffer Dall

On Tue, Jan 13, 2015 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Sun, Jan 11, 2015 at 02:32:30PM +0000, Roman Pen wrote:
>> thread_info->syscall is used only for ptrace, but syscall number
>> is also used by syscall_get_nr and returned to userspace by the
>> following proc file access:
>>
>>  $ cat /proc/self/syscall
>>  0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
>>  ^
>> The first number is the syscall number, currently it is zero.
>> Patch fixes this:
>>
>>  $ cat /proc/self/syscall
>>  3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
>>  ^
>> Right, read syscall
>
> Yes, it seems that despite requiring CONFIG_HAVE_ARCH_TRACEHOOK,
> the /proc code requires syscall_get_nr to work regardless of
> TIF_SYSCALL_TRACE.
>
>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Cc: Sekhar Nori <nsekhar@ti.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: stable@vger.kernel.org
>> ---
>>  arch/arm/kernel/asm-offsets.c  | 1 +
>>  arch/arm/kernel/entry-common.S | 1 +
>>  2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index 2d2d608..6911bad 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -70,6 +70,7 @@ int main(void)
>>    DEFINE(TI_CPU,             offsetof(struct thread_info, cpu));
>>    DEFINE(TI_CPU_DOMAIN,              offsetof(struct thread_info, cpu_domain));
>>    DEFINE(TI_CPU_SAVE,                offsetof(struct thread_info, cpu_context));
>> +  DEFINE(TI_SYSCALL,         offsetof(struct thread_info, syscall));
>>    DEFINE(TI_USED_CP,         offsetof(struct thread_info, used_cp));
>>    DEFINE(TI_TP_VALUE,                offsetof(struct thread_info, tp_value));
>>    DEFINE(TI_FPSTATE,         offsetof(struct thread_info, fpstate));
>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>> index f8ccc21..89452ff 100644
>> --- a/arch/arm/kernel/entry-common.S
>> +++ b/arch/arm/kernel/entry-common.S
>> @@ -189,6 +189,7 @@ ENTRY(vector_swi)
>>  #endif
>>
>>  local_restart:
>> +     str scno, [tsk, #TI_SYSCALL]            @ set syscall number
>>       ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
>>       stmdb   sp!, {r4, r5}                   @ push fifth and sixth args
>
> Do we definitely want to update scno on syscall restarting?


Good question.

First thing to mention is __sys_trace will trace 'restart_syscall',
not the real syscall we are going to restart.

E.g. in test application we do infinite poll and then send STOP and
CONT to this app:

    test-243   [002] ...1  1792.067726: sys_enter: NR 168 (0, 0,
ffffffff, 0, 0, 0)
    test-243   [002] ...1  1802.299073: sys_exit: NR 168 = -516
    test-243   [004] ...1  1814.716264: sys_enter: NR 0 (0, 0,
ffffffff, 0, 0, 0)
    test-243   [004] ...1  2183.687225: sys_exit: NR 0 = -516

the poll was restarted and trace shows that we are in restart_syscall.

Is that expected?

And the second thing is that my next patch did some tweaks in
'syscall_trace_enter', where we take scno not from param we passed,
but from thread_info->syscall we previously set.

So, regarding your question, if I set scno only once - I will break
previous behavior, and __sys_trace will trace the syscall we restarted.

And I think this is what we need, because according to the
'syscall_trace_enter' code we do 'secure_computing' and
'audit_syscall_entry', which definitely expect original syscall, not
the 'restart_syscall'.


--
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter
  2015-01-11 14:32 ` [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter Roman Pen
@ 2015-01-13 20:08   ` Kees Cook
  2015-01-13 23:21     ` Roman Peniaev
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2015-01-13 20:08 UTC (permalink / raw)
  To: Roman Pen
  Cc: Russell King, Christoffer Dall, Stefano Stabellini, Sekhar Nori,
	Andy Lutomirski, Eric Paris, Will Deacon, linux-arm-kernel, LKML

On Sun, Jan 11, 2015 at 6:32 AM, Roman Pen <r.peniaev@gmail.com> wrote:
> In previous patch current_thread_info()->syscall is set with
> corresponding syscall number prior to further calls, thus there
> is no any need to pass 'scno'.
>
> Also, add explicit comment why do we have to reread 'scno' local
> variable.
>
> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Sekhar Nori <nsekhar@ti.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Eric Paris <eparis@redhat.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm/kernel/entry-common.S | 1 -
>  arch/arm/kernel/ptrace.c       | 6 ++++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index 89452ff..3d12eb5 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -228,7 +228,6 @@ ENDPROC(vector_swi)
>          * context switches, and waiting for our parent to respond.
>          */
>  __sys_trace:
> -       mov     r1, scno
>         add     r0, sp, #S_OFF
>         bl      syscall_trace_enter
>
> diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
> index ef9119f..1238787 100644
> --- a/arch/arm/kernel/ptrace.c
> +++ b/arch/arm/kernel/ptrace.c
> @@ -928,9 +928,9 @@ static void tracehook_report_syscall(struct pt_regs *regs,
>         regs->ARM_ip = ip;
>  }
>
> -asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
> +asmlinkage int syscall_trace_enter(struct pt_regs *regs)
>  {
> -       current_thread_info()->syscall = scno;
> +       int scno = current_thread_info()->syscall;

Was this assignment of current_thread_info()->syscall redundant? If
so, this looks fine. If not, what will now be setting the thread_info?

-Kees

>
>         /* Do the secure computing check first; failures should be fast. */
>  #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
> @@ -944,6 +944,8 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
>         if (test_thread_flag(TIF_SYSCALL_TRACE))
>                 tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
>
> +       /* Syscall can be aborted (-1 can be set) or even changed
> +        * by the tracer and subsequent PTRACE_SET_SYSCALL request */
>         scno = current_thread_info()->syscall;
>
>         if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
> --
> 2.1.3
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter
  2015-01-13 20:08   ` Kees Cook
@ 2015-01-13 23:21     ` Roman Peniaev
  2015-01-13 23:43       ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Roman Peniaev @ 2015-01-13 23:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Russell King, Christoffer Dall, Stefano Stabellini, Sekhar Nori,
	Andy Lutomirski, Eric Paris, Will Deacon, linux-arm-kernel, LKML

On Wed, Jan 14, 2015 at 5:08 AM, Kees Cook <keescook@chromium.org> wrote:
> On Sun, Jan 11, 2015 at 6:32 AM, Roman Pen <r.peniaev@gmail.com> wrote:
>> In previous patch current_thread_info()->syscall is set with
>> corresponding syscall number prior to further calls, thus there
>> is no any need to pass 'scno'.
>>
>> Also, add explicit comment why do we have to reread 'scno' local
>> variable.
>>
>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Cc: Sekhar Nori <nsekhar@ti.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Eric Paris <eparis@redhat.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  arch/arm/kernel/entry-common.S | 1 -
>>  arch/arm/kernel/ptrace.c       | 6 ++++--
>>  2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>> index 89452ff..3d12eb5 100644
>> --- a/arch/arm/kernel/entry-common.S
>> +++ b/arch/arm/kernel/entry-common.S
>> @@ -228,7 +228,6 @@ ENDPROC(vector_swi)
>>          * context switches, and waiting for our parent to respond.
>>          */
>>  __sys_trace:
>> -       mov     r1, scno
>>         add     r0, sp, #S_OFF
>>         bl      syscall_trace_enter
>>
>> diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
>> index ef9119f..1238787 100644
>> --- a/arch/arm/kernel/ptrace.c
>> +++ b/arch/arm/kernel/ptrace.c
>> @@ -928,9 +928,9 @@ static void tracehook_report_syscall(struct pt_regs *regs,
>>         regs->ARM_ip = ip;
>>  }
>>
>> -asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
>> +asmlinkage int syscall_trace_enter(struct pt_regs *regs)
>>  {
>> -       current_thread_info()->syscall = scno;
>> +       int scno = current_thread_info()->syscall;
>
> Was this assignment of current_thread_info()->syscall redundant? If
> so, this looks fine. If not, what will now be setting the thread_info?

[sorry, i have to resend this email because previously I sent it using my phone
 and arm maillist rejected it because of html inside]

Yes, it is redundant.
Because of the previous patch thread_info->syscall already contains
corresponding scnr,
so we use it instead of passing the same number from asm.

So everything should work fine without current patch, and also current
patch should not
change anything in the expected behavior.

--
Roman




>
> -Kees
>
>>
>>         /* Do the secure computing check first; failures should be fast. */
>>  #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
>> @@ -944,6 +944,8 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
>>         if (test_thread_flag(TIF_SYSCALL_TRACE))
>>                 tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
>>
>> +       /* Syscall can be aborted (-1 can be set) or even changed
>> +        * by the tracer and subsequent PTRACE_SET_SYSCALL request */
>>         scno = current_thread_info()->syscall;
>>
>>         if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
>> --
>> 2.1.3
>>
>
>
>
> --
> Kees Cook
> Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter
  2015-01-13 23:21     ` Roman Peniaev
@ 2015-01-13 23:43       ` Kees Cook
  0 siblings, 0 replies; 28+ messages in thread
From: Kees Cook @ 2015-01-13 23:43 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Russell King, Christoffer Dall, Stefano Stabellini, Sekhar Nori,
	Andy Lutomirski, Eric Paris, Will Deacon, linux-arm-kernel, LKML

On Tue, Jan 13, 2015 at 3:21 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Wed, Jan 14, 2015 at 5:08 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Sun, Jan 11, 2015 at 6:32 AM, Roman Pen <r.peniaev@gmail.com> wrote:
>>> In previous patch current_thread_info()->syscall is set with
>>> corresponding syscall number prior to further calls, thus there
>>> is no any need to pass 'scno'.
>>>
>>> Also, add explicit comment why do we have to reread 'scno' local
>>> variable.
>>>
>>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>>> Cc: Russell King <linux@arm.linux.org.uk>
>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>> Cc: Sekhar Nori <nsekhar@ti.com>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Eric Paris <eparis@redhat.com>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> ---
>>>  arch/arm/kernel/entry-common.S | 1 -
>>>  arch/arm/kernel/ptrace.c       | 6 ++++--
>>>  2 files changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>>> index 89452ff..3d12eb5 100644
>>> --- a/arch/arm/kernel/entry-common.S
>>> +++ b/arch/arm/kernel/entry-common.S
>>> @@ -228,7 +228,6 @@ ENDPROC(vector_swi)
>>>          * context switches, and waiting for our parent to respond.
>>>          */
>>>  __sys_trace:
>>> -       mov     r1, scno
>>>         add     r0, sp, #S_OFF
>>>         bl      syscall_trace_enter
>>>
>>> diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
>>> index ef9119f..1238787 100644
>>> --- a/arch/arm/kernel/ptrace.c
>>> +++ b/arch/arm/kernel/ptrace.c
>>> @@ -928,9 +928,9 @@ static void tracehook_report_syscall(struct pt_regs *regs,
>>>         regs->ARM_ip = ip;
>>>  }
>>>
>>> -asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
>>> +asmlinkage int syscall_trace_enter(struct pt_regs *regs)
>>>  {
>>> -       current_thread_info()->syscall = scno;
>>> +       int scno = current_thread_info()->syscall;
>>
>> Was this assignment of current_thread_info()->syscall redundant? If
>> so, this looks fine. If not, what will now be setting the thread_info?
>
> [sorry, i have to resend this email because previously I sent it using my phone
>  and arm maillist rejected it because of html inside]
>
> Yes, it is redundant.
> Because of the previous patch thread_info->syscall already contains
> corresponding scnr,
> so we use it instead of passing the same number from asm.

Ah! Okay, I wasn't CCed on the 1/2 patch. I see it now, thanks!

> So everything should work fine without current patch, and also current
> patch should not
> change anything in the expected behavior.

Great. If this still passes the seccomp testsuite, I'm fine for it. I
can test it later this week if no one else gets to it.

Reviewed-by: Kees Cook <keescook@chromium.org>

Thanks!

-Kees

>
> --
> Roman
>
>
>
>
>>
>> -Kees
>>
>>>
>>>         /* Do the secure computing check first; failures should be fast. */
>>>  #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
>>> @@ -944,6 +944,8 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
>>>         if (test_thread_flag(TIF_SYSCALL_TRACE))
>>>                 tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
>>>
>>> +       /* Syscall can be aborted (-1 can be set) or even changed
>>> +        * by the tracer and subsequent PTRACE_SET_SYSCALL request */
>>>         scno = current_thread_info()->syscall;
>>>
>>>         if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
>>> --
>>> 2.1.3
>>>
>>
>>
>>
>> --
>> Kees Cook
>> Chrome OS Security



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-13  8:35     ` Roman Peniaev
@ 2015-01-14  2:23       ` Roman Peniaev
  2015-01-14 20:51       ` Kees Cook
  1 sibling, 0 replies; 28+ messages in thread
From: Roman Peniaev @ 2015-01-14  2:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King, Stefano Stabellini, Marc Zyngier, Catalin Marinas,
	Sekhar Nori, linux-kernel, linux-arm-kernel, stable,
	Christoffer Dall, Kees Cook

On Tue, Jan 13, 2015 at 5:35 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Tue, Jan 13, 2015 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
>> On Sun, Jan 11, 2015 at 02:32:30PM +0000, Roman Pen wrote:
>>> thread_info->syscall is used only for ptrace, but syscall number
>>> is also used by syscall_get_nr and returned to userspace by the
>>> following proc file access:
>>>
>>>  $ cat /proc/self/syscall
>>>  0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
>>>  ^
>>> The first number is the syscall number, currently it is zero.
>>> Patch fixes this:
>>>
>>>  $ cat /proc/self/syscall
>>>  3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
>>>  ^
>>> Right, read syscall
>>
>> Yes, it seems that despite requiring CONFIG_HAVE_ARCH_TRACEHOOK,
>> the /proc code requires syscall_get_nr to work regardless of
>> TIF_SYSCALL_TRACE.
>>
>>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>>> Cc: Russell King <linux@arm.linux.org.uk>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>> Cc: Sekhar Nori <nsekhar@ti.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: stable@vger.kernel.org
>>> ---
>>>  arch/arm/kernel/asm-offsets.c  | 1 +
>>>  arch/arm/kernel/entry-common.S | 1 +
>>>  2 files changed, 2 insertions(+)
>>>
>>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>>> index 2d2d608..6911bad 100644
>>> --- a/arch/arm/kernel/asm-offsets.c
>>> +++ b/arch/arm/kernel/asm-offsets.c
>>> @@ -70,6 +70,7 @@ int main(void)
>>>    DEFINE(TI_CPU,             offsetof(struct thread_info, cpu));
>>>    DEFINE(TI_CPU_DOMAIN,              offsetof(struct thread_info, cpu_domain));
>>>    DEFINE(TI_CPU_SAVE,                offsetof(struct thread_info, cpu_context));
>>> +  DEFINE(TI_SYSCALL,         offsetof(struct thread_info, syscall));
>>>    DEFINE(TI_USED_CP,         offsetof(struct thread_info, used_cp));
>>>    DEFINE(TI_TP_VALUE,                offsetof(struct thread_info, tp_value));
>>>    DEFINE(TI_FPSTATE,         offsetof(struct thread_info, fpstate));
>>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>>> index f8ccc21..89452ff 100644
>>> --- a/arch/arm/kernel/entry-common.S
>>> +++ b/arch/arm/kernel/entry-common.S
>>> @@ -189,6 +189,7 @@ ENTRY(vector_swi)
>>>  #endif
>>>
>>>  local_restart:
>>> +     str scno, [tsk, #TI_SYSCALL]            @ set syscall number
>>>       ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
>>>       stmdb   sp!, {r4, r5}                   @ push fifth and sixth args
>>
>> Do we definitely want to update scno on syscall restarting?
>
>
> Good question.
>
> First thing to mention is __sys_trace will trace 'restart_syscall',
> not the real syscall we are going to restart.
>
> E.g. in test application we do infinite poll and then send STOP and
> CONT to this app:
>
>     test-243   [002] ...1  1792.067726: sys_enter: NR 168 (0, 0,
> ffffffff, 0, 0, 0)
>     test-243   [002] ...1  1802.299073: sys_exit: NR 168 = -516
>     test-243   [004] ...1  1814.716264: sys_enter: NR 0 (0, 0,
> ffffffff, 0, 0, 0)
>     test-243   [004] ...1  2183.687225: sys_exit: NR 0 = -516
>
> the poll was restarted and trace shows that we are in restart_syscall.
>
> Is that expected?
>
> And the second thing is that my next patch did some tweaks in
> 'syscall_trace_enter', where we take scno not from param we passed,
> but from thread_info->syscall we previously set.
>
> So, regarding your question, if I set scno only once - I will break
> previous behavior, and __sys_trace will trace the syscall we restarted.
>
> And I think this is what we need, because according to the
> 'syscall_trace_enter' code we do 'secure_computing' and
> 'audit_syscall_entry', which definitely expect original syscall, not
> the 'restart_syscall'.

+Kees Cook

Moreover, I tested on x86, and behavior is the same:
if we do restart_syscall userspace sees 0 as scnr.
(for me it sounds strange, because as I understand userspace
should avoid any awareness about possible syscall restart)

x86:

[root@qemu-x86 ~]# ./test.x86 &
[1] 285
[root@qemu-x86 ~]# cat /proc/285/syscall
168 0x0 0x0 0xffffffff 0x0 0xbf9387ac 0xbf9387b8 0xbf9386f8 0xffffe424
[root@qemu-x86 ~]# kill -STOP 285
[root@qemu-x86 ~]#

[1]+  Stopped                 ./test.x86
[root@qemu-x86 ~]# kill -CONT 285
[root@qemu-x86 ~]# cat /proc/285/syscall
0 0x0 0x0 0xffffffff 0x0 0xbf9387ac 0xbf9387b8 0xbf9386f8 0xffffe424

And getting syscall tracing on another run of this test application:

[root@qemu-x86 ~]# cat /sys/kernel/debug/tracing/trace | grep test.x86
.....
test.x86-279   [000] ...1   189.926878: sys_enter: NR 168 (0, 0,
ffffffff, 0, bfe6824c, bfe68258)
test.x86-279   [001] ...1   199.931406: sys_exit: NR 168 = -516
test.x86-279   [001] ...1   203.446946: sys_enter: NR 0 (0, 0,
ffffffff, 0, bfe6824c, bfe68258)
test.x86-279   [000] ...1   276.294314: sys_exit: NR 0 = -516
.....


ARM behavior is the same with these two patches:

sh-3.2# /tmp/test &
[1] 178
sh-3.2# cat /proc/178/syscall
168 0x0 0x0 0xffffffff 0x0 0x0 0x0 0xbdcdece4 0x4220a6cc
sh-3.2# kill -STOP 178
sh-3.2#

[1]+  Stopped(SIGSTOP)        /tmp/test
sh-3.2# cat /proc/178/syscall
168 0x0 0x0 0xffffffff 0x0 0x0 0x0 0xbdcdece4 0x4220a6c8
sh-3.2# kill -CONT 178
sh-3.2# cat /proc/178/syscall
0 0x0 0x0 0xffffffff 0x0 0x0 0x0 0xbdcdece4 0x4220a6cc


--
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-13  8:35     ` Roman Peniaev
  2015-01-14  2:23       ` Roman Peniaev
@ 2015-01-14 20:51       ` Kees Cook
  2015-01-15  1:54         ` Roman Peniaev
  1 sibling, 1 reply; 28+ messages in thread
From: Kees Cook @ 2015-01-14 20:51 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Will Deacon, Russell King, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Tue, Jan 13, 2015 at 12:35 AM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Tue, Jan 13, 2015 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
>> On Sun, Jan 11, 2015 at 02:32:30PM +0000, Roman Pen wrote:
>>> thread_info->syscall is used only for ptrace, but syscall number
>>> is also used by syscall_get_nr and returned to userspace by the
>>> following proc file access:
>>>
>>>  $ cat /proc/self/syscall
>>>  0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
>>>  ^
>>> The first number is the syscall number, currently it is zero.
>>> Patch fixes this:
>>>
>>>  $ cat /proc/self/syscall
>>>  3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
>>>  ^
>>> Right, read syscall
>>
>> Yes, it seems that despite requiring CONFIG_HAVE_ARCH_TRACEHOOK,
>> the /proc code requires syscall_get_nr to work regardless of
>> TIF_SYSCALL_TRACE.
>>
>>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>>> Cc: Russell King <linux@arm.linux.org.uk>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>> Cc: Sekhar Nori <nsekhar@ti.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: stable@vger.kernel.org
>>> ---
>>>  arch/arm/kernel/asm-offsets.c  | 1 +
>>>  arch/arm/kernel/entry-common.S | 1 +
>>>  2 files changed, 2 insertions(+)
>>>
>>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>>> index 2d2d608..6911bad 100644
>>> --- a/arch/arm/kernel/asm-offsets.c
>>> +++ b/arch/arm/kernel/asm-offsets.c
>>> @@ -70,6 +70,7 @@ int main(void)
>>>    DEFINE(TI_CPU,             offsetof(struct thread_info, cpu));
>>>    DEFINE(TI_CPU_DOMAIN,              offsetof(struct thread_info, cpu_domain));
>>>    DEFINE(TI_CPU_SAVE,                offsetof(struct thread_info, cpu_context));
>>> +  DEFINE(TI_SYSCALL,         offsetof(struct thread_info, syscall));
>>>    DEFINE(TI_USED_CP,         offsetof(struct thread_info, used_cp));
>>>    DEFINE(TI_TP_VALUE,                offsetof(struct thread_info, tp_value));
>>>    DEFINE(TI_FPSTATE,         offsetof(struct thread_info, fpstate));
>>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>>> index f8ccc21..89452ff 100644
>>> --- a/arch/arm/kernel/entry-common.S
>>> +++ b/arch/arm/kernel/entry-common.S
>>> @@ -189,6 +189,7 @@ ENTRY(vector_swi)
>>>  #endif
>>>
>>>  local_restart:
>>> +     str scno, [tsk, #TI_SYSCALL]            @ set syscall number
>>>       ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
>>>       stmdb   sp!, {r4, r5}                   @ push fifth and sixth args
>>
>> Do we definitely want to update scno on syscall restarting?
>
>
> Good question.
>
> First thing to mention is __sys_trace will trace 'restart_syscall',
> not the real syscall we are going to restart.
>
> E.g. in test application we do infinite poll and then send STOP and
> CONT to this app:
>
>     test-243   [002] ...1  1792.067726: sys_enter: NR 168 (0, 0,
> ffffffff, 0, 0, 0)
>     test-243   [002] ...1  1802.299073: sys_exit: NR 168 = -516
>     test-243   [004] ...1  1814.716264: sys_enter: NR 0 (0, 0,
> ffffffff, 0, 0, 0)
>     test-243   [004] ...1  2183.687225: sys_exit: NR 0 = -516
>
> the poll was restarted and trace shows that we are in restart_syscall.
>
> Is that expected?
>
> And the second thing is that my next patch did some tweaks in
> 'syscall_trace_enter', where we take scno not from param we passed,
> but from thread_info->syscall we previously set.
>
> So, regarding your question, if I set scno only once - I will break
> previous behavior, and __sys_trace will trace the syscall we restarted.
>
> And I think this is what we need, because according to the
> 'syscall_trace_enter' code we do 'secure_computing' and
> 'audit_syscall_entry', which definitely expect original syscall, not
> the 'restart_syscall'.

Seccomp expects to see the __NR_restart_syscall syscall, since it
interposes the syscall entry points.

-Kees

>
>
> --
> Roman
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-14 20:51       ` Kees Cook
@ 2015-01-15  1:54         ` Roman Peniaev
  2015-01-15 22:54           ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Roman Peniaev @ 2015-01-15  1:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Will Deacon, Russell King, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Thu, Jan 15, 2015 at 5:51 AM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, Jan 13, 2015 at 12:35 AM, Roman Peniaev <r.peniaev@gmail.com> wrote:
>> On Tue, Jan 13, 2015 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
>>> On Sun, Jan 11, 2015 at 02:32:30PM +0000, Roman Pen wrote:
>>>> thread_info->syscall is used only for ptrace, but syscall number
>>>> is also used by syscall_get_nr and returned to userspace by the
>>>> following proc file access:
>>>>
>>>>  $ cat /proc/self/syscall
>>>>  0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
>>>>  ^
>>>> The first number is the syscall number, currently it is zero.
>>>> Patch fixes this:
>>>>
>>>>  $ cat /proc/self/syscall
>>>>  3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
>>>>  ^
>>>> Right, read syscall
>>>
>>> Yes, it seems that despite requiring CONFIG_HAVE_ARCH_TRACEHOOK,
>>> the /proc code requires syscall_get_nr to work regardless of
>>> TIF_SYSCALL_TRACE.
>>>
>>>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>> Cc: Sekhar Nori <nsekhar@ti.com>
>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Cc: stable@vger.kernel.org
>>>> ---
>>>>  arch/arm/kernel/asm-offsets.c  | 1 +
>>>>  arch/arm/kernel/entry-common.S | 1 +
>>>>  2 files changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>>>> index 2d2d608..6911bad 100644
>>>> --- a/arch/arm/kernel/asm-offsets.c
>>>> +++ b/arch/arm/kernel/asm-offsets.c
>>>> @@ -70,6 +70,7 @@ int main(void)
>>>>    DEFINE(TI_CPU,             offsetof(struct thread_info, cpu));
>>>>    DEFINE(TI_CPU_DOMAIN,              offsetof(struct thread_info, cpu_domain));
>>>>    DEFINE(TI_CPU_SAVE,                offsetof(struct thread_info, cpu_context));
>>>> +  DEFINE(TI_SYSCALL,         offsetof(struct thread_info, syscall));
>>>>    DEFINE(TI_USED_CP,         offsetof(struct thread_info, used_cp));
>>>>    DEFINE(TI_TP_VALUE,                offsetof(struct thread_info, tp_value));
>>>>    DEFINE(TI_FPSTATE,         offsetof(struct thread_info, fpstate));
>>>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>>>> index f8ccc21..89452ff 100644
>>>> --- a/arch/arm/kernel/entry-common.S
>>>> +++ b/arch/arm/kernel/entry-common.S
>>>> @@ -189,6 +189,7 @@ ENTRY(vector_swi)
>>>>  #endif
>>>>
>>>>  local_restart:
>>>> +     str scno, [tsk, #TI_SYSCALL]            @ set syscall number
>>>>       ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
>>>>       stmdb   sp!, {r4, r5}                   @ push fifth and sixth args
>>>
>>> Do we definitely want to update scno on syscall restarting?
>>
>>
>> Good question.
>>
>> First thing to mention is __sys_trace will trace 'restart_syscall',
>> not the real syscall we are going to restart.
>>
>> E.g. in test application we do infinite poll and then send STOP and
>> CONT to this app:
>>
>>     test-243   [002] ...1  1792.067726: sys_enter: NR 168 (0, 0,
>> ffffffff, 0, 0, 0)
>>     test-243   [002] ...1  1802.299073: sys_exit: NR 168 = -516
>>     test-243   [004] ...1  1814.716264: sys_enter: NR 0 (0, 0,
>> ffffffff, 0, 0, 0)
>>     test-243   [004] ...1  2183.687225: sys_exit: NR 0 = -516
>>
>> the poll was restarted and trace shows that we are in restart_syscall.
>>
>> Is that expected?
>>
>> And the second thing is that my next patch did some tweaks in
>> 'syscall_trace_enter', where we take scno not from param we passed,
>> but from thread_info->syscall we previously set.
>>
>> So, regarding your question, if I set scno only once - I will break
>> previous behavior, and __sys_trace will trace the syscall we restarted.
>>
>> And I think this is what we need, because according to the
>> 'syscall_trace_enter' code we do 'secure_computing' and
>> 'audit_syscall_entry', which definitely expect original syscall, not
>> the 'restart_syscall'.
>
> Seccomp expects to see the __NR_restart_syscall syscall, since it
> interposes the syscall entry points.


Aha, thanks. So I should not break anything.

--
Roman


>
> -Kees
>
>>
>>
>> --
>> Roman
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
>
>
> --
> Kees Cook
> Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-15  1:54         ` Roman Peniaev
@ 2015-01-15 22:54           ` Kees Cook
  2015-01-16 15:57             ` Roman Peniaev
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2015-01-15 22:54 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Will Deacon, Russell King, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Wed, Jan 14, 2015 at 5:54 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Thu, Jan 15, 2015 at 5:51 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Tue, Jan 13, 2015 at 12:35 AM, Roman Peniaev <r.peniaev@gmail.com> wrote:
>>> On Tue, Jan 13, 2015 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
>>>> On Sun, Jan 11, 2015 at 02:32:30PM +0000, Roman Pen wrote:
>>>>> thread_info->syscall is used only for ptrace, but syscall number
>>>>> is also used by syscall_get_nr and returned to userspace by the
>>>>> following proc file access:
>>>>>
>>>>>  $ cat /proc/self/syscall
>>>>>  0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
>>>>>  ^
>>>>> The first number is the syscall number, currently it is zero.
>>>>> Patch fixes this:
>>>>>
>>>>>  $ cat /proc/self/syscall
>>>>>  3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
>>>>>  ^
>>>>> Right, read syscall
>>>>
>>>> Yes, it seems that despite requiring CONFIG_HAVE_ARCH_TRACEHOOK,
>>>> the /proc code requires syscall_get_nr to work regardless of
>>>> TIF_SYSCALL_TRACE.
>>>>
>>>>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>> Cc: Sekhar Nori <nsekhar@ti.com>
>>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Cc: stable@vger.kernel.org
>>>>> ---
>>>>>  arch/arm/kernel/asm-offsets.c  | 1 +
>>>>>  arch/arm/kernel/entry-common.S | 1 +
>>>>>  2 files changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>>>>> index 2d2d608..6911bad 100644
>>>>> --- a/arch/arm/kernel/asm-offsets.c
>>>>> +++ b/arch/arm/kernel/asm-offsets.c
>>>>> @@ -70,6 +70,7 @@ int main(void)
>>>>>    DEFINE(TI_CPU,             offsetof(struct thread_info, cpu));
>>>>>    DEFINE(TI_CPU_DOMAIN,              offsetof(struct thread_info, cpu_domain));
>>>>>    DEFINE(TI_CPU_SAVE,                offsetof(struct thread_info, cpu_context));
>>>>> +  DEFINE(TI_SYSCALL,         offsetof(struct thread_info, syscall));
>>>>>    DEFINE(TI_USED_CP,         offsetof(struct thread_info, used_cp));
>>>>>    DEFINE(TI_TP_VALUE,                offsetof(struct thread_info, tp_value));
>>>>>    DEFINE(TI_FPSTATE,         offsetof(struct thread_info, fpstate));
>>>>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>>>>> index f8ccc21..89452ff 100644
>>>>> --- a/arch/arm/kernel/entry-common.S
>>>>> +++ b/arch/arm/kernel/entry-common.S
>>>>> @@ -189,6 +189,7 @@ ENTRY(vector_swi)
>>>>>  #endif
>>>>>
>>>>>  local_restart:
>>>>> +     str scno, [tsk, #TI_SYSCALL]            @ set syscall number
>>>>>       ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
>>>>>       stmdb   sp!, {r4, r5}                   @ push fifth and sixth args
>>>>
>>>> Do we definitely want to update scno on syscall restarting?
>>>
>>>
>>> Good question.
>>>
>>> First thing to mention is __sys_trace will trace 'restart_syscall',
>>> not the real syscall we are going to restart.
>>>
>>> E.g. in test application we do infinite poll and then send STOP and
>>> CONT to this app:
>>>
>>>     test-243   [002] ...1  1792.067726: sys_enter: NR 168 (0, 0,
>>> ffffffff, 0, 0, 0)
>>>     test-243   [002] ...1  1802.299073: sys_exit: NR 168 = -516
>>>     test-243   [004] ...1  1814.716264: sys_enter: NR 0 (0, 0,
>>> ffffffff, 0, 0, 0)
>>>     test-243   [004] ...1  2183.687225: sys_exit: NR 0 = -516
>>>
>>> the poll was restarted and trace shows that we are in restart_syscall.
>>>
>>> Is that expected?
>>>
>>> And the second thing is that my next patch did some tweaks in
>>> 'syscall_trace_enter', where we take scno not from param we passed,
>>> but from thread_info->syscall we previously set.
>>>
>>> So, regarding your question, if I set scno only once - I will break
>>> previous behavior, and __sys_trace will trace the syscall we restarted.
>>>
>>> And I think this is what we need, because according to the
>>> 'syscall_trace_enter' code we do 'secure_computing' and
>>> 'audit_syscall_entry', which definitely expect original syscall, not
>>> the 'restart_syscall'.
>>
>> Seccomp expects to see the __NR_restart_syscall syscall, since it
>> interposes the syscall entry points.
>
>
> Aha, thanks. So I should not break anything.

I've tested on ARM now, seccomp doesn't see any change in behavior.
Please consider both patches:

Tested-by: Kees Cook <keescook@chromium.org>

One interesting thing I noticed (which is unchanged by this series),
but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
not __NR_restart_syscall, even though it was a __NR_restart_syscall
trap from seccomp. Is there a better place to see the actual syscall?

-Kees

>
> --
> Roman
>
>
>>
>> -Kees
>>
>>>
>>>
>>> --
>>> Roman
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>>
>>
>> --
>> Kees Cook
>> Chrome OS Security



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-15 22:54           ` Kees Cook
@ 2015-01-16 15:57             ` Roman Peniaev
  2015-01-16 15:59               ` Russell King - ARM Linux
  0 siblings, 1 reply; 28+ messages in thread
From: Roman Peniaev @ 2015-01-16 15:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Will Deacon, Russell King, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 14, 2015 at 5:54 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
>> On Thu, Jan 15, 2015 at 5:51 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Tue, Jan 13, 2015 at 12:35 AM, Roman Peniaev <r.peniaev@gmail.com> wrote:
>>>> On Tue, Jan 13, 2015 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
>>>>> On Sun, Jan 11, 2015 at 02:32:30PM +0000, Roman Pen wrote:
>>>>>> thread_info->syscall is used only for ptrace, but syscall number
>>>>>> is also used by syscall_get_nr and returned to userspace by the
>>>>>> following proc file access:
>>>>>>
>>>>>>  $ cat /proc/self/syscall
>>>>>>  0 0x3 0xbe928bd8 0x1000 0x0 0xac9e0 0x3 0xbe928bb4 0xb6f5dfbc
>>>>>>  ^
>>>>>> The first number is the syscall number, currently it is zero.
>>>>>> Patch fixes this:
>>>>>>
>>>>>>  $ cat /proc/self/syscall
>>>>>>  3 0x3 0xbefc7bd8 0x1000 0x0 0xac9e0 0x3 0xbefc7bb4 0xb6e82fbc
>>>>>>  ^
>>>>>> Right, read syscall
>>>>>
>>>>> Yes, it seems that despite requiring CONFIG_HAVE_ARCH_TRACEHOOK,
>>>>> the /proc code requires syscall_get_nr to work regardless of
>>>>> TIF_SYSCALL_TRACE.
>>>>>
>>>>>> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
>>>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>>> Cc: Sekhar Nori <nsekhar@ti.com>
>>>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>> Cc: stable@vger.kernel.org
>>>>>> ---
>>>>>>  arch/arm/kernel/asm-offsets.c  | 1 +
>>>>>>  arch/arm/kernel/entry-common.S | 1 +
>>>>>>  2 files changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>>>>>> index 2d2d608..6911bad 100644
>>>>>> --- a/arch/arm/kernel/asm-offsets.c
>>>>>> +++ b/arch/arm/kernel/asm-offsets.c
>>>>>> @@ -70,6 +70,7 @@ int main(void)
>>>>>>    DEFINE(TI_CPU,             offsetof(struct thread_info, cpu));
>>>>>>    DEFINE(TI_CPU_DOMAIN,              offsetof(struct thread_info, cpu_domain));
>>>>>>    DEFINE(TI_CPU_SAVE,                offsetof(struct thread_info, cpu_context));
>>>>>> +  DEFINE(TI_SYSCALL,         offsetof(struct thread_info, syscall));
>>>>>>    DEFINE(TI_USED_CP,         offsetof(struct thread_info, used_cp));
>>>>>>    DEFINE(TI_TP_VALUE,                offsetof(struct thread_info, tp_value));
>>>>>>    DEFINE(TI_FPSTATE,         offsetof(struct thread_info, fpstate));
>>>>>> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
>>>>>> index f8ccc21..89452ff 100644
>>>>>> --- a/arch/arm/kernel/entry-common.S
>>>>>> +++ b/arch/arm/kernel/entry-common.S
>>>>>> @@ -189,6 +189,7 @@ ENTRY(vector_swi)
>>>>>>  #endif
>>>>>>
>>>>>>  local_restart:
>>>>>> +     str scno, [tsk, #TI_SYSCALL]            @ set syscall number
>>>>>>       ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
>>>>>>       stmdb   sp!, {r4, r5}                   @ push fifth and sixth args
>>>>>
>>>>> Do we definitely want to update scno on syscall restarting?
>>>>
>>>>
>>>> Good question.
>>>>
>>>> First thing to mention is __sys_trace will trace 'restart_syscall',
>>>> not the real syscall we are going to restart.
>>>>
>>>> E.g. in test application we do infinite poll and then send STOP and
>>>> CONT to this app:
>>>>
>>>>     test-243   [002] ...1  1792.067726: sys_enter: NR 168 (0, 0,
>>>> ffffffff, 0, 0, 0)
>>>>     test-243   [002] ...1  1802.299073: sys_exit: NR 168 = -516
>>>>     test-243   [004] ...1  1814.716264: sys_enter: NR 0 (0, 0,
>>>> ffffffff, 0, 0, 0)
>>>>     test-243   [004] ...1  2183.687225: sys_exit: NR 0 = -516
>>>>
>>>> the poll was restarted and trace shows that we are in restart_syscall.
>>>>
>>>> Is that expected?
>>>>
>>>> And the second thing is that my next patch did some tweaks in
>>>> 'syscall_trace_enter', where we take scno not from param we passed,
>>>> but from thread_info->syscall we previously set.
>>>>
>>>> So, regarding your question, if I set scno only once - I will break
>>>> previous behavior, and __sys_trace will trace the syscall we restarted.
>>>>
>>>> And I think this is what we need, because according to the
>>>> 'syscall_trace_enter' code we do 'secure_computing' and
>>>> 'audit_syscall_entry', which definitely expect original syscall, not
>>>> the 'restart_syscall'.
>>>
>>> Seccomp expects to see the __NR_restart_syscall syscall, since it
>>> interposes the syscall entry points.
>>
>>
>> Aha, thanks. So I should not break anything.
>
> I've tested on ARM now, seccomp doesn't see any change in behavior.
> Please consider both patches:
>
> Tested-by: Kees Cook <keescook@chromium.org>
>

Thanks.

> One interesting thing I noticed (which is unchanged by this series),
> but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
> not __NR_restart_syscall, even though it was a __NR_restart_syscall
> trap from seccomp. Is there a better place to see the actual syscall?

As I understand we do not push new r7 to the stack, and ptrace uses the
old value.

I checked x86, x86-64 - ptrace returns __NR_restart_syscall.
So, probably, this should be fixed for ARM.

--
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-16 15:57             ` Roman Peniaev
@ 2015-01-16 15:59               ` Russell King - ARM Linux
  2015-01-16 16:08                 ` Roman Peniaev
  0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2015-01-16 15:59 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Kees Cook, Will Deacon, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
> > One interesting thing I noticed (which is unchanged by this series),
> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
> > trap from seccomp. Is there a better place to see the actual syscall?
> 
> As I understand we do not push new r7 to the stack, and ptrace uses the
> old value.

And why should we push r7 to the stack?  ptrace should be using the
recorded system call number, rather than poking about on the stack
itself.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-16 15:59               ` Russell King - ARM Linux
@ 2015-01-16 16:08                 ` Roman Peniaev
  2015-01-16 16:17                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 28+ messages in thread
From: Roman Peniaev @ 2015-01-16 16:08 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kees Cook, Will Deacon, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
>> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
>> > One interesting thing I noticed (which is unchanged by this series),
>> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
>> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
>> > trap from seccomp. Is there a better place to see the actual syscall?
>>
>> As I understand we do not push new r7 to the stack, and ptrace uses the
>> old value.
>
> And why should we push r7 to the stack?  ptrace should be using the
> recorded system call number, rather than poking about on the stack
> itself.

Probably we should not, but the behaviour comparing arm to x86 is different.

Also there is no any way from userspace to figure out what syscall was
restarted,
if you do not trace each syscall enter and exit from the very beginning.

--
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-16 16:08                 ` Roman Peniaev
@ 2015-01-16 16:17                   ` Russell King - ARM Linux
  2015-01-16 19:57                     ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2015-01-16 16:17 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Kees Cook, Will Deacon, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Sat, Jan 17, 2015 at 01:08:11AM +0900, Roman Peniaev wrote:
> On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
> >> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
> >> > One interesting thing I noticed (which is unchanged by this series),
> >> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
> >> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
> >> > trap from seccomp. Is there a better place to see the actual syscall?
> >>
> >> As I understand we do not push new r7 to the stack, and ptrace uses the
> >> old value.
> >
> > And why should we push r7 to the stack?  ptrace should be using the
> > recorded system call number, rather than poking about on the stack
> > itself.
> 
> Probably we should not, but the behaviour comparing arm to x86 is different.

We definitely should not, because changing the stacked value changes the
value in r7 after the syscall has returned.  We have guaranteed that the
value will be preserved across syscalls for years, so we really should
not be changing that.

> Also there is no any way from userspace to figure out what syscall was
> restarted, if you do not trace each syscall enter and exit from the
> very beginning.

Thinking about ptrace, that's been true for years.

It really depends whether you consider the restart syscall a userspace
thing or a kernelspace thing.  When you consider that the vast majority
of syscall restarts are done internally in the kernel, and we just
re-issue the syscall, it immediately brings up the question "why is
the restart block method different?" and "should the restart block
method be visible to userspace?"

IMHO, it is prudent not to expose kernel internals to userspace unless
there is a real reason to, otherwise they become part of the userspace
API.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-16 16:17                   ` Russell King - ARM Linux
@ 2015-01-16 19:57                     ` Kees Cook
  2015-01-16 23:54                       ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2015-01-16 19:57 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Roman Peniaev, Will Deacon, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Fri, Jan 16, 2015 at 8:17 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sat, Jan 17, 2015 at 01:08:11AM +0900, Roman Peniaev wrote:
>> On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
>> >> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
>> >> > One interesting thing I noticed (which is unchanged by this series),
>> >> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
>> >> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
>> >> > trap from seccomp. Is there a better place to see the actual syscall?
>> >>
>> >> As I understand we do not push new r7 to the stack, and ptrace uses the
>> >> old value.
>> >
>> > And why should we push r7 to the stack?  ptrace should be using the
>> > recorded system call number, rather than poking about on the stack
>> > itself.
>>
>> Probably we should not, but the behaviour comparing arm to x86 is different.
>
> We definitely should not, because changing the stacked value changes the
> value in r7 after the syscall has returned.  We have guaranteed that the
> value will be preserved across syscalls for years, so we really should
> not be changing that.

Yeah, we can't mess with the registers. I was just asking for
clarification on how this is visible to userspace.

>
>> Also there is no any way from userspace to figure out what syscall was
>> restarted, if you do not trace each syscall enter and exit from the
>> very beginning.
>
> Thinking about ptrace, that's been true for years.
>
> It really depends whether you consider the restart syscall a userspace
> thing or a kernelspace thing.  When you consider that the vast majority
> of syscall restarts are done internally in the kernel, and we just
> re-issue the syscall, it immediately brings up the question "why is
> the restart block method different?" and "should the restart block
> method be visible to userspace?"
>
> IMHO, it is prudent not to expose kernel internals to userspace unless
> there is a real reason to, otherwise they become part of the userspace
> API.

I couldn't agree more, but restart_syscall is already visible to
userspace: it can be called directly, for example. And it's visible to
tracers.

Unfortunately, the difference here is the visibility during trace
trap. On x86, it's exposed but on ARM, there's no way (that I can
find) to query the "true" syscall, even though the true syscall is
what triggers the tracer. The syscall number isn't provided by any
element of the ptrace event system, nor through siginfo, and must be
examined on a per-arch basis from registers.

Seccomp does, however, provide a mechanism to pass arbitrary event
data on a TRACE event, so poll vs restart_syscall can be distinguished
that way.

It seems even strace doesn't know how to find this information. For example:

x86:
poll([{fd=3, events=POLLIN}], 1, 4294967295
) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=994, si_uid=1000} ---
--- stopped by SIGSTOP ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=994, si_uid=1000} ---
restart_syscall(<... resuming interrupted call ...>

ARM:
poll([{fd=3, events=POLLIN}], 1, -1
)    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=20563, si_uid=0} ---
--- stopped by SIGSTOP ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=20563, si_uid=0} ---
poll([{fd=3, events=POLLIN}], 1, -1

Would it make sense to add REGSET_SYSTEM_CALL to ARM? (Though this
begs the question, "Is restart_syscall visible during a trace on
arm64?", which I'll have to go check...)

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-16 19:57                     ` Kees Cook
@ 2015-01-16 23:54                       ` Kees Cook
  2015-01-19  5:58                         ` Roman Peniaev
  2015-01-19  9:20                         ` Will Deacon
  0 siblings, 2 replies; 28+ messages in thread
From: Kees Cook @ 2015-01-16 23:54 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Roman Peniaev, Will Deacon, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Fri, Jan 16, 2015 at 11:57 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jan 16, 2015 at 8:17 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Sat, Jan 17, 2015 at 01:08:11AM +0900, Roman Peniaev wrote:
>>> On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
>>> <linux@arm.linux.org.uk> wrote:
>>> > On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
>>> >> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
>>> >> > One interesting thing I noticed (which is unchanged by this series),
>>> >> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
>>> >> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
>>> >> > trap from seccomp. Is there a better place to see the actual syscall?
>>> >>
>>> >> As I understand we do not push new r7 to the stack, and ptrace uses the
>>> >> old value.
>>> >
>>> > And why should we push r7 to the stack?  ptrace should be using the
>>> > recorded system call number, rather than poking about on the stack
>>> > itself.
>>>
>>> Probably we should not, but the behaviour comparing arm to x86 is different.
>>
>> We definitely should not, because changing the stacked value changes the
>> value in r7 after the syscall has returned.  We have guaranteed that the
>> value will be preserved across syscalls for years, so we really should
>> not be changing that.
>
> Yeah, we can't mess with the registers. I was just asking for
> clarification on how this is visible to userspace.
>
>>
>>> Also there is no any way from userspace to figure out what syscall was
>>> restarted, if you do not trace each syscall enter and exit from the
>>> very beginning.
>>
>> Thinking about ptrace, that's been true for years.
>>
>> It really depends whether you consider the restart syscall a userspace
>> thing or a kernelspace thing.  When you consider that the vast majority
>> of syscall restarts are done internally in the kernel, and we just
>> re-issue the syscall, it immediately brings up the question "why is
>> the restart block method different?" and "should the restart block
>> method be visible to userspace?"
>>
>> IMHO, it is prudent not to expose kernel internals to userspace unless
>> there is a real reason to, otherwise they become part of the userspace
>> API.
>
> I couldn't agree more, but restart_syscall is already visible to
> userspace: it can be called directly, for example. And it's visible to
> tracers.
>
> Unfortunately, the difference here is the visibility during trace
> trap. On x86, it's exposed but on ARM, there's no way (that I can
> find) to query the "true" syscall, even though the true syscall is
> what triggers the tracer. The syscall number isn't provided by any
> element of the ptrace event system, nor through siginfo, and must be
> examined on a per-arch basis from registers.
>
> Seccomp does, however, provide a mechanism to pass arbitrary event
> data on a TRACE event, so poll vs restart_syscall can be distinguished
> that way.
>
> It seems even strace doesn't know how to find this information. For example:
>
> x86:
> poll([{fd=3, events=POLLIN}], 1, 4294967295
> ) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=994, si_uid=1000} ---
> --- stopped by SIGSTOP ---
> --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=994, si_uid=1000} ---
> restart_syscall(<... resuming interrupted call ...>
>
> ARM:
> poll([{fd=3, events=POLLIN}], 1, -1
> )    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=20563, si_uid=0} ---
> --- stopped by SIGSTOP ---
> --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=20563, si_uid=0} ---
> poll([{fd=3, events=POLLIN}], 1, -1
>
> Would it make sense to add REGSET_SYSTEM_CALL to ARM? (Though this
> begs the question, "Is restart_syscall visible during a trace on
> arm64?", which I'll have to go check...)

So, some further testing:
- native arm64 presents "poll" again even to seccomp when
restart_syscall is triggered (both via regs[8] and
NT_ARM_SYSTEM_CALL).
- compat mode on arm64 _does_ show syscall_restart (via ARM_r7).

Which of these behaviors is intentional? :)

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-16 23:54                       ` Kees Cook
@ 2015-01-19  5:58                         ` Roman Peniaev
  2015-01-20 18:56                           ` Kees Cook
  2015-01-19  9:20                         ` Will Deacon
  1 sibling, 1 reply; 28+ messages in thread
From: Roman Peniaev @ 2015-01-19  5:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Russell King - ARM Linux, Will Deacon, Stefano Stabellini,
	Marc Zyngier, Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Sat, Jan 17, 2015 at 8:54 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jan 16, 2015 at 11:57 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Jan 16, 2015 at 8:17 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>> On Sat, Jan 17, 2015 at 01:08:11AM +0900, Roman Peniaev wrote:
>>>> On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
>>>> <linux@arm.linux.org.uk> wrote:
>>>> > On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
>>>> >> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
>>>> >> > One interesting thing I noticed (which is unchanged by this series),
>>>> >> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
>>>> >> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
>>>> >> > trap from seccomp. Is there a better place to see the actual syscall?
>>>> >>
>>>> >> As I understand we do not push new r7 to the stack, and ptrace uses the
>>>> >> old value.
>>>> >
>>>> > And why should we push r7 to the stack?  ptrace should be using the
>>>> > recorded system call number, rather than poking about on the stack
>>>> > itself.
>>>>
>>>> Probably we should not, but the behaviour comparing arm to x86 is different.
>>>
>>> We definitely should not, because changing the stacked value changes the
>>> value in r7 after the syscall has returned.  We have guaranteed that the
>>> value will be preserved across syscalls for years, so we really should
>>> not be changing that.
>>
>> Yeah, we can't mess with the registers. I was just asking for
>> clarification on how this is visible to userspace.
>>
>>>
>>>> Also there is no any way from userspace to figure out what syscall was
>>>> restarted, if you do not trace each syscall enter and exit from the
>>>> very beginning.
>>>
>>> Thinking about ptrace, that's been true for years.
>>>
>>> It really depends whether you consider the restart syscall a userspace
>>> thing or a kernelspace thing.  When you consider that the vast majority
>>> of syscall restarts are done internally in the kernel, and we just
>>> re-issue the syscall, it immediately brings up the question "why is
>>> the restart block method different?" and "should the restart block
>>> method be visible to userspace?"
>>>
>>> IMHO, it is prudent not to expose kernel internals to userspace unless
>>> there is a real reason to, otherwise they become part of the userspace
>>> API.
>>
>> I couldn't agree more, but restart_syscall is already visible to
>> userspace: it can be called directly, for example. And it's visible to
>> tracers.
>>
>> Unfortunately, the difference here is the visibility during trace
>> trap. On x86, it's exposed but on ARM, there's no way (that I can
>> find) to query the "true" syscall, even though the true syscall is
>> what triggers the tracer. The syscall number isn't provided by any
>> element of the ptrace event system, nor through siginfo, and must be
>> examined on a per-arch basis from registers.
>>
>> Seccomp does, however, provide a mechanism to pass arbitrary event
>> data on a TRACE event, so poll vs restart_syscall can be distinguished
>> that way.
>>
>> It seems even strace doesn't know how to find this information. For example:
>>
>> x86:
>> poll([{fd=3, events=POLLIN}], 1, 4294967295
>> ) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>> --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=994, si_uid=1000} ---
>> --- stopped by SIGSTOP ---
>> --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=994, si_uid=1000} ---
>> restart_syscall(<... resuming interrupted call ...>
>>
>> ARM:
>> poll([{fd=3, events=POLLIN}], 1, -1
>> )    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>> --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=20563, si_uid=0} ---
>> --- stopped by SIGSTOP ---
>> --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=20563, si_uid=0} ---
>> poll([{fd=3, events=POLLIN}], 1, -1
>>
>> Would it make sense to add REGSET_SYSTEM_CALL to ARM? (Though this
>> begs the question, "Is restart_syscall visible during a trace on
>> arm64?", which I'll have to go check...)
>
> So, some further testing:
> - native arm64 presents "poll" again even to seccomp when
> restart_syscall is triggered (both via regs[8] and
> NT_ARM_SYSTEM_CALL).
> - compat mode on arm64 _does_ show syscall_restart (via ARM_r7).
>
> Which of these behaviors is intentional? :)
>


Just want to summarize the difference.
(please, correct me if i am mistaken)

Userspace has two ways to see actual syscall number:
   1. /proc/pid/syscall file
   2. ptrace

So the following is the table showing what syscall number
userspace sees using proc file or doing ptrace in case of restarted poll:

                         x86           ARM              ARM64     ARM64 compat
cat /proc/pid/syscall:   NR_restart    Not supported    ?????     ?????
               ptrace:   NR_restart    NR_poll          NR_poll   NR_restart


Not supported - should be fixed by these two patches, the behaviour should
                be similar to x86, i.e. userspace will see NR_restart

         ???? - I do not have ARM64 for testing.
                Kees, could you please cat /proc/pid/syscall for those two?
                I took a quick look into arm64 syscall.h/entry.S and seems it
                is supported fine and the result should be equal to ptrace.

So, yes, compatibility is important, but /proc/pid/syscall never works on ARM
and ptrace output is different even among ARM architectures.

--
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-16 23:54                       ` Kees Cook
  2015-01-19  5:58                         ` Roman Peniaev
@ 2015-01-19  9:20                         ` Will Deacon
  2015-01-20 18:31                           ` Kees Cook
  1 sibling, 1 reply; 28+ messages in thread
From: Will Deacon @ 2015-01-19  9:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Russell King - ARM Linux, Roman Peniaev, Stefano Stabellini,
	Marc Zyngier, Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Fri, Jan 16, 2015 at 11:54:45PM +0000, Kees Cook wrote:
> On Fri, Jan 16, 2015 at 11:57 AM, Kees Cook <keescook@chromium.org> wrote:
> > On Fri, Jan 16, 2015 at 8:17 AM, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> >> On Sat, Jan 17, 2015 at 01:08:11AM +0900, Roman Peniaev wrote:
> >>> On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
> >>> <linux@arm.linux.org.uk> wrote:
> >>> > On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
> >>> >> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
> >>> >> > One interesting thing I noticed (which is unchanged by this series),
> >>> >> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
> >>> >> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
> >>> >> > trap from seccomp. Is there a better place to see the actual syscall?
> >>> >>
> >>> >> As I understand we do not push new r7 to the stack, and ptrace uses the
> >>> >> old value.
> >>> >
> >>> > And why should we push r7 to the stack?  ptrace should be using the
> >>> > recorded system call number, rather than poking about on the stack
> >>> > itself.
> >>>
> >>> Probably we should not, but the behaviour comparing arm to x86 is different.
> >>
> >> We definitely should not, because changing the stacked value changes the
> >> value in r7 after the syscall has returned.  We have guaranteed that the
> >> value will be preserved across syscalls for years, so we really should
> >> not be changing that.
> >
> > Yeah, we can't mess with the registers. I was just asking for
> > clarification on how this is visible to userspace.
> >
> >>
> >>> Also there is no any way from userspace to figure out what syscall was
> >>> restarted, if you do not trace each syscall enter and exit from the
> >>> very beginning.
> >>
> >> Thinking about ptrace, that's been true for years.
> >>
> >> It really depends whether you consider the restart syscall a userspace
> >> thing or a kernelspace thing.  When you consider that the vast majority
> >> of syscall restarts are done internally in the kernel, and we just
> >> re-issue the syscall, it immediately brings up the question "why is
> >> the restart block method different?" and "should the restart block
> >> method be visible to userspace?"
> >>
> >> IMHO, it is prudent not to expose kernel internals to userspace unless
> >> there is a real reason to, otherwise they become part of the userspace
> >> API.
> >
> > I couldn't agree more, but restart_syscall is already visible to
> > userspace: it can be called directly, for example. And it's visible to
> > tracers.
> >
> > Unfortunately, the difference here is the visibility during trace
> > trap. On x86, it's exposed but on ARM, there's no way (that I can
> > find) to query the "true" syscall, even though the true syscall is
> > what triggers the tracer. The syscall number isn't provided by any
> > element of the ptrace event system, nor through siginfo, and must be
> > examined on a per-arch basis from registers.
> >
> > Seccomp does, however, provide a mechanism to pass arbitrary event
> > data on a TRACE event, so poll vs restart_syscall can be distinguished
> > that way.
> >
> > It seems even strace doesn't know how to find this information. For example:
> >
> > x86:
> > poll([{fd=3, events=POLLIN}], 1, 4294967295
> > ) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> > --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=994, si_uid=1000} ---
> > --- stopped by SIGSTOP ---
> > --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=994, si_uid=1000} ---
> > restart_syscall(<... resuming interrupted call ...>
> >
> > ARM:
> > poll([{fd=3, events=POLLIN}], 1, -1
> > )    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> > --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=20563, si_uid=0} ---
> > --- stopped by SIGSTOP ---
> > --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=20563, si_uid=0} ---
> > poll([{fd=3, events=POLLIN}], 1, -1
> >
> > Would it make sense to add REGSET_SYSTEM_CALL to ARM? (Though this
> > begs the question, "Is restart_syscall visible during a trace on
> > arm64?", which I'll have to go check...)
> 
> So, some further testing:
> - native arm64 presents "poll" again even to seccomp when
> restart_syscall is triggered (both via regs[8] and
> NT_ARM_SYSTEM_CALL).

I'm fine either way for the native case, but we should stick with whetever
we end up with. Being compatible with ARM is probably a good idea. Do you
have a preference?

> - compat mode on arm64 _does_ show syscall_restart (via ARM_r7).

That sounds like a bug, then. Any chance you could look into a patch?

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-19  9:20                         ` Will Deacon
@ 2015-01-20 18:31                           ` Kees Cook
  2015-01-20 22:45                             ` Russell King - ARM Linux
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2015-01-20 18:31 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King - ARM Linux, Roman Peniaev, Stefano Stabellini,
	Marc Zyngier, Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Mon, Jan 19, 2015 at 1:20 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jan 16, 2015 at 11:54:45PM +0000, Kees Cook wrote:
>> On Fri, Jan 16, 2015 at 11:57 AM, Kees Cook <keescook@chromium.org> wrote:
>> > On Fri, Jan 16, 2015 at 8:17 AM, Russell King - ARM Linux
>> > <linux@arm.linux.org.uk> wrote:
>> >> On Sat, Jan 17, 2015 at 01:08:11AM +0900, Roman Peniaev wrote:
>> >>> On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
>> >>> <linux@arm.linux.org.uk> wrote:
>> >>> > On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
>> >>> >> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
>> >>> >> > One interesting thing I noticed (which is unchanged by this series),
>> >>> >> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
>> >>> >> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
>> >>> >> > trap from seccomp. Is there a better place to see the actual syscall?
>> >>> >>
>> >>> >> As I understand we do not push new r7 to the stack, and ptrace uses the
>> >>> >> old value.
>> >>> >
>> >>> > And why should we push r7 to the stack?  ptrace should be using the
>> >>> > recorded system call number, rather than poking about on the stack
>> >>> > itself.
>> >>>
>> >>> Probably we should not, but the behaviour comparing arm to x86 is different.
>> >>
>> >> We definitely should not, because changing the stacked value changes the
>> >> value in r7 after the syscall has returned.  We have guaranteed that the
>> >> value will be preserved across syscalls for years, so we really should
>> >> not be changing that.
>> >
>> > Yeah, we can't mess with the registers. I was just asking for
>> > clarification on how this is visible to userspace.
>> >
>> >>
>> >>> Also there is no any way from userspace to figure out what syscall was
>> >>> restarted, if you do not trace each syscall enter and exit from the
>> >>> very beginning.
>> >>
>> >> Thinking about ptrace, that's been true for years.
>> >>
>> >> It really depends whether you consider the restart syscall a userspace
>> >> thing or a kernelspace thing.  When you consider that the vast majority
>> >> of syscall restarts are done internally in the kernel, and we just
>> >> re-issue the syscall, it immediately brings up the question "why is
>> >> the restart block method different?" and "should the restart block
>> >> method be visible to userspace?"
>> >>
>> >> IMHO, it is prudent not to expose kernel internals to userspace unless
>> >> there is a real reason to, otherwise they become part of the userspace
>> >> API.
>> >
>> > I couldn't agree more, but restart_syscall is already visible to
>> > userspace: it can be called directly, for example. And it's visible to
>> > tracers.
>> >
>> > Unfortunately, the difference here is the visibility during trace
>> > trap. On x86, it's exposed but on ARM, there's no way (that I can
>> > find) to query the "true" syscall, even though the true syscall is
>> > what triggers the tracer. The syscall number isn't provided by any
>> > element of the ptrace event system, nor through siginfo, and must be
>> > examined on a per-arch basis from registers.
>> >
>> > Seccomp does, however, provide a mechanism to pass arbitrary event
>> > data on a TRACE event, so poll vs restart_syscall can be distinguished
>> > that way.
>> >
>> > It seems even strace doesn't know how to find this information. For example:
>> >
>> > x86:
>> > poll([{fd=3, events=POLLIN}], 1, 4294967295
>> > ) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>> > --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=994, si_uid=1000} ---
>> > --- stopped by SIGSTOP ---
>> > --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=994, si_uid=1000} ---
>> > restart_syscall(<... resuming interrupted call ...>
>> >
>> > ARM:
>> > poll([{fd=3, events=POLLIN}], 1, -1
>> > )    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>> > --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=20563, si_uid=0} ---
>> > --- stopped by SIGSTOP ---
>> > --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=20563, si_uid=0} ---
>> > poll([{fd=3, events=POLLIN}], 1, -1
>> >
>> > Would it make sense to add REGSET_SYSTEM_CALL to ARM? (Though this
>> > begs the question, "Is restart_syscall visible during a trace on
>> > arm64?", which I'll have to go check...)
>>
>> So, some further testing:
>> - native arm64 presents "poll" again even to seccomp when
>> restart_syscall is triggered (both via regs[8] and
>> NT_ARM_SYSTEM_CALL).
>
> I'm fine either way for the native case, but we should stick with whetever
> we end up with. Being compatible with ARM is probably a good idea. Do you
> have a preference?

I actually prefer seccomp matching ptrace register visibility, so I'd
like to see restart_syscall everywhere. (It is a real entry point
after all.)

>> - compat mode on arm64 _does_ show syscall_restart (via ARM_r7).
>
> That sounds like a bug, then. Any chance you could look into a patch?

Well, actually, I think this is the _correct_ behavior, and native
arm64 and native arm are the broken pieces. If no one objects to
fixing this, I can work on sorting it out for ptrace, since it looks
like Roman has procfs handled.

Russell, given the rest of this thread, would you be okay exposing
"true" syscall to ARM? Perhaps we could implement NT_ARM_SYSTEM_CALL
on arm32?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-19  5:58                         ` Roman Peniaev
@ 2015-01-20 18:56                           ` Kees Cook
  0 siblings, 0 replies; 28+ messages in thread
From: Kees Cook @ 2015-01-20 18:56 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Russell King - ARM Linux, Will Deacon, Stefano Stabellini,
	Marc Zyngier, Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Sun, Jan 18, 2015 at 9:58 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Sat, Jan 17, 2015 at 8:54 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Jan 16, 2015 at 11:57 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Fri, Jan 16, 2015 at 8:17 AM, Russell King - ARM Linux
>>> <linux@arm.linux.org.uk> wrote:
>>>> On Sat, Jan 17, 2015 at 01:08:11AM +0900, Roman Peniaev wrote:
>>>>> On Sat, Jan 17, 2015 at 12:59 AM, Russell King - ARM Linux
>>>>> <linux@arm.linux.org.uk> wrote:
>>>>> > On Sat, Jan 17, 2015 at 12:57:02AM +0900, Roman Peniaev wrote:
>>>>> >> On Fri, Jan 16, 2015 at 7:54 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>> >> > One interesting thing I noticed (which is unchanged by this series),
>>>>> >> > but pulling ARM_r7 during the seccomp ptrace event shows __NR_poll,
>>>>> >> > not __NR_restart_syscall, even though it was a __NR_restart_syscall
>>>>> >> > trap from seccomp. Is there a better place to see the actual syscall?
>>>>> >>
>>>>> >> As I understand we do not push new r7 to the stack, and ptrace uses the
>>>>> >> old value.
>>>>> >
>>>>> > And why should we push r7 to the stack?  ptrace should be using the
>>>>> > recorded system call number, rather than poking about on the stack
>>>>> > itself.
>>>>>
>>>>> Probably we should not, but the behaviour comparing arm to x86 is different.
>>>>
>>>> We definitely should not, because changing the stacked value changes the
>>>> value in r7 after the syscall has returned.  We have guaranteed that the
>>>> value will be preserved across syscalls for years, so we really should
>>>> not be changing that.
>>>
>>> Yeah, we can't mess with the registers. I was just asking for
>>> clarification on how this is visible to userspace.
>>>
>>>>
>>>>> Also there is no any way from userspace to figure out what syscall was
>>>>> restarted, if you do not trace each syscall enter and exit from the
>>>>> very beginning.
>>>>
>>>> Thinking about ptrace, that's been true for years.
>>>>
>>>> It really depends whether you consider the restart syscall a userspace
>>>> thing or a kernelspace thing.  When you consider that the vast majority
>>>> of syscall restarts are done internally in the kernel, and we just
>>>> re-issue the syscall, it immediately brings up the question "why is
>>>> the restart block method different?" and "should the restart block
>>>> method be visible to userspace?"
>>>>
>>>> IMHO, it is prudent not to expose kernel internals to userspace unless
>>>> there is a real reason to, otherwise they become part of the userspace
>>>> API.
>>>
>>> I couldn't agree more, but restart_syscall is already visible to
>>> userspace: it can be called directly, for example. And it's visible to
>>> tracers.
>>>
>>> Unfortunately, the difference here is the visibility during trace
>>> trap. On x86, it's exposed but on ARM, there's no way (that I can
>>> find) to query the "true" syscall, even though the true syscall is
>>> what triggers the tracer. The syscall number isn't provided by any
>>> element of the ptrace event system, nor through siginfo, and must be
>>> examined on a per-arch basis from registers.
>>>
>>> Seccomp does, however, provide a mechanism to pass arbitrary event
>>> data on a TRACE event, so poll vs restart_syscall can be distinguished
>>> that way.
>>>
>>> It seems even strace doesn't know how to find this information. For example:
>>>
>>> x86:
>>> poll([{fd=3, events=POLLIN}], 1, 4294967295
>>> ) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>>> --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=994, si_uid=1000} ---
>>> --- stopped by SIGSTOP ---
>>> --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=994, si_uid=1000} ---
>>> restart_syscall(<... resuming interrupted call ...>
>>>
>>> ARM:
>>> poll([{fd=3, events=POLLIN}], 1, -1
>>> )    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>>> --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=20563, si_uid=0} ---
>>> --- stopped by SIGSTOP ---
>>> --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=20563, si_uid=0} ---
>>> poll([{fd=3, events=POLLIN}], 1, -1
>>>
>>> Would it make sense to add REGSET_SYSTEM_CALL to ARM? (Though this
>>> begs the question, "Is restart_syscall visible during a trace on
>>> arm64?", which I'll have to go check...)
>>
>> So, some further testing:
>> - native arm64 presents "poll" again even to seccomp when
>> restart_syscall is triggered (both via regs[8] and
>> NT_ARM_SYSTEM_CALL).
>> - compat mode on arm64 _does_ show syscall_restart (via ARM_r7).
>>
>> Which of these behaviors is intentional? :)
>>
>
>
> Just want to summarize the difference.
> (please, correct me if i am mistaken)
>
> Userspace has two ways to see actual syscall number:
>    1. /proc/pid/syscall file
>    2. ptrace
>
> So the following is the table showing what syscall number
> userspace sees using proc file or doing ptrace in case of restarted poll:
>
>                          x86           ARM              ARM64     ARM64 compat
> cat /proc/pid/syscall:   NR_restart    Not supported    ?????     ?????
>                ptrace:   NR_restart    NR_poll          NR_poll   NR_restart
>
>
> Not supported - should be fixed by these two patches, the behaviour should
>                 be similar to x86, i.e. userspace will see NR_restart
>
>          ???? - I do not have ARM64 for testing.
>                 Kees, could you please cat /proc/pid/syscall for those two?
>                 I took a quick look into arm64 syscall.h/entry.S and seems it
>                 is supported fine and the result should be equal to ptrace.

Yup, checking this directly agrees, /proc/$pid/syscall for me:

native arm64 shows NR_poll
arm64 compat shows NR_restart

> So, yes, compatibility is important, but /proc/pid/syscall never works on ARM
> and ptrace output is different even among ARM architectures.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-20 18:31                           ` Kees Cook
@ 2015-01-20 22:45                             ` Russell King - ARM Linux
  2015-01-20 23:04                               ` Russell King - ARM Linux
  0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2015-01-20 22:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: Will Deacon, Roman Peniaev, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Sekhar Nori, linux-kernel, stable,
	Christoffer Dall, linux-arm-kernel

On Tue, Jan 20, 2015 at 10:31:57AM -0800, Kees Cook wrote:
> On Mon, Jan 19, 2015 at 1:20 AM, Will Deacon <will.deacon@arm.com> wrote:
> > I'm fine either way for the native case, but we should stick with whetever
> > we end up with. Being compatible with ARM is probably a good idea. Do you
> > have a preference?
> 
> I actually prefer seccomp matching ptrace register visibility, so I'd
> like to see restart_syscall everywhere. (It is a real entry point
> after all.)
> 
> >> - compat mode on arm64 _does_ show syscall_restart (via ARM_r7).
> >
> > That sounds like a bug, then. Any chance you could look into a patch?
> 
> Well, actually, I think this is the _correct_ behavior, and native
> arm64 and native arm are the broken pieces. If no one objects to
> fixing this, I can work on sorting it out for ptrace, since it looks
> like Roman has procfs handled.
> 
> Russell, given the rest of this thread, would you be okay exposing
> "true" syscall to ARM? Perhaps we could implement NT_ARM_SYSTEM_CALL
> on arm32?

Well, the whole question is this: is restarting a system call like
usleep() really a separate system call, or is it a kernel implementation
detail?

If you wanted seccomp to see this, what would be the use case?  Why
would seccomp want to block this syscall?  Does it make sense for
seccomp to block this syscall when it doesn't block something like
usleep() and then have usleep() fail just because the thread received
a signal?

I personally regard the whole restart system call thing as a purely
kernel internal thing which should not be exposed to userland.  If
we decide that it should be exposed to userland, then it becomes part
of the user ABI, and it /could/ become difficult if we needed to
change it in future - and I'd rather not get into the "oh shit, we
can't change it because that would break app X" crap.

Before we had the restart syscall, we used to just re-issue the
original syscall, so actually on ARM we preserved the older Linux
behaviour.

Can you put forward a strong argument justifying why the restart
system call should be exposed, other than "other architectures do" ?

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-20 22:45                             ` Russell King - ARM Linux
@ 2015-01-20 23:04                               ` Russell King - ARM Linux
  2015-01-21 23:32                                 ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2015-01-20 23:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-arm-kernel, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Will Deacon, linux-kernel, Sekhar Nori,
	Roman Peniaev, stable, Christoffer Dall

On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote:
> Well, the whole question is this: is restarting a system call like
> usleep() really a separate system call, or is it a kernel implementation
> detail?
> 
> If you wanted seccomp to see this, what would be the use case?  Why
> would seccomp want to block this syscall?  Does it make sense for
> seccomp to block this syscall when it doesn't block something like
> usleep() and then have usleep() fail just because the thread received
> a signal?
> 
> I personally regard the whole restart system call thing as a purely
> kernel internal thing which should not be exposed to userland.  If
> we decide that it should be exposed to userland, then it becomes part
> of the user ABI, and it /could/ become difficult if we needed to
> change it in future - and I'd rather not get into the "oh shit, we
> can't change it because that would break app X" crap.

Here's a scenario where it could become a problem:

Let's say that we want to use seccomp to secure some code which issues
system calls.  We determine that the app uses system calls which don't
result in the restart system call being issued, so we decide to ask
seccomp to block the restart system call.  Some of these system calls
that the app was using are restartable system calls.

When these system calls are restarted, what we see via ptrace etc is
that the system call simply gets re-issued as its own system call.

In a future kernel version, we decide that we could really do with one
of those system calls using the restart block feature, so we arrange
for it to set up the restart block, and return -ERESTART_BLOCK.  That's
fine for most applications, but this app now breaks.

The side effect of that breakage is that we have to revert that kernel
change - because we've broken userland, and that's simply not allowed.

Now look at the alternative: we don't make the restart syscall visible.
This means that we hide that detail, and we actually reflect the
behaviour that we've had for the other system call restart mechanisms,
and we don't have to fear userspace breakage as a result of switching
from one restart mechanism to another.

I am very much of the opinion that we should be trying to limit the
exposure of inappropriate kernel internal details to userland, because
userland has a habbit of becoming reliant on them, and when it does,
it makes kernel maintanence unnecessarily harder.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-20 23:04                               ` Russell King - ARM Linux
@ 2015-01-21 23:32                                 ` Kees Cook
  2015-01-22  1:24                                   ` Roman Peniaev
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2015-01-21 23:32 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arm-kernel, Stefano Stabellini, Marc Zyngier,
	Catalin Marinas, Will Deacon, linux-kernel, Sekhar Nori,
	Roman Peniaev, stable, Christoffer Dall

On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote:
>> Well, the whole question is this: is restarting a system call like
>> usleep() really a separate system call, or is it a kernel implementation
>> detail?
>>
>> If you wanted seccomp to see this, what would be the use case?  Why
>> would seccomp want to block this syscall?  Does it make sense for
>> seccomp to block this syscall when it doesn't block something like
>> usleep() and then have usleep() fail just because the thread received
>> a signal?
>>
>> I personally regard the whole restart system call thing as a purely
>> kernel internal thing which should not be exposed to userland.  If
>> we decide that it should be exposed to userland, then it becomes part
>> of the user ABI, and it /could/ become difficult if we needed to
>> change it in future - and I'd rather not get into the "oh shit, we
>> can't change it because that would break app X" crap.
>
> Here's a scenario where it could become a problem:
>
> Let's say that we want to use seccomp to secure some code which issues
> system calls.  We determine that the app uses system calls which don't
> result in the restart system call being issued, so we decide to ask
> seccomp to block the restart system call.  Some of these system calls
> that the app was using are restartable system calls.
>
> When these system calls are restarted, what we see via ptrace etc is
> that the system call simply gets re-issued as its own system call.
>
> In a future kernel version, we decide that we could really do with one
> of those system calls using the restart block feature, so we arrange
> for it to set up the restart block, and return -ERESTART_BLOCK.  That's
> fine for most applications, but this app now breaks.
>
> The side effect of that breakage is that we have to revert that kernel
> change - because we've broken userland, and that's simply not allowed.
>
> Now look at the alternative: we don't make the restart syscall visible.
> This means that we hide that detail, and we actually reflect the
> behaviour that we've had for the other system call restart mechanisms,
> and we don't have to fear userspace breakage as a result of switching
> from one restart mechanism to another.
>
> I am very much of the opinion that we should be trying to limit the
> exposure of inappropriate kernel internal details to userland, because
> userland has a habbit of becoming reliant on them, and when it does,
> it makes kernel maintanence unnecessarily harder.

I totally agree with you. :) My question here is more about what we
should do with what we currently have since we have some unexpected
combinations.

There is already an __NR_restart_syscall syscall and it seems like
it's already part of the userspace ABI:
 - it is possible to call it from userspace directly
 - seccomp "sees" it
 - ptrace doesn't see it

Native ARM64 hides the restart from both seccomp and ptrace, and this
seems like the right idea, except that restart_syscall is still
callable from userspace. I don't think there's a way to make that
vanish, which means we'll always have an exposed syscall. If anything
goes wrong with it (which we've been quite close to recently[1]),
there would be no way to have seccomp filter it.

So, at the least, I'd like arm64 to NOT hide restart_syscall from
seccomp, and at best I'd like both arm and arm64 to (somehow) entirely
remove restart_syscall from the userspace ABI so it wouldn't need to
be filtered, and wouldn't become a weird ABI hiccup as you've
described.

I fail to imagine a way to remove restart_syscall from userspace, so
I'm left with wanting parity of behavior between ARM and ARM64 (and
x86). What's the right way forward?

-Kees

[1] https://git.kernel.org/linus/849151dd5481bc8acb1d287a299b5d6a4ca9f1c3

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-21 23:32                                 ` Kees Cook
@ 2015-01-22  1:24                                   ` Roman Peniaev
  2015-01-22 18:07                                     ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Roman Peniaev @ 2015-01-22  1:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Russell King - ARM Linux, linux-arm-kernel, Stefano Stabellini,
	Marc Zyngier, Catalin Marinas, Will Deacon, linux-kernel,
	Sekhar Nori, stable, Christoffer Dall

On Thu, Jan 22, 2015 at 8:32 AM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote:
>>> Well, the whole question is this: is restarting a system call like
>>> usleep() really a separate system call, or is it a kernel implementation
>>> detail?
>>>
>>> If you wanted seccomp to see this, what would be the use case?  Why
>>> would seccomp want to block this syscall?  Does it make sense for
>>> seccomp to block this syscall when it doesn't block something like
>>> usleep() and then have usleep() fail just because the thread received
>>> a signal?
>>>
>>> I personally regard the whole restart system call thing as a purely
>>> kernel internal thing which should not be exposed to userland.  If
>>> we decide that it should be exposed to userland, then it becomes part
>>> of the user ABI, and it /could/ become difficult if we needed to
>>> change it in future - and I'd rather not get into the "oh shit, we
>>> can't change it because that would break app X" crap.
>>
>> Here's a scenario where it could become a problem:
>>
>> Let's say that we want to use seccomp to secure some code which issues
>> system calls.  We determine that the app uses system calls which don't
>> result in the restart system call being issued, so we decide to ask
>> seccomp to block the restart system call.  Some of these system calls
>> that the app was using are restartable system calls.
>>
>> When these system calls are restarted, what we see via ptrace etc is
>> that the system call simply gets re-issued as its own system call.
>>
>> In a future kernel version, we decide that we could really do with one
>> of those system calls using the restart block feature, so we arrange
>> for it to set up the restart block, and return -ERESTART_BLOCK.  That's
>> fine for most applications, but this app now breaks.
>>
>> The side effect of that breakage is that we have to revert that kernel
>> change - because we've broken userland, and that's simply not allowed.
>>
>> Now look at the alternative: we don't make the restart syscall visible.
>> This means that we hide that detail, and we actually reflect the
>> behaviour that we've had for the other system call restart mechanisms,
>> and we don't have to fear userspace breakage as a result of switching
>> from one restart mechanism to another.
>>
>> I am very much of the opinion that we should be trying to limit the
>> exposure of inappropriate kernel internal details to userland, because
>> userland has a habbit of becoming reliant on them, and when it does,
>> it makes kernel maintanence unnecessarily harder.
>
> I totally agree with you. :) My question here is more about what we
> should do with what we currently have since we have some unexpected
> combinations.
>
> There is already an __NR_restart_syscall syscall and it seems like
> it's already part of the userspace ABI:
>  - it is possible to call it from userspace directly
>  - seccomp "sees" it
>  - ptrace doesn't see it
>
> Native ARM64 hides the restart from both seccomp and ptrace, and this
> seems like the right idea, except that restart_syscall is still
> callable from userspace. I don't think there's a way to make that
> vanish, which means we'll always have an exposed syscall. If anything
> goes wrong with it (which we've been quite close to recently[1]),
> there would be no way to have seccomp filter it.
>
> So, at the least, I'd like arm64 to NOT hide restart_syscall from
> seccomp, and at best I'd like both arm and arm64 to (somehow) entirely
> remove restart_syscall from the userspace ABI so it wouldn't need to
> be filtered, and wouldn't become a weird ABI hiccup as you've
> described.
>
> I fail to imagine a way to remove restart_syscall from userspace, so
> I'm left with wanting parity of behavior between ARM and ARM64 (and
> x86). What's the right way forward?

My sufferings are an opposite of what seccompt expects: currently I do
not see any possible way (from userspace) to get syscall number which was
restarted, because at some given time userspace checks the procfs
syscall file and sees NR_restart there, without any chance to understand
what exactly was restarted (I am talking about some kind of debugging
tool which does dead-lock analysis of stuck tasks).

I totally agree with Russell not to provide kernel guts to userspace,
but it is already done.  Too late.

So probably there is a need to split syscall on two numbers:
real and effective.  Real number we have right now on x86.

And this should be done for both ptrace and procfs syscall file.
(am I right that only for ARM we already have PTRACE_SET_SYSCALL?
 seems we can add also real/effective getter)

--
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-22  1:24                                   ` Roman Peniaev
@ 2015-01-22 18:07                                     ` Kees Cook
  2015-01-23  4:17                                       ` Roman Peniaev
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2015-01-22 18:07 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Russell King - ARM Linux, linux-arm-kernel, Stefano Stabellini,
	Marc Zyngier, Catalin Marinas, Will Deacon, linux-kernel,
	Sekhar Nori, stable, Christoffer Dall

On Wed, Jan 21, 2015 at 5:24 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Thu, Jan 22, 2015 at 8:32 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>> On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote:
>>>> Well, the whole question is this: is restarting a system call like
>>>> usleep() really a separate system call, or is it a kernel implementation
>>>> detail?
>>>>
>>>> If you wanted seccomp to see this, what would be the use case?  Why
>>>> would seccomp want to block this syscall?  Does it make sense for
>>>> seccomp to block this syscall when it doesn't block something like
>>>> usleep() and then have usleep() fail just because the thread received
>>>> a signal?
>>>>
>>>> I personally regard the whole restart system call thing as a purely
>>>> kernel internal thing which should not be exposed to userland.  If
>>>> we decide that it should be exposed to userland, then it becomes part
>>>> of the user ABI, and it /could/ become difficult if we needed to
>>>> change it in future - and I'd rather not get into the "oh shit, we
>>>> can't change it because that would break app X" crap.
>>>
>>> Here's a scenario where it could become a problem:
>>>
>>> Let's say that we want to use seccomp to secure some code which issues
>>> system calls.  We determine that the app uses system calls which don't
>>> result in the restart system call being issued, so we decide to ask
>>> seccomp to block the restart system call.  Some of these system calls
>>> that the app was using are restartable system calls.
>>>
>>> When these system calls are restarted, what we see via ptrace etc is
>>> that the system call simply gets re-issued as its own system call.
>>>
>>> In a future kernel version, we decide that we could really do with one
>>> of those system calls using the restart block feature, so we arrange
>>> for it to set up the restart block, and return -ERESTART_BLOCK.  That's
>>> fine for most applications, but this app now breaks.
>>>
>>> The side effect of that breakage is that we have to revert that kernel
>>> change - because we've broken userland, and that's simply not allowed.
>>>
>>> Now look at the alternative: we don't make the restart syscall visible.
>>> This means that we hide that detail, and we actually reflect the
>>> behaviour that we've had for the other system call restart mechanisms,
>>> and we don't have to fear userspace breakage as a result of switching
>>> from one restart mechanism to another.
>>>
>>> I am very much of the opinion that we should be trying to limit the
>>> exposure of inappropriate kernel internal details to userland, because
>>> userland has a habbit of becoming reliant on them, and when it does,
>>> it makes kernel maintanence unnecessarily harder.
>>
>> I totally agree with you. :) My question here is more about what we
>> should do with what we currently have since we have some unexpected
>> combinations.
>>
>> There is already an __NR_restart_syscall syscall and it seems like
>> it's already part of the userspace ABI:
>>  - it is possible to call it from userspace directly
>>  - seccomp "sees" it
>>  - ptrace doesn't see it
>>
>> Native ARM64 hides the restart from both seccomp and ptrace, and this
>> seems like the right idea, except that restart_syscall is still
>> callable from userspace. I don't think there's a way to make that
>> vanish, which means we'll always have an exposed syscall. If anything
>> goes wrong with it (which we've been quite close to recently[1]),
>> there would be no way to have seccomp filter it.
>>
>> So, at the least, I'd like arm64 to NOT hide restart_syscall from
>> seccomp, and at best I'd like both arm and arm64 to (somehow) entirely
>> remove restart_syscall from the userspace ABI so it wouldn't need to
>> be filtered, and wouldn't become a weird ABI hiccup as you've
>> described.
>>
>> I fail to imagine a way to remove restart_syscall from userspace, so
>> I'm left with wanting parity of behavior between ARM and ARM64 (and
>> x86). What's the right way forward?
>
> My sufferings are an opposite of what seccompt expects: currently I do
> not see any possible way (from userspace) to get syscall number which was
> restarted, because at some given time userspace checks the procfs
> syscall file and sees NR_restart there, without any chance to understand
> what exactly was restarted (I am talking about some kind of debugging
> tool which does dead-lock analysis of stuck tasks).
>
> I totally agree with Russell not to provide kernel guts to userspace,
> but it is already done.  Too late.
>
> So probably there is a need to split syscall on two numbers:
> real and effective.  Real number we have right now on x86.
>
> And this should be done for both ptrace and procfs syscall file.
> (am I right that only for ARM we already have PTRACE_SET_SYSCALL?
>  seems we can add also real/effective getter)

ARM's syscall "get" is via PTRACE_GETREGSET with NT_PRSTATUS, reading ARM_r7:

int syscall_get(pid_t tracee) {
        struct iovec iov;
        struct pt_regs;

        iov.iov_base = &regs;
        iov.iov_len = sizeof(regs);
        if (ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, &iov) < 0) {
               perror("PTRACE_GETREGSET, NT_PRSTATUS");
               return -1;
        }
        return regs.ARM_r7;
}

ARM's syscall "set" is via PTRACE_SET_SYSCALL:

int syscall_set(int syscall, pid_t tracee) {
        return ptrace(PTRACE_SET_SYSCALL, tracee, NULL, syscall);
}

Landing in 3.19, ARM64 has get/set via PTRACE_[GS]ETREGSET with
NT_ARM_SYSTEM_CALL:

int syscall_get(pid_t tracee) {
        struct iovec iov;
        int syscall;

        iov.iov_base = &syscall;
        iov.iov_len = sizeof(syscall);
        if (ptrace(PTRACE_GETREGSET, tracee,
                  NT_ARM_SYSTEM_CALL, &iov) < 0) {
                perror("PTRACE_GETREGSET, NT_ARM_SYSTEM_CALL");
                return -1;
        }
        return syscall;
}

int syscall_set(int syscall, pid_t tracee) {
        iov.iov_base = &syscall;
        iov.iov_len = sizeof(syscall);
        return ptrace(PTRACE_SETREGSET, tracee,
                  NT_ARM_SYSTEM_CALL, &iov);
}

Prior to 3.19, ARM64 could use PTRACE_[GS]ETREGSET, NT_STATUS on
struct user_pt_regs and regs[8].

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall
  2015-01-22 18:07                                     ` Kees Cook
@ 2015-01-23  4:17                                       ` Roman Peniaev
  0 siblings, 0 replies; 28+ messages in thread
From: Roman Peniaev @ 2015-01-23  4:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Russell King - ARM Linux, linux-arm-kernel, Stefano Stabellini,
	Marc Zyngier, Catalin Marinas, Will Deacon, linux-kernel,
	Sekhar Nori, stable, Christoffer Dall

On Fri, Jan 23, 2015 at 3:07 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 21, 2015 at 5:24 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
>> On Thu, Jan 22, 2015 at 8:32 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux
>>> <linux@arm.linux.org.uk> wrote:
[snip]
>>>
>>> Native ARM64 hides the restart from both seccomp and ptrace, and this
>>> seems like the right idea, except that restart_syscall is still
>>> callable from userspace. I don't think there's a way to make that
>>> vanish, which means we'll always have an exposed syscall. If anything
>>> goes wrong with it (which we've been quite close to recently[1]),
>>> there would be no way to have seccomp filter it.
>>>
>>> So, at the least, I'd like arm64 to NOT hide restart_syscall from
>>> seccomp, and at best I'd like both arm and arm64 to (somehow) entirely
>>> remove restart_syscall from the userspace ABI so it wouldn't need to
>>> be filtered, and wouldn't become a weird ABI hiccup as you've
>>> described.
>>>
>>> I fail to imagine a way to remove restart_syscall from userspace, so
>>> I'm left with wanting parity of behavior between ARM and ARM64 (and
>>> x86). What's the right way forward?
>>
>> My sufferings are an opposite of what seccompt expects: currently I do
>> not see any possible way (from userspace) to get syscall number which was
>> restarted, because at some given time userspace checks the procfs
>> syscall file and sees NR_restart there, without any chance to understand
>> what exactly was restarted (I am talking about some kind of debugging
>> tool which does dead-lock analysis of stuck tasks).
>>
>> I totally agree with Russell not to provide kernel guts to userspace,
>> but it is already done.  Too late.
>>
>> So probably there is a need to split syscall on two numbers:
>> real and effective.  Real number we have right now on x86.
>>
>> And this should be done for both ptrace and procfs syscall file.
>> (am I right that only for ARM we already have PTRACE_SET_SYSCALL?
>>  seems we can add also real/effective getter)
>
> ARM's syscall "get" is via PTRACE_GETREGSET with NT_PRSTATUS, reading ARM_r7:
>
> int syscall_get(pid_t tracee) {
>         struct iovec iov;
>         struct pt_regs;
>
>         iov.iov_base = &regs;
>         iov.iov_len = sizeof(regs);
>         if (ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, &iov) < 0) {
>                perror("PTRACE_GETREGSET, NT_PRSTATUS");
>                return -1;
>         }
>         return regs.ARM_r7;
> }
>
> ARM's syscall "set" is via PTRACE_SET_SYSCALL:
>
> int syscall_set(int syscall, pid_t tracee) {
>         return ptrace(PTRACE_SET_SYSCALL, tracee, NULL, syscall);
> }
>
> Landing in 3.19, ARM64 has get/set via PTRACE_[GS]ETREGSET with
> NT_ARM_SYSTEM_CALL:
>
> int syscall_get(pid_t tracee) {
>         struct iovec iov;
>         int syscall;
>
>         iov.iov_base = &syscall;
>         iov.iov_len = sizeof(syscall);
>         if (ptrace(PTRACE_GETREGSET, tracee,
>                   NT_ARM_SYSTEM_CALL, &iov) < 0) {
>                 perror("PTRACE_GETREGSET, NT_ARM_SYSTEM_CALL");
>                 return -1;
>         }
>         return syscall;
> }
>
> int syscall_set(int syscall, pid_t tracee) {
>         iov.iov_base = &syscall;
>         iov.iov_len = sizeof(syscall);
>         return ptrace(PTRACE_SETREGSET, tracee,
>                   NT_ARM_SYSTEM_CALL, &iov);
> }
>
> Prior to 3.19, ARM64 could use PTRACE_[GS]ETREGSET, NT_STATUS on
> struct user_pt_regs and regs[8].
>

Thanks.  I also came up with this possible way to retrieve effective
syscall.  But, as you showed, I still can get NR_restart (it is 32bit
userspace on ARM64, right?)

Also, this approach is definitely arch dependent (at least I have to
know the register for scnr, also [probably] I have to distinguish EABI
and OABI on ARM).

And also all this ptrace machinery is not as fast as reading from
procfs syscall file (no deal with signals, syscall restarts, etc).

But procfs syscall is not implemented on ARM and, even if it is,
NR_restart spoils me everything.

But still thanks.

--
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-01-23  4:18 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-11 14:32 [PATCH 0/2] ARM: set thread_info->syscall just before sys_* execution Roman Pen
2015-01-11 14:32 ` [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall Roman Pen
2015-01-12 18:39   ` Will Deacon
2015-01-13  8:35     ` Roman Peniaev
2015-01-14  2:23       ` Roman Peniaev
2015-01-14 20:51       ` Kees Cook
2015-01-15  1:54         ` Roman Peniaev
2015-01-15 22:54           ` Kees Cook
2015-01-16 15:57             ` Roman Peniaev
2015-01-16 15:59               ` Russell King - ARM Linux
2015-01-16 16:08                 ` Roman Peniaev
2015-01-16 16:17                   ` Russell King - ARM Linux
2015-01-16 19:57                     ` Kees Cook
2015-01-16 23:54                       ` Kees Cook
2015-01-19  5:58                         ` Roman Peniaev
2015-01-20 18:56                           ` Kees Cook
2015-01-19  9:20                         ` Will Deacon
2015-01-20 18:31                           ` Kees Cook
2015-01-20 22:45                             ` Russell King - ARM Linux
2015-01-20 23:04                               ` Russell King - ARM Linux
2015-01-21 23:32                                 ` Kees Cook
2015-01-22  1:24                                   ` Roman Peniaev
2015-01-22 18:07                                     ` Kees Cook
2015-01-23  4:17                                       ` Roman Peniaev
2015-01-11 14:32 ` [PATCH 2/2] ARM: entry-common,ptrace: do not pass scno to syscall_trace_enter Roman Pen
2015-01-13 20:08   ` Kees Cook
2015-01-13 23:21     ` Roman Peniaev
2015-01-13 23:43       ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).