LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] perf: Fix oops when kthread execs user process
@ 2019-05-28 12:31 Young Xiao
  2019-05-28 12:41 ` Russell King - ARM Linux admin
  2019-05-28 14:01 ` Peter Zijlstra
  0 siblings, 2 replies; 30+ messages in thread
From: Young Xiao @ 2019-05-28 12:31 UTC (permalink / raw)
  To: will.deacon, linux, mark.rutland, mingo, bp, hpa, x86, peterz,
	kan.liang, linux-arm-kernel, linux-kernel
  Cc: Young Xiao

When a kthread calls call_usermodehelper() the steps are:
  1. allocate current->mm
  2. load_elf_binary()
  3. populate current->thread.regs

While doing this, interrupts are not disabled. If there is a perf
interrupt in the middle of this process (i.e. step 1 has completed
but not yet reached to step 3) and if perf tries to read userspace
regs, kernel oops.

Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
pt_regs are not set.

See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
user process") for details.

Signed-off-by: Young Xiao <92siuyang@gmail.com>
---
 arch/arm/kernel/perf_regs.c   | 3 ++-
 arch/arm64/kernel/perf_regs.c | 3 ++-
 arch/x86/kernel/perf_regs.c   | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c
index 05fe92a..78ee29a 100644
--- a/arch/arm/kernel/perf_regs.c
+++ b/arch/arm/kernel/perf_regs.c
@@ -36,5 +36,6 @@ void perf_get_regs_user(struct perf_regs *regs_user,
 			struct pt_regs *regs_user_copy)
 {
 	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
+	regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) :
+			 PERF_SAMPLE_REGS_ABI_NONE;
 }
diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
index 0bbac61..ac19d19 100644
--- a/arch/arm64/kernel/perf_regs.c
+++ b/arch/arm64/kernel/perf_regs.c
@@ -58,5 +58,6 @@ void perf_get_regs_user(struct perf_regs *regs_user,
 			struct pt_regs *regs_user_copy)
 {
 	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
+	regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) :
+			 PERF_SAMPLE_REGS_ABI_NONE;
 }
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 07c30ee..fa79d6d 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -102,7 +102,8 @@ void perf_get_regs_user(struct perf_regs *regs_user,
 			struct pt_regs *regs_user_copy)
 {
 	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
+	regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) :
+			 PERF_SAMPLE_REGS_ABI_NONE;
 }
 #else /* CONFIG_X86_64 */
 #define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \
-- 
2.7.4


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 12:31 [PATCH] perf: Fix oops when kthread execs user process Young Xiao
@ 2019-05-28 12:41 ` Russell King - ARM Linux admin
  2019-05-28 14:01 ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Russell King - ARM Linux admin @ 2019-05-28 12:41 UTC (permalink / raw)
  To: Young Xiao
  Cc: will.deacon, mark.rutland, mingo, bp, hpa, x86, peterz,
	kan.liang, linux-arm-kernel, linux-kernel

On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
> When a kthread calls call_usermodehelper() the steps are:
>   1. allocate current->mm
>   2. load_elf_binary()
>   3. populate current->thread.regs
> 
> While doing this, interrupts are not disabled. If there is a perf
> interrupt in the middle of this process (i.e. step 1 has completed
> but not yet reached to step 3) and if perf tries to read userspace
> regs, kernel oops.
> 
> Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
> pt_regs are not set.
> 
> See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
> user process") for details.
> 
> Signed-off-by: Young Xiao <92siuyang@gmail.com>
> ---
>  arch/arm/kernel/perf_regs.c   | 3 ++-
>  arch/arm64/kernel/perf_regs.c | 3 ++-
>  arch/x86/kernel/perf_regs.c   | 3 ++-
>  3 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c
> index 05fe92a..78ee29a 100644
> --- a/arch/arm/kernel/perf_regs.c
> +++ b/arch/arm/kernel/perf_regs.c
> @@ -36,5 +36,6 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>  			struct pt_regs *regs_user_copy)
>  {
>  	regs_user->regs = task_pt_regs(current);
> -	regs_user->abi = perf_reg_abi(current);
> +	regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) :
> +			 PERF_SAMPLE_REGS_ABI_NONE;

I'd prefer it if we didn't introduce unnecessary parens - what function
do the parens around "regs_user->regs" serve?
-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 12:31 [PATCH] perf: Fix oops when kthread execs user process Young Xiao
  2019-05-28 12:41 ` Russell King - ARM Linux admin
@ 2019-05-28 14:01 ` Peter Zijlstra
  2019-05-28 15:32   ` Will Deacon
  2019-05-29  1:44   ` Michael Ellerman
  1 sibling, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-28 14:01 UTC (permalink / raw)
  To: Young Xiao
  Cc: will.deacon, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
> When a kthread calls call_usermodehelper() the steps are:
>   1. allocate current->mm
>   2. load_elf_binary()
>   3. populate current->thread.regs
> 
> While doing this, interrupts are not disabled. If there is a perf
> interrupt in the middle of this process (i.e. step 1 has completed
> but not yet reached to step 3) and if perf tries to read userspace
> regs, kernel oops.
> 
> Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
> pt_regs are not set.
> 
> See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
> user process") for details.

Why the hell do we set current->mm before it is complete? Note that
normally exec() builds the new mm before attaching it, see exec_mmap()
in flush_old_exec().

Also, why did those PPC folks 'fix' this in isolation? And why didn't
you Cc them?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 14:01 ` Peter Zijlstra
@ 2019-05-28 15:32   ` Will Deacon
  2019-05-28 16:12     ` Mark Rutland
                       ` (2 more replies)
  2019-05-29  1:44   ` Michael Ellerman
  1 sibling, 3 replies; 30+ messages in thread
From: Will Deacon @ 2019-05-28 15:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
> > When a kthread calls call_usermodehelper() the steps are:
> >   1. allocate current->mm
> >   2. load_elf_binary()
> >   3. populate current->thread.regs
> > 
> > While doing this, interrupts are not disabled. If there is a perf
> > interrupt in the middle of this process (i.e. step 1 has completed
> > but not yet reached to step 3) and if perf tries to read userspace
> > regs, kernel oops.

This seems to be because pt_regs(current) gives NULL for kthreads on Power.

> > Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
> > pt_regs are not set.
> > 
> > See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
> > user process") for details.
> 
> Why the hell do we set current->mm before it is complete? Note that
> normally exec() builds the new mm before attaching it, see exec_mmap()
> in flush_old_exec().

From the initial report [1], it doesn't look like the mm isn't initialised,
but rather than we're dereferencing a NULL pt_regs pointer somehow for the
current task (see previous comment). I don't see how that can happen on
arm64, given that we put the pt_regs on the kernel stack which is allocated
during fork.

Will

[1] https://git.kernel.org/linus/bf05fc25f268

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 15:32   ` Will Deacon
@ 2019-05-28 16:12     ` Mark Rutland
  2019-05-28 17:32     ` Peter Zijlstra
  2019-05-29  4:21     ` Michael Ellerman
  2 siblings, 0 replies; 30+ messages in thread
From: Mark Rutland @ 2019-05-28 16:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Young Xiao, linux, mingo, bp, hpa, x86,
	kan.liang, linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote:
> On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
> > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
> > > When a kthread calls call_usermodehelper() the steps are:
> > >   1. allocate current->mm
> > >   2. load_elf_binary()
> > >   3. populate current->thread.regs
> > > 
> > > While doing this, interrupts are not disabled. If there is a perf
> > > interrupt in the middle of this process (i.e. step 1 has completed
> > > but not yet reached to step 3) and if perf tries to read userspace
> > > regs, kernel oops.
> 
> This seems to be because pt_regs(current) gives NULL for kthreads on Power.

I think you mean task_pt_regs(current) here.

> > > Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
> > > pt_regs are not set.
> > > 
> > > See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
> > > user process") for details.
> > 
> > Why the hell do we set current->mm before it is complete? Note that
> > normally exec() builds the new mm before attaching it, see exec_mmap()
> > in flush_old_exec().
> 
> From the initial report [1], it doesn't look like the mm isn't initialised,
> but rather than we're dereferencing a NULL pt_regs pointer somehow for the
> current task (see previous comment). I don't see how that can happen on
> arm64, given that we put the pt_regs on the kernel stack which is allocated
> during fork.
> 
> Will
> 
> [1] https://git.kernel.org/linus/bf05fc25f268

One caveat is that for the idle threads, the initial SP overlaps the
task_pt_regs() area:

* __primary_switched starts SP at init_thread_union + THREAD_SIZE.

* __cpu_up() starts SP at task_stack_page(idle) + THREAD_SIZE.

... and in either case, sampling that would be bad.

For both arm, I believe similar holds true. AFAICT x86 seems to reserve
the regs area in its head_{64,32}.S, but I can't see what it does for
other threads.

Regardless, for arm, arm64, and x86, task_pt_regs(current) cannot be
NULL.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 15:32   ` Will Deacon
  2019-05-28 16:12     ` Mark Rutland
@ 2019-05-28 17:32     ` Peter Zijlstra
  2019-05-29  9:17       ` Will Deacon
  2019-05-29 10:11       ` Mark Rutland
  2019-05-29  4:21     ` Michael Ellerman
  2 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-28 17:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote:
> On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
> > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
> > > When a kthread calls call_usermodehelper() the steps are:
> > >   1. allocate current->mm
> > >   2. load_elf_binary()
> > >   3. populate current->thread.regs
> > > 
> > > While doing this, interrupts are not disabled. If there is a perf
> > > interrupt in the middle of this process (i.e. step 1 has completed
> > > but not yet reached to step 3) and if perf tries to read userspace
> > > regs, kernel oops.
> 
> This seems to be because pt_regs(current) gives NULL for kthreads on Power.

'funny' thing that, perf_sample_regs_user() seems to assume that
anything with current->mm is in fact a user task, and that assumption is
just plain wrong, consider use_mm().

So I'm thinking the right thing to do here is something like the below;
umh should get PF_KTHREAD cleared when it passes exec(). And this should
also fix the power splat I'm thinking.

---

diff --git a/kernel/events/core.c b/kernel/events/core.c
index abbd4b3b96c2..9929404b6eb9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
 	if (user_mode(regs)) {
 		regs_user->abi = perf_reg_abi(current);
 		regs_user->regs = regs;
-	} else if (current->mm) {
+	} else if (!(current->flags & PF_KTHREAD) && current->mm) {
 		perf_get_regs_user(regs_user, regs, regs_user_copy);
 	} else {
 		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 14:01 ` Peter Zijlstra
  2019-05-28 15:32   ` Will Deacon
@ 2019-05-29  1:44   ` Michael Ellerman
  1 sibling, 0 replies; 30+ messages in thread
From: Michael Ellerman @ 2019-05-29  1:44 UTC (permalink / raw)
  To: Peter Zijlstra, Young Xiao
  Cc: will.deacon, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria

Peter Zijlstra <peterz@infradead.org> writes:
> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> When a kthread calls call_usermodehelper() the steps are:
>>   1. allocate current->mm
>>   2. load_elf_binary()
>>   3. populate current->thread.regs
>> 
>> While doing this, interrupts are not disabled. If there is a perf
>> interrupt in the middle of this process (i.e. step 1 has completed
>> but not yet reached to step 3) and if perf tries to read userspace
>> regs, kernel oops.
>> 
>> Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
>> pt_regs are not set.
>> 
>> See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
>> user process") for details.
>
> Why the hell do we set current->mm before it is complete? Note that
> normally exec() builds the new mm before attaching it, see exec_mmap()
> in flush_old_exec().
>
> Also, why did those PPC folks 'fix' this in isolation? And why didn't
> you Cc them?

We just assumed it was our bug, 'cause we have plenty of those :)

cheers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 15:32   ` Will Deacon
  2019-05-28 16:12     ` Mark Rutland
  2019-05-28 17:32     ` Peter Zijlstra
@ 2019-05-29  4:21     ` Michael Ellerman
  2 siblings, 0 replies; 30+ messages in thread
From: Michael Ellerman @ 2019-05-29  4:21 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria

Will Deacon <will.deacon@arm.com> writes:
> On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
>> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> > When a kthread calls call_usermodehelper() the steps are:
>> >   1. allocate current->mm
>> >   2. load_elf_binary()
>> >   3. populate current->thread.regs
>> > 
>> > While doing this, interrupts are not disabled. If there is a perf
>> > interrupt in the middle of this process (i.e. step 1 has completed
>> > but not yet reached to step 3) and if perf tries to read userspace
>> > regs, kernel oops.
>
> This seems to be because pt_regs(current) gives NULL for kthreads on Power.

Right, we've done that since roughly forever in copy_thread():

int copy_thread(unsigned long clone_flags, unsigned long usp,
		unsigned long kthread_arg, struct task_struct *p)
{
	...
	/* Copy registers */
	sp -= sizeof(struct pt_regs);
	childregs = (struct pt_regs *) sp;
	if (unlikely(p->flags & PF_KTHREAD)) {
		/* kernel thread */
		memset(childregs, 0, sizeof(struct pt_regs));
		childregs->gpr[1] = sp + sizeof(struct pt_regs);
                ...
		p->thread.regs = NULL;	/* no user register state */

See commit from 2002:
  https://github.com/mpe/linux-fullhistory/commit/c0a96c0918d21d8a99270e94d9c4a4a322d04581#diff-edb76bfcc84905163f34d24d2aad3f3aR187

> From the initial report [1], it doesn't look like the mm isn't initialised,
> but rather than we're dereferencing a NULL pt_regs pointer somehow for the
> current task (see previous comment). I don't see how that can happen on
> arm64, given that we put the pt_regs on the kernel stack which is allocated
> during fork.

We have the regs on the stack too (see above), but we're explicitly
NULL'ing the link from task->thread.

Looks like on arm64 and x86 there is no link from task->thread, instead
you get from task to pt_regs via task_stack_page().

That actually seems potentially fishy given the comment on
task_stack_page() about the stack going away for exiting tasks. We
should probably be NULL'ing the regs pointer in free_thread_stack() or
similar. Though that race mustn't be happening because other arches
would see it.

Or are we just wrong and kthreads should have non-NULL regs? I can't
find another arch that does the same as us.

cheers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 17:32     ` Peter Zijlstra
@ 2019-05-29  9:17       ` Will Deacon
  2019-05-29 10:10         ` Peter Zijlstra
  2019-05-29 10:11       ` Mark Rutland
  1 sibling, 1 reply; 30+ messages in thread
From: Will Deacon @ 2019-05-29  9:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote:
> > On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
> > > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
> > > > When a kthread calls call_usermodehelper() the steps are:
> > > >   1. allocate current->mm
> > > >   2. load_elf_binary()
> > > >   3. populate current->thread.regs
> > > > 
> > > > While doing this, interrupts are not disabled. If there is a perf
> > > > interrupt in the middle of this process (i.e. step 1 has completed
> > > > but not yet reached to step 3) and if perf tries to read userspace
> > > > regs, kernel oops.
> > 
> > This seems to be because pt_regs(current) gives NULL for kthreads on Power.
> 
> 'funny' thing that, perf_sample_regs_user() seems to assume that
> anything with current->mm is in fact a user task, and that assumption is
> just plain wrong, consider use_mm().

Right, I suppose that was attempting to handle interrupt skid from the PMU
overflow?

> So I'm thinking the right thing to do here is something like the below;
> umh should get PF_KTHREAD cleared when it passes exec(). And this should
> also fix the power splat I'm thinking.
> 
> ---
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..9929404b6eb9 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
>  	if (user_mode(regs)) {
>  		regs_user->abi = perf_reg_abi(current);
>  		regs_user->regs = regs;
> -	} else if (current->mm) {
> +	} else if (!(current->flags & PF_KTHREAD) && current->mm) {
>  		perf_get_regs_user(regs_user, regs, regs_user_copy);

Makes sense, but under which circumstances would we have a NULL mm here?

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29  9:17       ` Will Deacon
@ 2019-05-29 10:10         ` Peter Zijlstra
  2019-05-29 10:20           ` Will Deacon
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-29 10:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Wed, May 29, 2019 at 10:17:33AM +0100, Will Deacon wrote:
> On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote:

> > 'funny' thing that, perf_sample_regs_user() seems to assume that
> > anything with current->mm is in fact a user task, and that assumption is
> > just plain wrong, consider use_mm().
> 
> Right, I suppose that was attempting to handle interrupt skid from the PMU
> overflow?

Nah, just a broken test to determine if there is userspace at all. It is
mostly right, just not completely :-)

> > So I'm thinking the right thing to do here is something like the below;
> > umh should get PF_KTHREAD cleared when it passes exec(). And this should
> > also fix the power splat I'm thinking.
> > 
> > ---
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index abbd4b3b96c2..9929404b6eb9 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
> >  	if (user_mode(regs)) {
> >  		regs_user->abi = perf_reg_abi(current);
> >  		regs_user->regs = regs;
> > -	} else if (current->mm) {
> > +	} else if (!(current->flags & PF_KTHREAD) && current->mm) {
> >  		perf_get_regs_user(regs_user, regs, regs_user_copy);
> 
> Makes sense, but under which circumstances would we have a NULL mm here?

Dunno; I'm paranoid, and also:

  mm/memcontrol.c:        if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
  mm/vmacache.c:  return current->mm == mm && !(current->flags & PF_KTHREAD);


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-28 17:32     ` Peter Zijlstra
  2019-05-29  9:17       ` Will Deacon
@ 2019-05-29 10:11       ` Mark Rutland
  1 sibling, 0 replies; 30+ messages in thread
From: Mark Rutland @ 2019-05-29 10:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Young Xiao, linux, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote:
> > On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
> > > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
> > > > When a kthread calls call_usermodehelper() the steps are:
> > > >   1. allocate current->mm
> > > >   2. load_elf_binary()
> > > >   3. populate current->thread.regs
> > > > 
> > > > While doing this, interrupts are not disabled. If there is a perf
> > > > interrupt in the middle of this process (i.e. step 1 has completed
> > > > but not yet reached to step 3) and if perf tries to read userspace
> > > > regs, kernel oops.
> > 
> > This seems to be because pt_regs(current) gives NULL for kthreads on Power.
> 
> 'funny' thing that, perf_sample_regs_user() seems to assume that
> anything with current->mm is in fact a user task, and that assumption is
> just plain wrong, consider use_mm().

Tagnentially, it looks like that assumption is made elsewhere, and could
do with a more general cleanup. IIUC, the following are suspect:

* kmemleak's scan_should_stop()
* x86's __kernel_fpu_begin()
* arm64's arch_dup_task_struct()

It's probably worth an is_thread(task) helper so that those can be
written in an obviously correct way.

> So I'm thinking the right thing to do here is something like the below;
> umh should get PF_KTHREAD cleared when it passes exec(). And this should
> also fix the power splat I'm thinking.
> 
> ---
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..9929404b6eb9 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
>  	if (user_mode(regs)) {
>  		regs_user->abi = perf_reg_abi(current);
>  		regs_user->regs = regs;
> -	} else if (current->mm) {
> +	} else if (!(current->flags & PF_KTHREAD) && current->mm) {

Wouldn't !PF_KTHREAD imply current->mm anyhow?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 10:10         ` Peter Zijlstra
@ 2019-05-29 10:20           ` Will Deacon
  2019-05-29 12:55             ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Will Deacon @ 2019-05-29 10:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe

On Wed, May 29, 2019 at 12:10:42PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 10:17:33AM +0100, Will Deacon wrote:
> > On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote:
> 
> > > 'funny' thing that, perf_sample_regs_user() seems to assume that
> > > anything with current->mm is in fact a user task, and that assumption is
> > > just plain wrong, consider use_mm().
> > 
> > Right, I suppose that was attempting to handle interrupt skid from the PMU
> > overflow?
> 
> Nah, just a broken test to determine if there is userspace at all. It is
> mostly right, just not completely :-)
> 
> > > So I'm thinking the right thing to do here is something like the below;
> > > umh should get PF_KTHREAD cleared when it passes exec(). And this should
> > > also fix the power splat I'm thinking.
> > > 
> > > ---
> > > 
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index abbd4b3b96c2..9929404b6eb9 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
> > >  	if (user_mode(regs)) {
> > >  		regs_user->abi = perf_reg_abi(current);
> > >  		regs_user->regs = regs;
> > > -	} else if (current->mm) {
> > > +	} else if (!(current->flags & PF_KTHREAD) && current->mm) {
> > >  		perf_get_regs_user(regs_user, regs, regs_user_copy);
> > 
> > Makes sense, but under which circumstances would we have a NULL mm here?
> 
> Dunno; I'm paranoid, and also:
> 
>   mm/memcontrol.c:        if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))

So this one I also don't understand...

>   mm/vmacache.c:  return current->mm == mm && !(current->flags & PF_KTHREAD);

... but this one is just about an mm mismatch, rather than a NULL mm.

Anyway, you can add my ack to your patch, but I bet we can remove that mm
check :D

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 10:20           ` Will Deacon
@ 2019-05-29 12:55             ` Peter Zijlstra
  2019-05-29 13:05               ` Will Deacon
  2019-05-30  8:38               ` Ravi Bangoria
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-29 12:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 11:20:22AM +0100, Will Deacon wrote:
> Anyway, you can add my ack to your patch, but I bet we can remove that mm
> check :D

I've ended up with the below. Ravi, can you test if that does indeed
obsolete your PPC patch?

---
Subject: perf: Fix perf_sample_regs_user()
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed May 29 14:37:24 CEST 2019

perf_sample_regs_user() uses 'current->mm' to test for the presence of
userspace, but this is insufficient, consider use_mm().

A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
PF_KTHREAD after it sets the new ->mm but before it drops to userspace
for the first time.

Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")

Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reported-by: Young Xiao <92siuyang@gmail.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
 	if (user_mode(regs)) {
 		regs_user->abi = perf_reg_abi(current);
 		regs_user->regs = regs;
-	} else if (current->mm) {
+	} else if (!(current->flags & PF_KTHREAD)) {
 		perf_get_regs_user(regs_user, regs, regs_user_copy);
 	} else {
 		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 12:55             ` Peter Zijlstra
@ 2019-05-29 13:05               ` Will Deacon
  2019-05-29 13:25                 ` Peter Zijlstra
  2019-05-30  8:38               ` Ravi Bangoria
  1 sibling, 1 reply; 30+ messages in thread
From: Will Deacon @ 2019-05-29 13:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 11:20:22AM +0100, Will Deacon wrote:
> > Anyway, you can add my ack to your patch, but I bet we can remove that mm
> > check :D
> 
> I've ended up with the below. Ravi, can you test if that does indeed
> obsolete your PPC patch?
> 
> ---
> Subject: perf: Fix perf_sample_regs_user()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed May 29 14:37:24 CEST 2019
> 
> perf_sample_regs_user() uses 'current->mm' to test for the presence of
> userspace, but this is insufficient, consider use_mm().
> 
> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
> PF_KTHREAD after it sets the new ->mm but before it drops to userspace
> for the first time.
> 
> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")
> 
> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> Reported-by: Young Xiao <92siuyang@gmail.com>
> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Acked-by: Will Deacon <will.deacon@arm.com>
> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
>  	if (user_mode(regs)) {

Hmm, so it just occurred to me that Mark's observation is that the regs
can be junk in some cases. In which case, should we be checking for
kthreads first?

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 13:05               ` Will Deacon
@ 2019-05-29 13:25                 ` Peter Zijlstra
  2019-05-29 14:35                   ` Will Deacon
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-29 13:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:

> >  	if (user_mode(regs)) {
> 
> Hmm, so it just occurred to me that Mark's observation is that the regs
> can be junk in some cases. In which case, should we be checking for
> kthreads first?

task_pt_regs() can return garbage, but @regs is the exception (or
perf_arch_fetch_caller_regs()) regs, and for those user_mode() had
better be correct.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 13:25                 ` Peter Zijlstra
@ 2019-05-29 14:35                   ` Will Deacon
  2019-05-29 16:19                     ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Will Deacon @ 2019-05-29 14:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> 
> > >  	if (user_mode(regs)) {
> > 
> > Hmm, so it just occurred to me that Mark's observation is that the regs
> > can be junk in some cases. In which case, should we be checking for
> > kthreads first?
> 
> task_pt_regs() can return garbage, but @regs is the exception (or
> perf_arch_fetch_caller_regs()) regs, and for those user_mode() had
> better be correct.

So what should we report for the idle task?

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 14:35                   ` Will Deacon
@ 2019-05-29 16:19                     ` Peter Zijlstra
  2019-05-29 16:24                       ` Mark Rutland
  2019-05-29 16:25                       ` Will Deacon
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-29 16:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote:
> On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote:
> > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > 
> > > >  	if (user_mode(regs)) {
> > > 
> > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > can be junk in some cases. In which case, should we be checking for
> > > kthreads first?
> > 
> > task_pt_regs() can return garbage, but @regs is the exception (or
> > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had
> > better be correct.
> 
> So what should we report for the idle task?

If an interrupt hits the idle task, @regs would be !user_mode(regs),
we'll find current->flags & PF_KTHREAD (idle not having passed through
exec()) and therefore we'll take ABI_NONE for the user regs.

Or am I not getting it?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 16:19                     ` Peter Zijlstra
@ 2019-05-29 16:24                       ` Mark Rutland
  2019-05-29 16:38                         ` Mark Rutland
  2019-05-29 16:25                       ` Will Deacon
  1 sibling, 1 reply; 30+ messages in thread
From: Mark Rutland @ 2019-05-29 16:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Young Xiao, linux, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 06:19:55PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote:
> > On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote:
> > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > > 
> > > > >  	if (user_mode(regs)) {
> > > > 
> > > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > > can be junk in some cases. In which case, should we be checking for
> > > > kthreads first?
> > > 
> > > task_pt_regs() can return garbage, but @regs is the exception (or
> > > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had
> > > better be correct.
> > 
> > So what should we report for the idle task?
> 
> If an interrupt hits the idle task, @regs would be !user_mode(regs),
> we'll find current->flags & PF_KTHREAD (idle not having passed through
> exec()) and therefore we'll take ABI_NONE for the user regs.
> 
> Or am I not getting it?

If the contents of task_pt_regs(current) is garbage, then the result of
user_mode(task_pt_regs(current)) is also garbage, no?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 16:19                     ` Peter Zijlstra
  2019-05-29 16:24                       ` Mark Rutland
@ 2019-05-29 16:25                       ` Will Deacon
  2019-05-29 16:44                         ` Peter Zijlstra
  1 sibling, 1 reply; 30+ messages in thread
From: Will Deacon @ 2019-05-29 16:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 06:19:55PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote:
> > On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote:
> > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > > 
> > > > >  	if (user_mode(regs)) {
> > > > 
> > > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > > can be junk in some cases. In which case, should we be checking for
> > > > kthreads first?
> > > 
> > > task_pt_regs() can return garbage, but @regs is the exception (or
> > > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had
> > > better be correct.
> > 
> > So what should we report for the idle task?
> 
> If an interrupt hits the idle task, @regs would be !user_mode(regs),
> we'll find current->flags & PF_KTHREAD (idle not having passed through
> exec()) and therefore we'll take ABI_NONE for the user regs.
> 
> Or am I not getting it?

Sorry, I'm not trying to catch you out! Just trying to understand what the
semantics are supposed to be.

I do find the concept of user_mode(regs) bizarre for the idle task. By the
above, we definitely have a bug on arm64 (user_mode(regs) tends to be
true for the idle task), and I couldn't figure out how you avoided it on
x86. I guess it happens to work because the stack is zero-initialised or
something?

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 16:24                       ` Mark Rutland
@ 2019-05-29 16:38                         ` Mark Rutland
  2019-05-29 17:03                           ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Rutland @ 2019-05-29 16:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Young Xiao, linux, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 05:24:36PM +0100, Mark Rutland wrote:
> On Wed, May 29, 2019 at 06:19:55PM +0200, Peter Zijlstra wrote:
> > On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote:
> > > On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote:
> > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > >  	if (user_mode(regs)) {
> > > > > 
> > > > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > > > can be junk in some cases. In which case, should we be checking for
> > > > > kthreads first?
> > > > 
> > > > task_pt_regs() can return garbage, but @regs is the exception (or
> > > > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had
> > > > better be correct.
> > > 
> > > So what should we report for the idle task?
> > 
> > If an interrupt hits the idle task, @regs would be !user_mode(regs),
> > we'll find current->flags & PF_KTHREAD (idle not having passed through
> > exec()) and therefore we'll take ABI_NONE for the user regs.
> > 
> > Or am I not getting it?
> 
> If the contents of task_pt_regs(current) is garbage, then the result of
> user_mode(task_pt_regs(current)) is also garbage, no?

Ugh; I was being thick here and assuming regs was the result of
task_pt_regs() when it's actually the interrupted regs.

Sorry for the noise.

Generally speaking though, if we ever task task_pt_regs() of an idle
task we'll get junk, and user_mode() could be true.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 16:25                       ` Will Deacon
@ 2019-05-29 16:44                         ` Peter Zijlstra
  2019-05-30  7:28                           ` Will Deacon
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-29 16:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 05:25:28PM +0100, Will Deacon wrote:

> > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > >  	if (user_mode(regs)) {
> > > > > 
> > > > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > > > can be junk in some cases. In which case, should we be checking for
> > > > > kthreads first?

> Sorry, I'm not trying to catch you out! Just trying to understand what the
> semantics are supposed to be.
> 
> I do find the concept of user_mode(regs) bizarre for the idle task. By the
> above, we definitely have a bug on arm64 (user_mode(regs) tends to be
> true for the idle task), and I couldn't figure out how you avoided it on
> x86. I guess it happens to work because the stack is zero-initialised or
> something?

So lets take the whole thing:

static void perf_sample_regs_user(struct perf_regs *regs_user,
				  struct pt_regs *regs,
				  struct pt_regs *regs_user_copy)
{
	if (user_mode(regs)) {
		regs_user->abi = perf_reg_abi(current);
		regs_user->regs = regs;
	} else if (!(current->flags & PF_KTHREAD)) {
		perf_get_regs_user(regs_user, regs, regs_user_copy);
	} else {
		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
		regs_user->regs = NULL;
	}
}

This is called from the perf-generate-a-sample path, which is typically
an exception (IRQ/NMI/whatever) or a software/tracepoint thing.

In the exception case, the @regs argument are the exception register, as
provided by your entry.S to your exception handlers. In the
software/tracepoint thing, it is the result of
perf_arch_fetch_caller_regs().

So @regs is always 'sane' and user_mode(regs) tells us if the exception
came from userspace (and software/tracepoints always fail this, they
'obviously' don't come from userspace). If we're idle, we're not from
userspace, so this branch doesn't matter.

Next, we test if there is a userspace part _at_all_, this is the newly
minted: '!(current->flags & PF_KTHREAD)', if that passes, we use
architecture magic -- task_pt_regs() -- to get the user-regs. This can
be crap. But since the idle task will always fail our test (as would
the old one, idle->mm is always NULL), we'll never get here for idle.

Then failing the above two, as we must for idle, we'll default to
ABI_NONE/NULL.


Does that help?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 16:38                         ` Mark Rutland
@ 2019-05-29 17:03                           ` Peter Zijlstra
  2019-05-30 10:35                             ` Mark Rutland
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-05-29 17:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Young Xiao, linux, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 05:38:54PM +0100, Mark Rutland wrote:
> Sorry for the noise.

n/p, confusion happens :-)

> Generally speaking though, if we ever task task_pt_regs() of an idle
> task we'll get junk, and user_mode() could be true.

Agreed, but we're not doing that.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 16:44                         ` Peter Zijlstra
@ 2019-05-30  7:28                           ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2019-05-30  7:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Young Xiao, linux, mark.rutland, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 06:44:07PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 05:25:28PM +0100, Will Deacon wrote:
> 
> > > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > > > > 
> > > > > > >  	if (user_mode(regs)) {
> > > > > > 
> > > > > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > > > > can be junk in some cases. In which case, should we be checking for
> > > > > > kthreads first?
> 
> > Sorry, I'm not trying to catch you out! Just trying to understand what the
> > semantics are supposed to be.
> > 
> > I do find the concept of user_mode(regs) bizarre for the idle task. By the
> > above, we definitely have a bug on arm64 (user_mode(regs) tends to be
> > true for the idle task), and I couldn't figure out how you avoided it on
> > x86. I guess it happens to work because the stack is zero-initialised or
> > something?
> 
> So lets take the whole thing:
> 
> static void perf_sample_regs_user(struct perf_regs *regs_user,
> 				  struct pt_regs *regs,
> 				  struct pt_regs *regs_user_copy)
> {
> 	if (user_mode(regs)) {
> 		regs_user->abi = perf_reg_abi(current);
> 		regs_user->regs = regs;
> 	} else if (!(current->flags & PF_KTHREAD)) {
> 		perf_get_regs_user(regs_user, regs, regs_user_copy);
> 	} else {
> 		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
> 		regs_user->regs = NULL;
> 	}
> }
> 
> This is called from the perf-generate-a-sample path, which is typically
> an exception (IRQ/NMI/whatever) or a software/tracepoint thing.

Yes, sorry, fell into the same trap as Mark here and misunderstood your
assertion about user_mode(regs) always needing to be valid. Then I went down
a stupid rabbit hole and dragged you with me. I can't ack a patch twice, so
I'll just go do something else for a bit...

Thanks for your patience!

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 12:55             ` Peter Zijlstra
  2019-05-29 13:05               ` Will Deacon
@ 2019-05-30  8:38               ` Ravi Bangoria
  2019-05-30 10:27                 ` Ravi Bangoria
  1 sibling, 1 reply; 30+ messages in thread
From: Ravi Bangoria @ 2019-05-30  8:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Young Xiao, linux, mark.rutland, mingo, bp, hpa,
	x86, kan.liang, linux-arm-kernel, linux-kernel, ravi.bangoria,
	mpe, acme, eranian, fweisbec, jolsa, ravi.bangoria



On 5/29/19 6:25 PM, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 11:20:22AM +0100, Will Deacon wrote:
>> Anyway, you can add my ack to your patch, but I bet we can remove that mm
>> check :D
> 
> I've ended up with the below. Ravi, can you test if that does indeed
> obsolete your PPC patch?

Checking.

> 
> ---
> Subject: perf: Fix perf_sample_regs_user()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed May 29 14:37:24 CEST 2019
> 
> perf_sample_regs_user() uses 'current->mm' to test for the presence of
> userspace, but this is insufficient, consider use_mm().
> 
> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
> PF_KTHREAD after it sets the new ->mm but before it drops to userspace
> for the first time.

This looks correct. I'll give it a try.

> 
> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")
> 
> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> Reported-by: Young Xiao <92siuyang@gmail.com>
> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Acked-by: Will Deacon <will.deacon@arm.com>
> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
>  	if (user_mode(regs)) {
>  		regs_user->abi = perf_reg_abi(current);
>  		regs_user->regs = regs;
> -	} else if (current->mm) {
> +	} else if (!(current->flags & PF_KTHREAD)) {
>  		perf_get_regs_user(regs_user, regs, regs_user_copy);
>  	} else {
>  		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-30  8:38               ` Ravi Bangoria
@ 2019-05-30 10:27                 ` Ravi Bangoria
  2019-05-31 15:37                   ` Will Deacon
  0 siblings, 1 reply; 30+ messages in thread
From: Ravi Bangoria @ 2019-05-30 10:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Young Xiao, linux, mark.rutland, mingo, bp, hpa,
	x86, kan.liang, linux-arm-kernel, linux-kernel, ravi.bangoria,
	mpe, acme, eranian, fweisbec, jolsa, Ravi Bangoria



On 5/30/19 2:08 PM, Ravi Bangoria wrote:
>> ---
>> Subject: perf: Fix perf_sample_regs_user()
>> From: Peter Zijlstra <peterz@infradead.org>
>> Date: Wed May 29 14:37:24 CEST 2019
>>
>> perf_sample_regs_user() uses 'current->mm' to test for the presence of
>> userspace, but this is insufficient, consider use_mm().
>>
>> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
>> PF_KTHREAD after it sets the new ->mm but before it drops to userspace
>> for the first time.
> 
> This looks correct. I'll give it a try.
> 
>>
>> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")
>>
>> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>> Reported-by: Young Xiao <92siuyang@gmail.com>
>> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Jiri Olsa <jolsa@redhat.com>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: Stephane Eranian <eranian@google.com>
>> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
>> Acked-by: Will Deacon <will.deacon@arm.com>
>> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> ---
>>  kernel/events/core.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
>>  	if (user_mode(regs)) {
>>  		regs_user->abi = perf_reg_abi(current);
>>  		regs_user->regs = regs;
>> -	} else if (current->mm) {
>> +	} else if (!(current->flags & PF_KTHREAD)) {

With this patch applied and my patch reverted, I still see it crashing
because current->flags does not have PF_KTHREAD set. Sample trace with
v5.0 kernel:


   BUG: Kernel NULL pointer dereference at 0x00000000
   Faulting instruction address: 0xc0000000000f1a6c
   Oops: Kernel access of bad area, sig: 11 [#1]
   LE SMP NR_CPUS=2048 NUMA pSeries
   CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7
   NIP:  c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90
   REGS: c0000001e80469a0 TRAP: 0300   Not tainted  (5.0.0+)
   MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE>  CR: 48022448  XER: 20000000
   CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 
   GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 
   GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 
   GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 
   GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 
   GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 
   GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 
   GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 
   GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 
   NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0
   LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0
   Call Trace:
   [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable)
   [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0
   [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90
   [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0
   [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670
   [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0
   [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70
   [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120
   --- interrupt: f01 at slice_scan_available+0x20/0xc0
       LR = slice_find_area+0x174/0x210
   [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0
   [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170
   [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0
   [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580
   [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0
   [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980
   [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290
   [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
   Instruction dump:
   7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 
   7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 
   ---[ end trace 54f3492ad1d403d8 ]---



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-29 17:03                           ` Peter Zijlstra
@ 2019-05-30 10:35                             ` Mark Rutland
  0 siblings, 0 replies; 30+ messages in thread
From: Mark Rutland @ 2019-05-30 10:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Young Xiao, linux, mingo, bp, hpa, x86, kan.liang,
	linux-arm-kernel, linux-kernel, ravi.bangoria, mpe, acme,
	eranian, fweisbec, jolsa

On Wed, May 29, 2019 at 07:03:13PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 05:38:54PM +0100, Mark Rutland wrote:
> > Generally speaking though, if we ever task task_pt_regs() of an idle
> > task we'll get junk, and user_mode() could be true.
> 
> Agreed, but we're not doing that.

Sure.

I just think that might be an argument for having task_pt_regs() return
NULL for kthreads, or having a WARN_ON_ONCE(t->flags & PF_KTHREAD) to
catch missing checks elsewhere.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-30 10:27                 ` Ravi Bangoria
@ 2019-05-31 15:37                   ` Will Deacon
  2019-06-03 11:23                     ` Will Deacon
                                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Will Deacon @ 2019-05-31 15:37 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Young Xiao, linux, mark.rutland, mingo, bp, hpa,
	x86, kan.liang, linux-arm-kernel, linux-kernel, ravi.bangoria,
	mpe, acme, eranian, fweisbec, jolsa

On Thu, May 30, 2019 at 03:57:36PM +0530, Ravi Bangoria wrote:
> 
> 
> On 5/30/19 2:08 PM, Ravi Bangoria wrote:
> >> ---
> >> Subject: perf: Fix perf_sample_regs_user()
> >> From: Peter Zijlstra <peterz@infradead.org>
> >> Date: Wed May 29 14:37:24 CEST 2019
> >>
> >> perf_sample_regs_user() uses 'current->mm' to test for the presence of
> >> userspace, but this is insufficient, consider use_mm().
> >>
> >> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
> >> PF_KTHREAD after it sets the new ->mm but before it drops to userspace
> >> for the first time.
> > 
> > This looks correct. I'll give it a try.
> > 
> >>
> >> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")
> >>
> >> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> >> Reported-by: Young Xiao <92siuyang@gmail.com>
> >> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> >> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> >> Cc: Michael Ellerman <mpe@ellerman.id.au>
> >> Cc: Jiri Olsa <jolsa@redhat.com>
> >> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> >> Cc: Stephane Eranian <eranian@google.com>
> >> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> >> Acked-by: Will Deacon <will.deacon@arm.com>
> >> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
> >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >> ---
> >>  kernel/events/core.c |    2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> --- a/kernel/events/core.c
> >> +++ b/kernel/events/core.c
> >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
> >>  	if (user_mode(regs)) {
> >>  		regs_user->abi = perf_reg_abi(current);
> >>  		regs_user->regs = regs;
> >> -	} else if (current->mm) {
> >> +	} else if (!(current->flags & PF_KTHREAD)) {
> 
> With this patch applied and my patch reverted, I still see it crashing
> because current->flags does not have PF_KTHREAD set. Sample trace with
> v5.0 kernel:
> 
> 
>    BUG: Kernel NULL pointer dereference at 0x00000000
>    Faulting instruction address: 0xc0000000000f1a6c
>    Oops: Kernel access of bad area, sig: 11 [#1]
>    LE SMP NR_CPUS=2048 NUMA pSeries
>    CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7
>    NIP:  c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90
>    REGS: c0000001e80469a0 TRAP: 0300   Not tainted  (5.0.0+)
>    MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE>  CR: 48022448  XER: 20000000
>    CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 
>    GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 
>    GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 
>    GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 
>    GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 
>    GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 
>    GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 
>    GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 
>    GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 
>    NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0
>    LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0
>    Call Trace:
>    [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable)
>    [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0
>    [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90
>    [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0
>    [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670
>    [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0
>    [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70
>    [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120
>    --- interrupt: f01 at slice_scan_available+0x20/0xc0
>        LR = slice_find_area+0x174/0x210
>    [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0
>    [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170
>    [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0
>    [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580
>    [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0
>    [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980
>    [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290
>    [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
>    Instruction dump:
>    7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 
>    7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 
>    ---[ end trace 54f3492ad1d403d8 ]---

Oh, nice! I think this happens because Power doesn't actually initialise
the regs after a kthread execs() until late in start_thread(). But the plot
thickens somewhat, since current_pt_regs() is different to
task_pt_regs(current) on Power (the former cannot return NULL).

So a really hideous hack on top of Peter's patch might be:

diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
index 0bbac612146e..5bde866024b6 100644
--- a/arch/arm64/kernel/perf_regs.c
+++ b/arch/arm64/kernel/perf_regs.c
@@ -57,6 +57,6 @@ void perf_get_regs_user(struct perf_regs *regs_user,
 			struct pt_regs *regs,
 			struct pt_regs *regs_user_copy)
 {
-	regs_user->regs = task_pt_regs(current);
+	regs_user->regs = current_pt_regs();
 	regs_user->abi = perf_reg_abi(current);
 }

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-31 15:37                   ` Will Deacon
@ 2019-06-03 11:23                     ` Will Deacon
  2019-06-03 11:48                     ` Peter Zijlstra
  2019-06-03 13:30                     ` Michael Ellerman
  2 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2019-06-03 11:23 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Young Xiao, linux, mark.rutland, mingo, bp, hpa,
	x86, kan.liang, linux-arm-kernel, linux-kernel, ravi.bangoria,
	mpe, acme, eranian, fweisbec, jolsa

On Fri, May 31, 2019 at 04:37:15PM +0100, Will Deacon wrote:
> Oh, nice! I think this happens because Power doesn't actually initialise
> the regs after a kthread execs() until late in start_thread(). But the plot
> thickens somewhat, since current_pt_regs() is different to
> task_pt_regs(current) on Power (the former cannot return NULL).
> 
> So a really hideous hack on top of Peter's patch might be:
> 
> diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
> index 0bbac612146e..5bde866024b6 100644
> --- a/arch/arm64/kernel/perf_regs.c
> +++ b/arch/arm64/kernel/perf_regs.c
> @@ -57,6 +57,6 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>  			struct pt_regs *regs,
>  			struct pt_regs *regs_user_copy)
>  {
> -	regs_user->regs = task_pt_regs(current);
> +	regs_user->regs = current_pt_regs();
>  	regs_user->abi = perf_reg_abi(current);

^^^ Bah, this was clearly supposed to be a change in the powerpc code, but
you get the idea.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-31 15:37                   ` Will Deacon
  2019-06-03 11:23                     ` Will Deacon
@ 2019-06-03 11:48                     ` Peter Zijlstra
  2019-06-03 13:30                     ` Michael Ellerman
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-06-03 11:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ravi Bangoria, Young Xiao, linux, mark.rutland, mingo, bp, hpa,
	x86, kan.liang, linux-arm-kernel, linux-kernel, ravi.bangoria,
	mpe, acme, eranian, fweisbec, jolsa

On Fri, May 31, 2019 at 04:37:03PM +0100, Will Deacon wrote:
> On Thu, May 30, 2019 at 03:57:36PM +0530, Ravi Bangoria wrote:
> > On 5/30/19 2:08 PM, Ravi Bangoria wrote:

> > >> --- a/kernel/events/core.c
> > >> +++ b/kernel/events/core.c
> > >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
> > >>  	if (user_mode(regs)) {
> > >>  		regs_user->abi = perf_reg_abi(current);
> > >>  		regs_user->regs = regs;
> > >> -	} else if (current->mm) {
> > >> +	} else if (!(current->flags & PF_KTHREAD)) {
> > 
> > With this patch applied and my patch reverted, I still see it crashing
> > because current->flags does not have PF_KTHREAD set. Sample trace with
> > v5.0 kernel:
> > 
> > 
> >    BUG: Kernel NULL pointer dereference at 0x00000000
> >    Faulting instruction address: 0xc0000000000f1a6c
> >    Oops: Kernel access of bad area, sig: 11 [#1]
> >    LE SMP NR_CPUS=2048 NUMA pSeries
> >    CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7
> >    NIP:  c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90
> >    REGS: c0000001e80469a0 TRAP: 0300   Not tainted  (5.0.0+)
> >    MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE>  CR: 48022448  XER: 20000000
> >    CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 
> >    GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 
> >    GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 
> >    GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 
> >    GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 
> >    GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 
> >    GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 
> >    GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 
> >    GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 
> >    NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0
> >    LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0
> >    Call Trace:
> >    [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable)
> >    [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0
> >    [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90
> >    [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0
> >    [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670
> >    [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0
> >    [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70
> >    [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120
> >    --- interrupt: f01 at slice_scan_available+0x20/0xc0
> >        LR = slice_find_area+0x174/0x210
> >    [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0
> >    [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170
> >    [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0
> >    [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580
> >    [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0
> >    [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980
> >    [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290
> >    [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
> >    Instruction dump:
> >    7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 
> >    7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 
> >    ---[ end trace 54f3492ad1d403d8 ]---
> 
> Oh, nice! I think this happens because Power doesn't actually initialise
> the regs after a kthread execs() until late in start_thread(). But the plot
> thickens somewhat, since current_pt_regs() is different to
> task_pt_regs(current) on Power (the former cannot return NULL).

So one possibility would be to have activate_mm() initialize the user
regs set. Doing it there ensure that the moment PF_KTHREAD gets cleared,
task_pt_regs() is valid.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] perf: Fix oops when kthread execs user process
  2019-05-31 15:37                   ` Will Deacon
  2019-06-03 11:23                     ` Will Deacon
  2019-06-03 11:48                     ` Peter Zijlstra
@ 2019-06-03 13:30                     ` Michael Ellerman
  2 siblings, 0 replies; 30+ messages in thread
From: Michael Ellerman @ 2019-06-03 13:30 UTC (permalink / raw)
  To: Will Deacon, Ravi Bangoria
  Cc: Peter Zijlstra, Young Xiao, linux, mark.rutland, mingo, bp, hpa,
	x86, kan.liang, linux-arm-kernel, linux-kernel, ravi.bangoria,
	acme, eranian, fweisbec, jolsa

Will Deacon <will.deacon@arm.com> writes:
> On Thu, May 30, 2019 at 03:57:36PM +0530, Ravi Bangoria wrote:
>> On 5/30/19 2:08 PM, Ravi Bangoria wrote:
>> >> ---
>> >> Subject: perf: Fix perf_sample_regs_user()
>> >> From: Peter Zijlstra <peterz@infradead.org>
>> >> Date: Wed May 29 14:37:24 CEST 2019
>> >>
>> >> perf_sample_regs_user() uses 'current->mm' to test for the presence of
>> >> userspace, but this is insufficient, consider use_mm().
>> >>
>> >> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
>> >> PF_KTHREAD after it sets the new ->mm but before it drops to userspace
>> >> for the first time.
>> > 
>> > This looks correct. I'll give it a try.
>> > 
>> >>
>> >> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")
>> >>
>> >> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>> >> Reported-by: Young Xiao <92siuyang@gmail.com>
>> >> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>> >> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> >> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> >> Cc: Jiri Olsa <jolsa@redhat.com>
>> >> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> >> Cc: Stephane Eranian <eranian@google.com>
>> >> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
>> >> Acked-by: Will Deacon <will.deacon@arm.com>
>> >> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
>> >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> >> ---
>> >>  kernel/events/core.c |    2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> --- a/kernel/events/core.c
>> >> +++ b/kernel/events/core.c
>> >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
>> >>  	if (user_mode(regs)) {
>> >>  		regs_user->abi = perf_reg_abi(current);
>> >>  		regs_user->regs = regs;
>> >> -	} else if (current->mm) {
>> >> +	} else if (!(current->flags & PF_KTHREAD)) {
>> 
>> With this patch applied and my patch reverted, I still see it crashing
>> because current->flags does not have PF_KTHREAD set. Sample trace with
>> v5.0 kernel:
>> 
>> 
>>    BUG: Kernel NULL pointer dereference at 0x00000000
>>    Faulting instruction address: 0xc0000000000f1a6c
>>    Oops: Kernel access of bad area, sig: 11 [#1]
>>    LE SMP NR_CPUS=2048 NUMA pSeries
>>    CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7
>>    NIP:  c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90
>>    REGS: c0000001e80469a0 TRAP: 0300   Not tainted  (5.0.0+)
>>    MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE>  CR: 48022448  XER: 20000000
>>    CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 
>>    GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 
>>    GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 
>>    GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 
>>    GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 
>>    GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 
>>    GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 
>>    GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 
>>    GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 
>>    NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0
>>    LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0
>>    Call Trace:
>>    [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable)
>>    [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0
>>    [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90
>>    [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0
>>    [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670
>>    [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0
>>    [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70
>>    [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120
>>    --- interrupt: f01 at slice_scan_available+0x20/0xc0
>>        LR = slice_find_area+0x174/0x210
>>    [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0
>>    [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170
>>    [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0
>>    [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580
>>    [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0
>>    [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980
>>    [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290
>>    [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
>>    Instruction dump:
>>    7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 
>>    7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 
>>    ---[ end trace 54f3492ad1d403d8 ]---
>
> Oh, nice! I think this happens because Power doesn't actually initialise
> the regs after a kthread execs() until late in start_thread().

Hmm, it's more or less at the top of start_thread(), but that's late vs
flush_old_exec(), so there's definitely a window there.

> But the plot thickens somewhat, since current_pt_regs() is different to
> task_pt_regs(current) on Power (the former cannot return NULL).

Ugh.

Mark had convinced me in the other part of the thread that returning
NULL for kthreads made sense, but having different results depending on
which similarly named accessor you use is gross.

We used to implement current_pt_regs() without actually looking at
current via:

	((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1)

Where current_thread_info() just masked the stack pointer, so that was a
nice optimisation.

But now that we have THREAD_INFO_IN_TASK we're going via current anyway,
so we may as well just get rid of current_pt_regs() and make it a
synonym for task_pt_regs(current).

Though that will probably cause something else to break :D


> So a really hideous hack on top of Peter's patch might be:
>
> diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
> index 0bbac612146e..5bde866024b6 100644
> --- a/arch/arm64/kernel/perf_regs.c
> +++ b/arch/arm64/kernel/perf_regs.c
> @@ -57,6 +57,6 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>  			struct pt_regs *regs,
>  			struct pt_regs *regs_user_copy)
>  {
> -	regs_user->regs = task_pt_regs(current);
> +	regs_user->regs = current_pt_regs();
>  	regs_user->abi = perf_reg_abi(current);


I'd be inclined to stick with what we've got, at least so long as we're
the only arch that lets task_pt_regs() return NULL.

void perf_get_regs_user(struct perf_regs *regs_user,
			struct pt_regs *regs,
			struct pt_regs *regs_user_copy)
{
	regs_user->regs = task_pt_regs(current);
	regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) :
			 PERF_SAMPLE_REGS_ABI_NONE;
}


cheers

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-06-03 13:30 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-28 12:31 [PATCH] perf: Fix oops when kthread execs user process Young Xiao
2019-05-28 12:41 ` Russell King - ARM Linux admin
2019-05-28 14:01 ` Peter Zijlstra
2019-05-28 15:32   ` Will Deacon
2019-05-28 16:12     ` Mark Rutland
2019-05-28 17:32     ` Peter Zijlstra
2019-05-29  9:17       ` Will Deacon
2019-05-29 10:10         ` Peter Zijlstra
2019-05-29 10:20           ` Will Deacon
2019-05-29 12:55             ` Peter Zijlstra
2019-05-29 13:05               ` Will Deacon
2019-05-29 13:25                 ` Peter Zijlstra
2019-05-29 14:35                   ` Will Deacon
2019-05-29 16:19                     ` Peter Zijlstra
2019-05-29 16:24                       ` Mark Rutland
2019-05-29 16:38                         ` Mark Rutland
2019-05-29 17:03                           ` Peter Zijlstra
2019-05-30 10:35                             ` Mark Rutland
2019-05-29 16:25                       ` Will Deacon
2019-05-29 16:44                         ` Peter Zijlstra
2019-05-30  7:28                           ` Will Deacon
2019-05-30  8:38               ` Ravi Bangoria
2019-05-30 10:27                 ` Ravi Bangoria
2019-05-31 15:37                   ` Will Deacon
2019-06-03 11:23                     ` Will Deacon
2019-06-03 11:48                     ` Peter Zijlstra
2019-06-03 13:30                     ` Michael Ellerman
2019-05-29 10:11       ` Mark Rutland
2019-05-29  4:21     ` Michael Ellerman
2019-05-29  1:44   ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).