LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist
@ 2021-08-18  3:06 Jens Axboe
  2021-08-19  2:57 ` Linus Torvalds
  2021-08-23  4:55 ` Olivier Langlois
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2021-08-18  3:06 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Eric W. Biederman, Oleg Nesterov,
	Olivier Langlois, Tony Battersby

task_work being added with notify == TWA_SIGNAL will utilize
TIF_NOTIFY_SIGNAL for signaling the targeted task that work is available.
If this happens while a task is going through a core dump, it'll
potentially disturb and truncate the dump as a signal interruption.

Have task_work_add() with notify == TWA_SIGNAL check if a task has been
signaled for a core dump, and refuse to add the work if that is the case.
When a core dump is invoked, explicitly check for TIF_NOTIFY_SIGNAL and
run any pending task_work if that is set. This is similar to how an
exiting task will not get new task_work added, and we return the same
error for the core dump case. As we return success or failure from
task_work_add(), the caller has to be prepared to handle this case
already.

Currently this manifests itself in that io_uring tasks that end up using
task_work will experience truncated core dumps.

Reported-by: Tony Battersby <tonyb@cybernetics.com>
Reported-by: Olivier Langlois <olivier@trillion01.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/fs/coredump.c b/fs/coredump.c
index 07afb5ddb1c4..ca7c1ee44ada 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		.mm_flags = mm->flags,
 	};
 
+	/*
+	 * task_work_add() will refuse to add work after PF_SIGNALED has
+	 * been set, ensure that we flush any pending TIF_NOTIFY_SIGNAL work
+	 * if any was queued before that.
+	 */
+	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
+		tracehook_notify_signal();
+
 	audit_core_dumps(siginfo->si_signo);
 
 	binfmt = mm->binfmt;
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 1698fbe6f0e1..1ab28904adc4 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		head = READ_ONCE(task->task_works);
 		if (unlikely(head == &work_exited))
 			return -ESRCH;
+		/*
+		 * TIF_NOTIFY_SIGNAL notifications will interfere with
+		 * a core dump in progress, reject them.
+		 */
+		if (notify == TWA_SIGNAL && (task->flags & PF_SIGNALED))
+			return -ESRCH;
 		work->next = head;
 	} while (cmpxchg(&task->task_works, head, work) != head);

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist
  2021-08-18  3:06 [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist Jens Axboe
@ 2021-08-19  2:57 ` Linus Torvalds
  2021-08-19 14:59   ` Jens Axboe
  2021-08-23  4:55 ` Olivier Langlois
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2021-08-19  2:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: LKML, Eric W. Biederman, Oleg Nesterov, Olivier Langlois, Tony Battersby

On Tue, Aug 17, 2021 at 8:06 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> task_work being added with notify == TWA_SIGNAL will utilize
> TIF_NOTIFY_SIGNAL for signaling the targeted task that work is available.
> If this happens while a task is going through a core dump, it'll
> potentially disturb and truncate the dump as a signal interruption.

This patch seems (a) buggy and (b) hacky.

> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>                 head = READ_ONCE(task->task_works);
>                 if (unlikely(head == &work_exited))
>                         return -ESRCH;
> +               /*
> +                * TIF_NOTIFY_SIGNAL notifications will interfere with
> +                * a core dump in progress, reject them.
> +                */
> +               if (notify == TWA_SIGNAL && (task->flags & PF_SIGNALED))
> +                       return -ESRCH;

This basically seems to check task->flags with no serialization.

I'm sure it works 99.9% of the time in practice, since you'd be really
unlucky to hit any races, but I really don't see what the
serialization logic is.

Also, the main user that actually triggered the problem already has

        if (unlikely(tsk->flags & PF_EXITING))
                goto fail;

just above the call to task_work_add(), so this all seems very hacky indeed.

Of course, I don't see what the serialization for _that_ one is either.

Pls explain. You can't just randomly add tests for random flags that
get modified by other random code.

               Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist
  2021-08-19  2:57 ` Linus Torvalds
@ 2021-08-19 14:59   ` Jens Axboe
  2021-08-22 20:55     ` Olivier Langlois
  2022-03-21 18:20     ` Tony Battersby
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2021-08-19 14:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Eric W. Biederman, Oleg Nesterov, Olivier Langlois, Tony Battersby

On 8/18/21 8:57 PM, Linus Torvalds wrote:
> On Tue, Aug 17, 2021 at 8:06 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> task_work being added with notify == TWA_SIGNAL will utilize
>> TIF_NOTIFY_SIGNAL for signaling the targeted task that work is available.
>> If this happens while a task is going through a core dump, it'll
>> potentially disturb and truncate the dump as a signal interruption.
> 
> This patch seems (a) buggy and (b) hacky.
> 
>> --- a/kernel/task_work.c
>> +++ b/kernel/task_work.c
>> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>>                 head = READ_ONCE(task->task_works);
>>                 if (unlikely(head == &work_exited))
>>                         return -ESRCH;
>> +               /*
>> +                * TIF_NOTIFY_SIGNAL notifications will interfere with
>> +                * a core dump in progress, reject them.
>> +                */
>> +               if (notify == TWA_SIGNAL && (task->flags & PF_SIGNALED))
>> +                       return -ESRCH;
> 
> This basically seems to check task->flags with no serialization.
> 
> I'm sure it works 99.9% of the time in practice, since you'd be really
> unlucky to hit any races, but I really don't see what the
> serialization logic is.
> 
> Also, the main user that actually triggered the problem already has
> 
>         if (unlikely(tsk->flags & PF_EXITING))
>                 goto fail;
> 
> just above the call to task_work_add(), so this all seems very hacky indeed.
> 
> Of course, I don't see what the serialization for _that_ one is either.
> 
> Pls explain. You can't just randomly add tests for random flags that
> get modified by other random code.

You're absolutely right. On the io_uring side, in the current tree,
there's only one check where current != task being checked - and that's
in the poll rewait arming. That one should likely just go away. It may
be fine as it is, as it just pertains to ring exit cancelations. We want
to ensure that we don't rearm poll requests if the process is canceling
and going away. I'll take a closer look at that one.

For this particular patch, I agree it's racy. I'll see if I can come up
with something better...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist
  2021-08-19 14:59   ` Jens Axboe
@ 2021-08-22 20:55     ` Olivier Langlois
  2022-03-21 18:20     ` Tony Battersby
  1 sibling, 0 replies; 7+ messages in thread
From: Olivier Langlois @ 2021-08-22 20:55 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds
  Cc: LKML, Eric W. Biederman, Oleg Nesterov, Tony Battersby

On Thu, 2021-08-19 at 08:59 -0600, Jens Axboe wrote:
> 
> You're absolutely right. On the io_uring side, in the current tree,
> there's only one check where current != task being checked - and
> that's
> in the poll rewait arming. That one should likely just go away. It
> may
> be fine as it is, as it just pertains to ring exit cancelations. We
> want
> to ensure that we don't rearm poll requests if the process is
> canceling
> and going away. I'll take a closer look at that one.
> 
> For this particular patch, I agree it's racy. I'll see if I can come
> up
> with something better...
> 
I have finally found the patch that you wanted me to test. I'm going to
do it ASAP despite still having issue.

I do have a different approach to solve the same core dump issue.

Feel free to consider it if this can avoid the race condition described
here.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist
  2021-08-18  3:06 [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist Jens Axboe
  2021-08-19  2:57 ` Linus Torvalds
@ 2021-08-23  4:55 ` Olivier Langlois
  1 sibling, 0 replies; 7+ messages in thread
From: Olivier Langlois @ 2021-08-23  4:55 UTC (permalink / raw)
  To: Jens Axboe, LKML
  Cc: Linus Torvalds, Eric W. Biederman, Oleg Nesterov, Tony Battersby

On Tue, 2021-08-17 at 21:06 -0600, Jens Axboe wrote:
> task_work being added with notify == TWA_SIGNAL will utilize
> TIF_NOTIFY_SIGNAL for signaling the targeted task that work is
> available.
> If this happens while a task is going through a core dump, it'll
> potentially disturb and truncate the dump as a signal interruption.
> 
> Have task_work_add() with notify == TWA_SIGNAL check if a task has
> been
> signaled for a core dump, and refuse to add the work if that is the
> case.
> When a core dump is invoked, explicitly check for TIF_NOTIFY_SIGNAL
> and
> run any pending task_work if that is set. This is similar to how an
> exiting task will not get new task_work added, and we return the same
> error for the core dump case. As we return success or failure from
> task_work_add(), the caller has to be prepared to handle this case
> already.
> 
> Currently this manifests itself in that io_uring tasks that end up
> using
> task_work will experience truncated core dumps.
> 
> Reported-by: Tony Battersby <tonyb@cybernetics.com>
> Reported-by: Olivier Langlois <olivier@trillion01.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: stable@vger.kernel.org # 5.10+
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 07afb5ddb1c4..ca7c1ee44ada 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t
> *siginfo)
>                 .mm_flags = mm->flags,
>         };
>  
> +       /*
> +        * task_work_add() will refuse to add work after PF_SIGNALED
> has
> +        * been set, ensure that we flush any pending
> TIF_NOTIFY_SIGNAL work
> +        * if any was queued before that.
> +        */
> +       if (test_thread_flag(TIF_NOTIFY_SIGNAL))
> +               tracehook_notify_signal();
> +
>         audit_core_dumps(siginfo->si_signo);
>  
>         binfmt = mm->binfmt;
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 1698fbe6f0e1..1ab28904adc4 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct
> callback_head *work,
>                 head = READ_ONCE(task->task_works);
>                 if (unlikely(head == &work_exited))
>                         return -ESRCH;
> +               /*
> +                * TIF_NOTIFY_SIGNAL notifications will interfere
> with
> +                * a core dump in progress, reject them.
> +                */
> +               if (notify == TWA_SIGNAL && (task->flags &
> PF_SIGNALED))
> +                       return -ESRCH;
>                 work->next = head;
>         } while (cmpxchg(&task->task_works, head, work) != head);
> 

tested successfully on 5.12.19

Tested-by: Olivier Langlois <olivier@trillion01.com>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist
  2021-08-19 14:59   ` Jens Axboe
  2021-08-22 20:55     ` Olivier Langlois
@ 2022-03-21 18:20     ` Tony Battersby
  2022-03-22 15:04       ` Eric W. Biederman
  1 sibling, 1 reply; 7+ messages in thread
From: Tony Battersby @ 2022-03-21 18:20 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds
  Cc: LKML, Eric W. Biederman, Oleg Nesterov, Olivier Langlois

On 8/19/21 10:59, Jens Axboe wrote:
> On 8/18/21 8:57 PM, Linus Torvalds wrote:
>> On Tue, Aug 17, 2021 at 8:06 PM Jens Axboe <axboe@kernel.dk> wrote:
>>> task_work being added with notify == TWA_SIGNAL will utilize
>>> TIF_NOTIFY_SIGNAL for signaling the targeted task that work is available.
>>> If this happens while a task is going through a core dump, it'll
>>> potentially disturb and truncate the dump as a signal interruption.
>> This patch seems (a) buggy and (b) hacky.
>>
>>> --- a/kernel/task_work.c
>>> +++ b/kernel/task_work.c
>>> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>>>                 head = READ_ONCE(task->task_works);
>>>                 if (unlikely(head == &work_exited))
>>>                         return -ESRCH;
>>> +               /*
>>> +                * TIF_NOTIFY_SIGNAL notifications will interfere with
>>> +                * a core dump in progress, reject them.
>>> +                */
>>> +               if (notify == TWA_SIGNAL && (task->flags & PF_SIGNALED))
>>> +                       return -ESRCH;
>> This basically seems to check task->flags with no serialization.
>>
>> I'm sure it works 99.9% of the time in practice, since you'd be really
>> unlucky to hit any races, but I really don't see what the
>> serialization logic is.
>>
>> Also, the main user that actually triggered the problem already has
>>
>>         if (unlikely(tsk->flags & PF_EXITING))
>>                 goto fail;
>>
>> just above the call to task_work_add(), so this all seems very hacky indeed.
>>
>> Of course, I don't see what the serialization for _that_ one is either.
>>
>> Pls explain. You can't just randomly add tests for random flags that
>> get modified by other random code.
> You're absolutely right. On the io_uring side, in the current tree,
> there's only one check where current != task being checked - and that's
> in the poll rewait arming. That one should likely just go away. It may
> be fine as it is, as it just pertains to ring exit cancelations. We want
> to ensure that we don't rearm poll requests if the process is canceling
> and going away. I'll take a closer look at that one.
>
> For this particular patch, I agree it's racy. I'll see if I can come up
> with something better...
>

Continuing this thread from August 2021:

I previously tested a version of Jens' patch backported to 5.10 and it
fixed my problem.  Now I am trying to upgrade kernels, and 5.17 still
has the same problem - coredumps from an io_uring program to a pipe are
truncated.  Jens' patch applied to 5.17 again fixes the problem.  Has
there been any progress with fixing the problem upstream?

Reference:

https://lore.kernel.org/all/8af373ec-9609-35a4-f185-f9bdc63d39b7@cybernetics.com/
https://lore.kernel.org/all/76d3418c-e9ba-4392-858a-5da8028e3526@kernel.dk/

Tony Battersby
Cybernetics


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist
  2022-03-21 18:20     ` Tony Battersby
@ 2022-03-22 15:04       ` Eric W. Biederman
  0 siblings, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2022-03-22 15:04 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Jens Axboe, Linus Torvalds, LKML, Oleg Nesterov, Olivier Langlois

Tony Battersby <tonyb@cybernetics.com> writes:

> On 8/19/21 10:59, Jens Axboe wrote:
>> On 8/18/21 8:57 PM, Linus Torvalds wrote:
>>> On Tue, Aug 17, 2021 at 8:06 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> task_work being added with notify == TWA_SIGNAL will utilize
>>>> TIF_NOTIFY_SIGNAL for signaling the targeted task that work is available.
>>>> If this happens while a task is going through a core dump, it'll
>>>> potentially disturb and truncate the dump as a signal interruption.
>>> This patch seems (a) buggy and (b) hacky.
>>>
>>>> --- a/kernel/task_work.c
>>>> +++ b/kernel/task_work.c
>>>> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>>>>                 head = READ_ONCE(task->task_works);
>>>>                 if (unlikely(head == &work_exited))
>>>>                         return -ESRCH;
>>>> +               /*
>>>> +                * TIF_NOTIFY_SIGNAL notifications will interfere with
>>>> +                * a core dump in progress, reject them.
>>>> +                */
>>>> +               if (notify == TWA_SIGNAL && (task->flags & PF_SIGNALED))
>>>> +                       return -ESRCH;
>>> This basically seems to check task->flags with no serialization.
>>>
>>> I'm sure it works 99.9% of the time in practice, since you'd be really
>>> unlucky to hit any races, but I really don't see what the
>>> serialization logic is.
>>>
>>> Also, the main user that actually triggered the problem already has
>>>
>>>         if (unlikely(tsk->flags & PF_EXITING))
>>>                 goto fail;
>>>
>>> just above the call to task_work_add(), so this all seems very hacky indeed.
>>>
>>> Of course, I don't see what the serialization for _that_ one is either.
>>>
>>> Pls explain. You can't just randomly add tests for random flags that
>>> get modified by other random code.
>> You're absolutely right. On the io_uring side, in the current tree,
>> there's only one check where current != task being checked - and that's
>> in the poll rewait arming. That one should likely just go away. It may
>> be fine as it is, as it just pertains to ring exit cancelations. We want
>> to ensure that we don't rearm poll requests if the process is canceling
>> and going away. I'll take a closer look at that one.
>>
>> For this particular patch, I agree it's racy. I'll see if I can come up
>> with something better...
>>
>
> Continuing this thread from August 2021:
>
> I previously tested a version of Jens' patch backported to 5.10 and it
> fixed my problem.  Now I am trying to upgrade kernels, and 5.17 still
> has the same problem - coredumps from an io_uring program to a pipe are
> truncated.  Jens' patch applied to 5.17 again fixes the problem.  Has
> there been any progress with fixing the problem upstream?
>
> Reference:
>
> https://lore.kernel.org/all/8af373ec-9609-35a4-f185-f9bdc63d39b7@cybernetics.com/
> https://lore.kernel.org/all/76d3418c-e9ba-4392-858a-5da8028e3526@kernel.dk/

I am still slowly working on this.  (I was unfortunately preempted by
some painful to track down and fix regressions elsewhere).

When I was doubly checking to be certain I understood the problem the
case you describe is one of the easy cases that needs to be handled.

There is at least one more difficult interaction that is not solved by
squelching task_work_add after PF_SIGNALED is set, and I am not 100%
convinced that it is even correct to squelch task_work_add at that point
in the code.

The progress I have made to date that I am sending to Linus for v5.18 is
the removal of tracehook.h which makes the code much more
understandable.

I think I have a general solution that I am planning to post after
v5.18-rc1 that I have not tested yet on the cases that I know about,
but I expect it will work.

So I think that puts a good general fix 2-3 weeks out.

This is quite possibly a case where perfection is getting in the way of
the good, but I honestly can't judge anything except a fix that cleans
up everything and is complete.  There are too many weird and subtle
interactions that I don't understand.

So I am going to continue concentrating on a good general solution so
that the code is readable and makes sense.

Eric


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-22 15:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-18  3:06 [PATCH] kernel: make TIF_NOTIFY_SIGNAL and core dumps co-exist Jens Axboe
2021-08-19  2:57 ` Linus Torvalds
2021-08-19 14:59   ` Jens Axboe
2021-08-22 20:55     ` Olivier Langlois
2022-03-21 18:20     ` Tony Battersby
2022-03-22 15:04       ` Eric W. Biederman
2021-08-23  4:55 ` Olivier Langlois

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).