LKML Archive on lore.kernel.org
* [PATCH] ANDROID: binder: fix binder work return error is wrongly consumed
@ 2018-05-02 14:30 gityuan
  2018-05-02 15:55 ` Re: " 袁辉辉
  2018-05-03 17:51 ` Martijn Coenen
  0 siblings, 2 replies; 3+ messages in thread
From: gityuan @ 2018-05-02 14:30 UTC (permalink / raw)
  To: gregkh, maco, tkjos, arve; +Cc: devel, linux-kernel, yuanhuihui

From: yuanhuihui <yuanhuihui@xiaomi.com>

A transaction can consume a mismatched binder work item, as follows:

1) The client sends a transaction while leftover binder work remains
on the thread->todo list.

2) In kernel space, the client consumes BC_TRANSACTION and, because
the target proc is dead, queues a BINDER_WORK_RETURN_ERROR onto the
thread->todo list.

3) The client should pick up the return error from thread->todo first,
finish this transaction, and return to userspace immediately.

4) In fact, because of the previously remaining binder work, the
client picks up a mismatched work item, causing unexpected results;
the return error may be consumed by a different transaction context.

Recently I reproduced this problem on several phone builds: a oneway
binder call can be blocked forever in the system_server process. The
root cause is that the BINDER_WORK_RETURN_ERROR is consumed by a
mismatched transaction context, leaving a binder thread waiting for
the BINDER_WORK_RETURN_ERROR:

    "Binder:1253_C" prio=5 tid=107 Native
    at android.os.BinderProxy.transactNative(Native method)
    at android.os.BinderProxy.transact(Binder.java:767)
    at android.app.IApplicationThread$Stub$Proxy.bindApplication(IApplicationThread.java:1398)
    at com.android.server.am.ActivityManagerService.attachApplicationLocked(ActivityManagerService.java:7320)
    at com.android.server.am.ActivityManagerService.attachApplication(ActivityManagerService.java:7425)
    - locked <0x0bc6ee81> (a com.android.server.am.ActivityManagerService)
    at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:291)
    at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2997)
    at android.os.Binder.execTransact(Binder.java:700)

I see the newest binder driver addresses this via "ANDROID: binder:
don't enqueue death notifications to thread todo". That does solve the
BINDER_WORK_DEAD_BINDER nesting scenario :)

But there are potential risks in the future: any functional extension
that pushes work to thread->todo would have to consider these nesting
issues again. I think queueing the return-error work to the head of
the thread's todo list is more appropriate, for these reasons:

1) During a transaction the client adds a BINDER_WORK_RETURN_ERROR to
the thread->todo list; it is better to pick that work up first and
finish the transaction immediately, breaking out of the nesting.

2) The client then picks up the leftover binder work from thread->todo
on the same thread, with no need to wake another idle binder thread.

3) This worked fine in the old binder version (before the big binder
lock was split): binder_transaction() only set thread->return_error
when the target process had died, without adding a
BINDER_WORK_RETURN_ERROR to thread->todo, and binder_thread_read()
checked return_error first. If a return error had occurred, the
transaction finished and control returned to userspace immediately.

So I prefer to put the BINDER_WORK_RETURN_ERROR at the head of the
queue, matching the old binder driver's behavior, once and for all.

Signed-off-by: yuanhuihui <yuanhuihui@xiaomi.com>
---
 drivers/android/binder.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 4eab5be3d00f..1ed1809b8769 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -861,6 +861,60 @@ binder_enqueue_thread_work(struct binder_thread *thread,
 	binder_inner_proc_unlock(thread->proc);
 }
 
+/**
+ * binder_enqueue_work_head_ilocked() - Add an item to the head of work list
+ * @work:         struct binder_work to add to list
+ * @target_list:  list to add work to
+ *
+ * Adds the work to the specified list. Asserts that work
+ * is not already on a list.
+ *
+ * Requires the proc->inner_lock to be held.
+ */
+static void
+binder_enqueue_work_head_ilocked(struct binder_work *work,
+			   struct list_head *target_list)
+{
+	BUG_ON(target_list == NULL);
+	BUG_ON(work->entry.next && !list_empty(&work->entry));
+	list_add(&work->entry, target_list);
+}
+
+/**
+ * binder_enqueue_thread_work_head_ilocked() - Add an item to the head of thread work list
+ * @thread:       thread to queue work to
+ * @work:         struct binder_work to add to list
+ *
+ * Adds the work to the head of thread todo list, and enables processing
+ * of the todo queue.
+ *
+ * Requires the proc->inner_lock to be held.
+ */
+static void
+binder_enqueue_thread_work_head_ilocked(struct binder_thread *thread,
+				   struct binder_work *work)
+{
+	binder_enqueue_work_head_ilocked(work, &thread->todo);
+	thread->process_todo = true;
+}
+
+/**
+ * binder_enqueue_thread_work_head() - Add an item to the head of thread work list
+ * @thread:       thread to queue work to
+ * @work:         struct binder_work to add to list
+ *
+ * Adds the work to the head of thread todo list, and enables processing
+ * of the todo queue.
+ */
+static void
+binder_enqueue_thread_work_head(struct binder_thread *thread,
+			   struct binder_work *work)
+{
+	binder_inner_proc_lock(thread->proc);
+	binder_enqueue_thread_work_head_ilocked(thread, work);
+	binder_inner_proc_unlock(thread->proc);
+}
+
 static void
 binder_dequeue_work_ilocked(struct binder_work *work)
 {
@@ -3287,11 +3341,11 @@ static void binder_transaction(struct binder_proc *proc,
 	BUG_ON(thread->return_error.cmd != BR_OK);
 	if (in_reply_to) {
 		thread->return_error.cmd = BR_TRANSACTION_COMPLETE;
-		binder_enqueue_thread_work(thread, &thread->return_error.work);
+		binder_enqueue_thread_work_head(thread, &thread->return_error.work);
 		binder_send_failed_reply(in_reply_to, return_error);
 	} else {
 		thread->return_error.cmd = return_error;
-		binder_enqueue_thread_work(thread, &thread->return_error.work);
+		binder_enqueue_thread_work_head(thread, &thread->return_error.work);
 	}
 }
 
@@ -3929,6 +3983,7 @@ static int binder_thread_read(struct binder_proc *proc,
 			ptr += sizeof(uint32_t);
 
 			binder_stat_br(proc, thread, e->cmd);
+			goto done; /* RETURN_ERROR notifications can finish transactions */
 		} break;
 		case BINDER_WORK_TRANSACTION_COMPLETE: {
 			binder_inner_proc_unlock(proc);
-- 
2.14.1


* Re: [PATCH] ANDROID: binder: fix binder work return error is wrongly consumed
  2018-05-02 14:30 [PATCH] ANDROID: binder: fix binder work return error is wrongly consumed gityuan
@ 2018-05-02 15:55 ` 袁辉辉
  2018-05-03 17:51 ` Martijn Coenen
  1 sibling, 0 replies; 3+ messages in thread
From: 袁辉辉 @ 2018-05-02 15:55 UTC (permalink / raw)
  To: gityuan, gregkh, maco, tkjos, arve; +Cc: devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7541 bytes --]

Hi,

I have added a nesting scenario in which the binder work return error
is wrongly consumed here:
https://issuetracker.google.com/u/0/issues/79123724

See the attachment for an interpretation of the process
(+ stands for production, - stands for consumption, BW_XXX is
shorthand for BINDER_WORK_XXX):

1) The first oneway reportOneDeath() consumed the
BINDER_WORK_RETURN_ERROR which was produced by bindApplication().

2) The second non-oneway reportOneDeath() consumed the
BR_TRANSACTION_COMPLETE left behind by the first reportOneDeath().

3) The thread then hangs in bindApplication() forever, because the
BINDER_WORK_RETURN_ERROR produced by bindApplication() was wrongly
consumed by reportOneDeath().



[-- Attachment #2: oneway_binder_call_hang.jpg --]
[-- Type: image/jpeg, Size: 274767 bytes --]


* Re: [PATCH] ANDROID: binder: fix binder work return error is wrongly consumed
  2018-05-02 14:30 [PATCH] ANDROID: binder: fix binder work return error is wrongly consumed gityuan
  2018-05-02 15:55 ` Re: " 袁辉辉
@ 2018-05-03 17:51 ` Martijn Coenen
  1 sibling, 0 replies; 3+ messages in thread
From: Martijn Coenen @ 2018-05-03 17:51 UTC (permalink / raw)
  To: gityuan
  Cc: Greg KH, Todd Kjos, Arve Hjønnevåg,
	open list:ANDROID DRIVERS, LKML, yuanhuihui

On Wed, May 2, 2018 at 7:30 AM,  <gityuan@gmail.com> wrote:
> But there is potential risks in the future, future functional extensions
> need to consider nesting issues, maybe extending more methods where we
> push to thread->todo. I think that using queueing return error transaction
> to the head of thread todo list is more appropriate, as follows:

Historically it was not safe to issue binder transactions from death
recipients because of issues like this - though I don't think we've
ever been clear about that in the documentation. "ANDROID: binder:
don't enqueue
death notifications to thread todo" fixes that, and makes it safe to
do - though the current driver may still have an issue if two death
notifications are queued (need to look into that). If we ever were to add
functionality into the driver again that queues something to
thread->todo that could result in nesting, we would need to do more
than just this change to make it safe. Consider this scenario, which
would fail even with your patch:

1) BR_DEAD_BINDER is in thread->todo
2) We issue a new transaction T1 to a different process, which succeeds
3) We queue the reply BR_REPLY for T1 to thread->todo and return to userspace
4) userspace finds BR_DEAD_BINDER first, runs a death recipient which
issues a new binder transaction, T2
5) We find the BR_REPLY for T1 instead of T2, and now we have a problem

This fix by itself won't make it safe, so I think we should instead
always prevent such nesting in the driver. That also keeps the code
simpler and easier to understand - just queue things in order.

Thanks,
Martijn



end of thread, other threads:[~2018-05-03 17:51 UTC | newest]

