LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] mmc: core: fix race condition in mmc_wait_data_done
@ 2015-01-13  6:42 Jialing Fu
  2015-01-27  3:25 ` Ban
  0 siblings, 1 reply; 2+ messages in thread
From: Jialing Fu @ 2015-01-13  6:42 UTC (permalink / raw)
  To: Chris Ball, Ulf Hansson, linux-mmc, linux-kernel,
	Konstantin Dorfman, Seungwon Jeon
  Cc: Jialing Fu

The following panic is captured in ker3.14, but the issue still exists
in latest kernel.
---------------------------------------------------------------------
[   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
at virtual address 00000578
......
[   20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
[   20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
[   20.740134] c0 3136 (Compiler) Call trace:
[   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
[   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
[   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
[   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
[   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
[   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
[   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
[   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
[   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
[   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
----------------------------------------------------------------------

Because in SMP, "mrq" has race condition between below two paths:
path1: CPU0: <tasklet context>
  static void mmc_wait_data_done(struct mmc_request *mrq)
  {
    mrq->host->context_info.is_done_rcv = true;
    //
    // If CPU0 has just finished "is_done_rcv = true" in path1, and at
    // this moment, IRQ or ICache line missing happens in CPU0.
    // What happens in CPU1 (path2)?
    //
    // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
    // path2 would have chance to break from wait_event_interruptible
    // in mmc_wait_for_data_req_done and continue to run for next
    // mmc_request (mmc_blk_rw_rq_prep).
    //
    // Within mmc_blk_rq_prep, mrq is cleared to 0.
    // If below line still gets host from "mrq" as the result of
    // compiler, the panic happens as we traced.
    wake_up_interruptible(&mrq->host->context_info.wait);
  }

path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
 static int mmc_wait_for_data_req_done(...
 {
    ...
    while (1) {
          wait_event_interruptible(context_info->wait,
                  (context_info->is_done_rcv ||
                   context_info->is_new_req));

  static void mmc_blk_rw_rq_prep(...
  {
    ...
    memset(brq, 0, sizeof(struct mmc_blk_request));

This issue happens very coincidentally; however adding mdelay(1) in
mmc_wait_data_done as below could duplicate it easily.
  static void mmc_wait_data_done(struct mmc_request *mrq)
  {
    mrq->host->context_info.is_done_rcv = true;
    mdelay(1);
    wake_up_interruptible(&mrq->host->context_info.wait);
  }
At runtime, IRQ or ICache line missing may just happen at the same place
of the mdelay(1).

This patch gets the mmc_context_info at the beginning of function, it can
avoid this race condition.

Signed-off-by: Jialing Fu <jlfu@marvell.com>
---
 drivers/mmc/core/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 9584bff..f08c9a8 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -326,8 +326,10 @@ EXPORT_SYMBOL(mmc_start_bkops);
  */
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
-	mrq->host->context_info.is_done_rcv = true;
-	wake_up_interruptible(&mrq->host->context_info.wait);
+	struct mmc_context_info *context_info = &mrq->host->context_info;
+
+	context_info->is_done_rcv = true;
+	wake_up_interruptible(&context_info->wait);
 }
 
 static void mmc_wait_done(struct mmc_request *mrq)
-- 
1.9.1


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] mmc: core: fix race condition in mmc_wait_data_done
  2015-01-13  6:42 [PATCH] mmc: core: fix race condition in mmc_wait_data_done Jialing Fu
@ 2015-01-27  3:25 ` Ban
  0 siblings, 0 replies; 2+ messages in thread
From: Ban @ 2015-01-27  3:25 UTC (permalink / raw)
  To: linux-kernel

Jialing Fu <jlfu <at> marvell.com> writes:

> 
> The following panic is captured in ker3.14, but the issue still exists
> in latest kernel.
> ---------------------------------------------------------------------
> [   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer 
dereference
> at virtual address 00000578
> ......
> [   20.738499] c0 3136 (Compiler) PC is at 
_raw_spin_lock_irqsave+0x24/0x60
> [   20.738527] c0 3136 (Compiler) LR is at 
_raw_spin_lock_irqsave+0x20/0x60
> [   20.740134] c0 3136 (Compiler) Call trace:
> [   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] 
_raw_spin_lock_irqsave+0x24/0x60
> [   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
> [   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] 
mmc_wait_data_done+0x28/0x34
> [   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] 
mmc_request_done+0xa4/0x220
> [   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] 
sdhci_tasklet_finish+0xac/0x264
> [   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] 
tasklet_action+0xa0/0x158
> [   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] 
__do_softirq+0x10c/0x2e4
> [   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
> [   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] 
handle_IRQ+0x48/0xac
> [   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] 
gic_handle_irq+0x38/0x7c
> ----------------------------------------------------------------------
> 
> Because in SMP, "mrq" has race condition between below two paths:
> path1: CPU0: <tasklet context>
>   static void mmc_wait_data_done(struct mmc_request *mrq)
>   {
>     mrq->host->context_info.is_done_rcv = true;
>     //
>     // If CPU0 has just finished "is_done_rcv = true" in path1, and at
>     // this moment, IRQ or ICache line missing happens in CPU0.
>     // What happens in CPU1 (path2)?
>     //
>     // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
>     // path2 would have chance to break from wait_event_interruptible
>     // in mmc_wait_for_data_req_done and continue to run for next
>     // mmc_request (mmc_blk_rw_rq_prep).
>     //
>     // Within mmc_blk_rq_prep, mrq is cleared to 0.
>     // If below line still gets host from "mrq" as the result of
>     // compiler, the panic happens as we traced.
>     wake_up_interruptible(&mrq->host->context_info.wait);
>   }
> 
> path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
>  static int mmc_wait_for_data_req_done(...
>  {
>     ...
>     while (1) {
>           wait_event_interruptible(context_info->wait,
>                   (context_info->is_done_rcv ||
>                    context_info->is_new_req));
> 
>   static void mmc_blk_rw_rq_prep(...
>   {
>     ...
>     memset(brq, 0, sizeof(struct mmc_blk_request));
> 
> This issue happens very coincidentally; however adding mdelay(1) in
> mmc_wait_data_done as below could duplicate it easily.
>   static void mmc_wait_data_done(struct mmc_request *mrq)
>   {
>     mrq->host->context_info.is_done_rcv = true;
>     mdelay(1);
>     wake_up_interruptible(&mrq->host->context_info.wait);
>   }
> At runtime, IRQ or ICache line missing may just happen at the same place
> of the mdelay(1).
> 
> This patch gets the mmc_context_info at the beginning of function, it can
> avoid this race condition.
> 
> Signed-off-by: Jialing Fu <jlfu <at> marvell.com>
> ---
>  drivers/mmc/core/core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 9584bff..f08c9a8 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
>  <at>  <at>  -326,8 +326,10  <at>  <at>  EXPORT_SYMBOL(mmc_start_bkops);
>   */
>  static void mmc_wait_data_done(struct mmc_request *mrq)
>  {
> -	mrq->host->context_info.is_done_rcv = true;
> -	wake_up_interruptible(&mrq->host->context_info.wait);
> +	struct mmc_context_info *context_info = &mrq->host->context_info;
> +
> +	context_info->is_done_rcv = true;
> +	wake_up_interruptible(&context_info->wait);
>  }
> 
>  static void mmc_wait_done(struct mmc_request *mrq)

Hi Jialing, 
I met the same issue at kernel v3.10, and after adding mdelay(1) to 
mmc_wait_data_done() function, it could duplicate at every boot-up stage.
[   15.157899]  [<ffffffff828e504a>] _raw_spin_lock_irqsave+0x2a/0x40
[   15.157925]  [<ffffffff820dfd23>] __wake_up+0x23/0x50
[   15.157951]  [<ffffffff8262c4ce>] mmc_wait_data_done+0x3e/0x50
[   15.157975]  [<ffffffff8262c5a6>] mmc_request_done+0xa6/0x250
[   15.157999]  [<ffffffff828e4f48>] ? _raw_spin_unlock_irqrestore+0x28/0x60
[   15.158027]  [<ffffffff82645fdb>] sdhci_tasklet_finish+0xeb/0x190
[   15.158054]  [<ffffffff820b7d8c>] tasklet_action+0x6c/0xe0
[   15.158077]  [<ffffffff820b7850>] __do_softirq+0x110/0x2d0
[   15.158103]  [<ffffffff828ec60c>] call_softirq+0x1c/0x30
[   15.158127]  [<ffffffff820041ed>] do_softirq+0x6d/0xa0
[   15.158150]  [<ffffffff820b7ba5>] irq_exit+0xb5/0xc0
[   15.158172]  [<ffffffff828eccc6>] do_IRQ+0x56/0xc0
[   15.158195]  [<ffffffff828e53ef>] common_interrupt+0x6f/0x6f

I have no idea whether this patch is work or not because of rarely hit ratio 
even without your solution. However, I'll keep monitor this subject if it is 
healthy to be merged.

Thanks


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2015-01-27  3:30 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-13  6:42 [PATCH] mmc: core: fix race condition in mmc_wait_data_done Jialing Fu
2015-01-27  3:25 ` Ban

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).