* [patch] aio: fix buggy put_ioctx call in aio_complete - v2
@ 2006-12-30  1:34 Chen, Kenneth W
From: Chen, Kenneth W @ 2006-12-30  1:34 UTC
  To: 'Andrew Morton', 'Benjamin LaHaise', zach.brown
  Cc: linux-aio, linux-kernel

An AIO bug was reported in which a sleeping function is called from
softirq context:

BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
Call Trace:
     [<a000000100577b00>] __mutex_lock_slowpath+0x640/0x6c0
     [<a000000100577ba0>] mutex_lock+0x20/0x40
     [<a0000001000a25b0>] flush_workqueue+0xb0/0x1a0
     [<a00000010018c0c0>] __put_ioctx+0xc0/0x240
     [<a00000010018d470>] aio_complete+0x2f0/0x420
     [<a00000010019cc80>] finished_one_bio+0x200/0x2a0
     [<a00000010019d1c0>] dio_bio_complete+0x1c0/0x200
     [<a00000010019d260>] dio_bio_end_aio+0x60/0x80
     [<a00000010014acd0>] bio_endio+0x110/0x1c0
     [<a0000001002770e0>] __end_that_request_first+0x180/0xba0
     [<a000000100277b90>] end_that_request_chunk+0x30/0x60
     [<a0000002073c0c70>] scsi_end_request+0x50/0x300 [scsi_mod]
     [<a0000002073c1240>] scsi_io_completion+0x200/0x8a0 [scsi_mod]
     [<a0000002074729b0>] sd_rw_intr+0x330/0x860 [sd_mod]
     [<a0000002073b3ac0>] scsi_finish_command+0x100/0x1c0 [scsi_mod]
     [<a0000002073c2910>] scsi_softirq_done+0x230/0x300 [scsi_mod]
     [<a000000100277d20>] blk_done_softirq+0x160/0x1c0
     [<a000000100083e00>] __do_softirq+0x200/0x240
     [<a000000100083eb0>] do_softirq+0x70/0xc0

See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2

flush_workqueue() must not be called from softirq context.  However,
aio_complete(), called from the I/O interrupt path, can end up calling
put_ioctx() with the last reference count on the ioctx, which triggers
the bug.  It is simply incorrect to free the ioctx from aio_complete().
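
For reference, a condensed sketch of the teardown path involved,
paraphrased from the 2.6.19-era fs/aio.c (not a literal copy; surrounding
details may differ): the final put_ioctx() lands in __put_ioctx(), whose
flush_workqueue(aio_wq) acquires a mutex and may sleep, so it must never
be reached from interrupt or softirq context.

/*
 * Sketch of the pre-patch teardown path (paraphrased, not verbatim).
 * When the last reference is dropped from aio_complete() in softirq
 * context, this sleeping path runs and trips the mutex warning shown
 * in the trace above.
 */
static void __put_ioctx(struct kioctx *ctx)
{
	BUG_ON(ctx->reqs_active);

	cancel_delayed_work(&ctx->wq);
	flush_workqueue(aio_wq);	/* takes a mutex -- may sleep */
	aio_free_ring(ctx);
	mmdrop(ctx->mm);
	kmem_cache_free(kioctx_cachep, ctx);
}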

The bug is triggerable via a race between io_destroy() and aio_complete().
A possible scenario:

cpu0                               cpu1
io_destroy                         aio_complete
  wait_for_all_aios {                __aio_put_req
     ...                                 ctx->reqs_active--;
     if (!ctx->reqs_active)
        return;
  }
  ...
  put_ioctx(ioctx)

                                     put_ioctx(ctx);
                                        __put_ioctx
                                          bam! Bug trigger!

The real problem is that the check of ctx->reqs_active in
wait_for_all_aios() is racy: access to reqs_active is not properly
protected by the ctx spin lock.
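
For clarity, here is the racy routine as it looks before the patch
(reconstructed from the hunks below; a sketch rather than a literal copy
of the file).  Both reads of reqs_active happen without ctx->ctx_lock, so
io_destroy() can observe the decrement made by __aio_put_req() on another
CPU and drop its reference while aio_complete() still has its own
put_ioctx() pending.

/* Pre-patch wait_for_all_aios(), reconstructed from the hunks below. */
static void wait_for_all_aios(struct kioctx *ctx)
{
	struct task_struct *tsk = current;
	DECLARE_WAITQUEUE(wait, tsk);

	if (!ctx->reqs_active)		/* unlocked read: races with cpu1 */
		return;

	add_wait_queue(&ctx->wait, &wait);
	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
	while (ctx->reqs_active) {	/* also read without ctx_lock */
		schedule();
		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
	}
	__set_task_state(tsk, TASK_RUNNING);
	remove_wait_queue(&ctx->wait, &wait);
}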

This patch adds that protective spin lock, and at the same time removes
the duplicate ref counting done for each kiocb, since reqs_active already
serves that role for each active ioctx.  This also ensures that the buggy
call to flush_workqueue() in softirq context is eliminated.


Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>

--- ./fs/aio.c.orig	2006-12-21 08:08:14.000000000 -0800
+++ ./fs/aio.c	2006-12-21 08:14:27.000000000 -0800
@@ -298,17 +298,23 @@ static void wait_for_all_aios(struct kio
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
 
+	spin_lock_irq(&ctx->ctx_lock);
 	if (!ctx->reqs_active)
-		return;
+		goto out;
 
 	add_wait_queue(&ctx->wait, &wait);
 	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
 	while (ctx->reqs_active) {
+		spin_unlock_irq(&ctx->ctx_lock);
 		schedule();
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		spin_lock_irq(&ctx->ctx_lock);
 	}
 	__set_task_state(tsk, TASK_RUNNING);
 	remove_wait_queue(&ctx->wait, &wait);
+
+out:
+	spin_unlock_irq(&ctx->ctx_lock);
 }
 
 /* wait_on_sync_kiocb:
@@ -424,7 +430,6 @@ static struct kiocb fastcall *__aio_get_
 	ring = kmap_atomic(ctx->ring_info.ring_pages[0], KM_USER0);
 	if (ctx->reqs_active < aio_ring_avail(&ctx->ring_info, ring)) {
 		list_add(&req->ki_list, &ctx->active_reqs);
-		get_ioctx(ctx);
 		ctx->reqs_active++;
 		okay = 1;
 	}
@@ -536,8 +541,6 @@ int fastcall aio_put_req(struct kiocb *r
 	spin_lock_irq(&ctx->ctx_lock);
 	ret = __aio_put_req(ctx, req);
 	spin_unlock_irq(&ctx->ctx_lock);
-	if (ret)
-		put_ioctx(ctx);
 	return ret;
 }
 
@@ -782,8 +785,7 @@ static int __aio_run_iocbs(struct kioctx
 		 */
 		iocb->ki_users++;       /* grab extra reference */
 		aio_run_iocb(iocb);
-		if (__aio_put_req(ctx, iocb))  /* drop extra ref */
-			put_ioctx(ctx);
+		__aio_put_req(ctx, iocb);
  	}
 	if (!list_empty(&ctx->run_list))
 		return 1;
@@ -998,14 +1000,10 @@ put_rq:
 	/* everything turned out well, dispose of the aiocb. */
 	ret = __aio_put_req(ctx, iocb);
 
-	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
-
 	if (waitqueue_active(&ctx->wait))
 		wake_up(&ctx->wait);
 
-	if (ret)
-		put_ioctx(ctx);
-
+	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 	return ret;
 }
 


* Re: [patch] aio: fix buggy put_ioctx call in aio_complete - v2
From: Jeff Moyer @ 2007-02-01 21:13 UTC
  To: Chen, Kenneth W
  Cc: 'Andrew Morton', 'Benjamin LaHaise',
	zach.brown, linux-aio, linux-kernel

==> On Fri, 29 Dec 2006 17:34:26 -0800, "Chen, Kenneth W" <kenneth.w.chen@intel.com> said:

Kenneth> An AIO bug was reported in which a sleeping function is called
Kenneth> from softirq context:

Kenneth> The real problem is that the check of ctx->reqs_active in
Kenneth> wait_for_all_aios() is racy: access to reqs_active is not
Kenneth> properly protected by the ctx spin lock.

Kenneth> This patch adds that protective spin lock, and at the same time
Kenneth> removes the duplicate ref counting done for each kiocb, since
Kenneth> reqs_active already serves that role for each active ioctx.  This
Kenneth> also ensures that the buggy call to flush_workqueue() in softirq
Kenneth> context is eliminated.

I was finally able to wrap my head around this, and testing has
confirmed that this patch fixes the problem.  So,

Acked-by: Jeff Moyer <jmoyer@redhat.com>

Better late than never.

-Jeff

