LKML Archive on lore.kernel.org
* blk_throtl_exit  taking q->queue_lock is problematic
@ 2011-02-16  7:31 NeilBrown
  2011-02-16 15:53 ` Vivek Goyal
  2011-02-17 20:00 ` Vivek Goyal
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2011-02-16  7:31 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel



Hi,

 I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
is finally released.

This is a problem because by that time the queue_lock doesn't exist any
more.  It is in a separate data structure controlled by the RAID personality,
and by the time the block device is being destroyed the RAID personality
has shut down and the data structure containing the lock has been freed.

This has not been a problem before.  Nothing else takes queue_lock after
blk_cleanup_queue.

I could of course set queue_lock to point to __queue_lock and initialise that,
but it seems untidy and probably violates some locking requirements.

Is there some way you could use some other lock - maybe a global lock, or
maybe use __queue_lock directly?

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-16  7:31 blk_throtl_exit taking q->queue_lock is problematic NeilBrown
@ 2011-02-16 15:53 ` Vivek Goyal
  2011-02-17  0:35   ` NeilBrown
  2011-02-17 20:00 ` Vivek Goyal
  1 sibling, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-02-16 15:53 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote:
> 
> 
> Hi,
> 
>  I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
> is finally released.
> 
> This is a problem because by that time the queue_lock doesn't exist any
> more.  It is in a separate data structure controlled by the RAID personality,
> and by the time the block device is being destroyed the RAID personality
> has shut down and the data structure containing the lock has been freed.
> 
> This has not been a problem before.  Nothing else takes queue_lock after
> blk_cleanup_queue.

I agree that this is a problem. blk_throtl_exit() needs the queue lock to
avoid races with the cgroup code and to protect its lists etc.

> 
> I could of course set queue_lock to point to __queue_lock and initialise that,
> but it seems untidy and probably violates some locking requirements.
> 
> Is there some way you could use some other lock - maybe a global lock, or
> maybe use __queue_lock directly?

Initially I had put blk_throtl_exit() in blk_cleanup_queue() where it is
known that ->queue_lock is still around. Due to a bug, Jens moved it
to blk_release_queue(). I still think that blk_cleanup_queue() is a better
place to call blk_throtl_exit().

I think the following patch should solve the issue. This patch is also not
completely race free. I was wondering whether we can get rid of the
throtl_shutdown_timer_wq() call in blk_sync_queue(). IOW, in what
circumstances is blk_sync_queue() used?

Thanks
Vivek


o Move blk_throtl_exit() into blk_cleanup_queue(), as blk_throtl_exit() is
  written in such a way that it needs the queue lock. In blk_release_queue()
  there is no guarantee that ->queue_lock is still around.

o Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
  one problem.

  https://lkml.org/lkml/2010/10/23/86

  And a quick fix moved blk_throtl_exit() to blk_release_queue().

	commit 7ad58c028652753814054f4e3ac58f925e7343f4
	Author: Jens Axboe <jaxboe@fusionio.com>
	Date:   Sat Oct 23 20:40:26 2010 +0200

    	block: fix use-after-free bug in blk throttle code

o This patch reverts the above change and instead checks for q->td in
  throtl_shutdown_timer_wq().

o This is also not completely race free, as the check for q->td is done
  without the spinlock, and we can't take the spinlock here because this is
  called from blk_release_queue->blk_sync_queue(), where ->queue_lock might
  have gone away.

o So the question is: should we really call throtl_shutdown_timer_wq() from
  blk_sync_queue()? It might not make much sense, because there might
  be queued bios in the throttling logic. The only way to clean up all bios
  and cancel all async activity is blk_throtl_exit().

  I also don't see it being called to cancel async activity for CFQ. Who
  makes sure that async activity is cancelled? IOW, I am wondering in
  what circumstances blk_sync_queue() is called, and whether it is required
  to call throtl_shutdown_timer_wq() from blk_sync_queue(). If we can get
  rid of it, then we have taken care of all the races, AFAIK.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-core.c     |    2 ++
 block/blk-sysfs.c    |    2 --
 block/blk-throttle.c |    6 ++++++
 3 files changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6/block/blk-core.c
===================================================================
--- linux-2.6.orig/block/blk-core.c	2011-02-14 17:43:06.000000000 -0500
+++ linux-2.6/block/blk-core.c	2011-02-16 10:11:58.910022185 -0500
@@ -474,6 +474,8 @@ void blk_cleanup_queue(struct request_qu
 	if (q->elevator)
 		elevator_exit(q->elevator);
 
+	blk_throtl_exit(q);
+
 	blk_put_queue(q);
 }
 EXPORT_SYMBOL(blk_cleanup_queue);
Index: linux-2.6/block/blk-sysfs.c
===================================================================
--- linux-2.6.orig/block/blk-sysfs.c	2011-02-11 09:25:16.000000000 -0500
+++ linux-2.6/block/blk-sysfs.c	2011-02-16 10:12:16.379762988 -0500
@@ -471,8 +471,6 @@ static void blk_release_queue(struct kob
 
 	blk_sync_queue(q);
 
-	blk_throtl_exit(q);
-
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);
 
Index: linux-2.6/block/blk-throttle.c
===================================================================
--- linux-2.6.orig/block/blk-throttle.c	2011-02-16 10:08:12.000000000 -0500
+++ linux-2.6/block/blk-throttle.c	2011-02-16 10:45:18.006119406 -0500
@@ -961,6 +961,9 @@ void throtl_shutdown_timer_wq(struct req
 {
 	struct throtl_data *td = q->td;
 
+	if (!td)
+		return;
+
 	cancel_delayed_work_sync(&td->throtl_work);
 }
 
@@ -1122,6 +1125,9 @@ void blk_throtl_exit(struct request_queu
 	 * it.
 	 */
 	throtl_shutdown_timer_wq(q);
+
+	/* Decouple throtl data from queue. */
+	q->td = NULL;
 	throtl_td_free(td);
 }
 


* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-16 15:53 ` Vivek Goyal
@ 2011-02-17  0:35   ` NeilBrown
  2011-02-17  1:10     ` Vivek Goyal
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2011-02-17  0:35 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel

On Wed, 16 Feb 2011 10:53:05 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote:
> > 
> > 
> > Hi,
> > 
> >  I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
> > is finally released.
> > 
> > This is a problem because by that time the queue_lock doesn't exist any
> > more.  It is in a separate data structure controlled by the RAID personality,
> > and by the time the block device is being destroyed the RAID personality
> > has shut down and the data structure containing the lock has been freed.
> > 
> > This has not been a problem before.  Nothing else takes queue_lock after
> > blk_cleanup_queue.
> 
> I agree that this is a problem. blk_throtl_exit() needs the queue lock to
> avoid races with the cgroup code and to protect its lists etc.
> 
> > 
> > I could of course set queue_lock to point to __queue_lock and initialise that,
> > but it seems untidy and probably violates some locking requirements.
> > 
> > Is there some way you could use some other lock - maybe a global lock, or
> > maybe use __queue_lock directly?
> 
> Initially I had put blk_throtl_exit() in blk_cleanup_queue() where it is
> known that ->queue_lock is still around. Due to a bug, Jens moved it
> to blk_release_queue(). I still think that blk_cleanup_queue() is a better
> place to call blk_throtl_exit().

Why do you say that it is known that ->queue_lock is still around in
blk_cleanup_queue?  In md it isn't. :-(
Is there some (other) reason that it needs to be?

Thanks,
NeilBrown



* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-17  0:35   ` NeilBrown
@ 2011-02-17  1:10     ` Vivek Goyal
  2011-02-17  5:55       ` NeilBrown
  0 siblings, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-02-17  1:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Thu, Feb 17, 2011 at 11:35:36AM +1100, NeilBrown wrote:
> On Wed, 16 Feb 2011 10:53:05 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote:
> > > 
> > > 
> > > Hi,
> > > 
> > >  I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
> > > is finally released.
> > > 
> > > This is a problem because by that time the queue_lock doesn't exist any
> > > more.  It is in a separate data structure controlled by the RAID personality,
> > > and by the time the block device is being destroyed the RAID personality
> > > has shut down and the data structure containing the lock has been freed.
> > > 
> > > This has not been a problem before.  Nothing else takes queue_lock after
> > > blk_cleanup_queue.
> > 
> > I agree that this is a problem. blk_throtl_exit() needs the queue lock to
> > avoid races with the cgroup code and to protect its lists etc.
> > 
> > > 
> > > I could of course set queue_lock to point to __queue_lock and initialise that,
> > > but it seems untidy and probably violates some locking requirements.
> > > 
> > > Is there some way you could use some other lock - maybe a global lock, or
> > > maybe use __queue_lock directly?
> > 
> > Initially I had put blk_throtl_exit() in blk_cleanup_queue() where it is
> > known that ->queue_lock is still around. Due to a bug, Jens moved it
> > to blk_release_queue(). I still think that blk_cleanup_queue() is a better
> > place to call blk_throtl_exit().
> 
> Why do you say that it is known that ->queue_lock is still around in
> blk_cleanup_queue?  In md it isn't. :-(
> Is there some (other) reason that it needs to be?

I think this is only true for devices having an elevator, because
elevator_exit() will call cfq_exit_queue() and take the queue lock. So
request-based multipath devices should have it initialized until then.

But yes, for devices not running an elevator, there does not seem to be
any other component requiring queue lock to be still there.

Like the elevator, the throttling logic has data structures which need to be
cleaned up when the driver decides to clean up the queue. Up to what point
can we use queue->queue_lock safely? If the driver provides a spinlock
embedded in one of its structures, then it would make logical sense to call
back into the queue at some point to say that the spinlock is going away,
and to clean up any dependencies. I thought blk_cleanup_queue() would be
that call, but it looks like that is not true for all cases.

So is it possible to keep the spinlock intact when md is calling up
blk_cleanup_queue()?

Thanks
Vivek 


* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-17  1:10     ` Vivek Goyal
@ 2011-02-17  5:55       ` NeilBrown
  2011-02-17 15:01         ` Vivek Goyal
  2011-02-17 16:59         ` Vivek Goyal
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2011-02-17  5:55 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel

On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:

> So is it possible to keep the spinlock intact when md is calling up
> blk_cleanup_queue()?
> 

It would be possible, yes - but messy.  I would probably end up just making
->queue_lock always point to __queue_lock, and then only take it at the
places where I call 'block' code which wants to test to see if it is
currently held (like the plugging routines).

The queue lock (and most of the request queue) is simply irrelevant for md.
I would prefer to get away from having to touch it at all...

I'll see how messy it would be to stop using it completely and it can just be
__queue_lock.

Though for me - it would be much easier if you just used __queue_lock .....


NeilBrown


* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-17  5:55       ` NeilBrown
@ 2011-02-17 15:01         ` Vivek Goyal
  2011-02-17 16:59         ` Vivek Goyal
  1 sibling, 0 replies; 17+ messages in thread
From: Vivek Goyal @ 2011-02-17 15:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote:
> On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > So is it possible to keep the spinlock intact when md is calling up
> > blk_cleanup_queue()?
> > 
> 
> It would be possible, yes - but messy.  I would probably end up just making
> ->queue_lock always point to __queue_lock, and then only take it at the
> places where I call 'block' code which wants to test to see if it is
> currently held (like the plugging routines).
> 
> The queue lock (and most of the request queue) is simply irrelevant for md.
> I would prefer to get away from having to touch it at all...
> 
> I'll see how messy it would be to stop using it completely and it can just be
> __queue_lock.
> 
> Though for me - it would be much easier if you just used __queue_lock .....

Ok, thinking more about it: how about introducing another spinlock in the
request queue, say q->throtl_lock? I seem to be using the queue lock for
synchronization between blk-cgroup and the throttling code, and also for
the integrity of the throttling data structures when multiple threads are
operating. It looks like I might be able to get away with using throtl_lock
and not rely on queue_lock at all.

I want to avoid using __queue_lock for two reasons.

- It is not clear when we are actually sharing the lock with the driver
  and when we are not. In the future, if there are interactions with other
  code on the request queue, then it will be confusing. If we introduce
  throtl_lock, then for additional queue synchronization one can take
  q->queue_lock as well.

- Splitting the queue lock should probably be good, as it will reduce the
  contention on q->queue_lock. If that is beneficial, then we should not
  use __queue_lock, so the contention is actually reduced.

I will do this change and see whether it works or not.

Thanks
Vivek 


* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-17  5:55       ` NeilBrown
  2011-02-17 15:01         ` Vivek Goyal
@ 2011-02-17 16:59         ` Vivek Goyal
  2011-02-18  2:40           ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-02-17 16:59 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote:
> On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > So is it possible to keep the spinlock intact when md is calling up
> > blk_cleanup_queue()?
> > 
> 
> It would be possible, yes - but messy.  I would probably end up just making
> ->queue_lock always point to __queue_lock, and then only take it at the
> places where I call 'block' code which wants to test to see if it is
> currently held (like the plugging routines).
> 
> The queue lock (and most of the request queue) is simply irrelevant for md.
> I would prefer to get away from having to touch it at all...

If the queue lock is mostly irrelevant for md, then why should md provide an
external lock instead of using the queue's internal spinlock?

> 
> I'll see how messy it would be to stop using it completely and it can just be
> __queue_lock.
> 
> Though for me - it would be much easier if you just used __queue_lock .....

Ok, here is the simple patch which splits the queue lock and uses
throtl_lock for throttling logic. I booted and it seems to be working.

Having said that, this now introduces the possibility of races for any
services I take from the request queue. I need to check whether I need to
take the queue lock, and that makes me a little uncomfortable.

I am using kblockd_workqueue to queue throtl work. Looks like I don't
need queue lock for that. I am also using block tracing infrastructure
and my understanding is that I don't need queue lock for that as well.

So if we do this change for performance reasons, it still makes sense; but
doing it just because md provided a q->queue_lock and then took that lock
away without notifying the block layer is not the right reason, IMHO.

Thanks
Vivek

Yet-to-be-Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-core.c       |    1 +
 block/blk-throttle.c   |   16 ++++++++--------
 include/linux/blkdev.h |    3 +++
 3 files changed, 12 insertions(+), 8 deletions(-)

Index: linux-2.6/include/linux/blkdev.h
===================================================================
--- linux-2.6.orig/include/linux/blkdev.h	2011-02-14 17:43:07.000000000 -0500
+++ linux-2.6/include/linux/blkdev.h	2011-02-17 10:08:35.400922999 -0500
@@ -320,6 +320,9 @@ struct request_queue
 	spinlock_t		__queue_lock;
 	spinlock_t		*queue_lock;
 
+	/* Throttling lock. Protects the throttling data structures on the queue */
+	spinlock_t		throtl_lock;
+
 	/*
 	 * queue kobject
 	 */
Index: linux-2.6/block/blk-core.c
===================================================================
--- linux-2.6.orig/block/blk-core.c	2011-02-16 13:07:30.000000000 -0500
+++ linux-2.6/block/blk-core.c	2011-02-17 10:09:44.236884133 -0500
@@ -547,6 +547,7 @@ struct request_queue *blk_alloc_queue_no
 
 	mutex_init(&q->sysfs_lock);
 	spin_lock_init(&q->__queue_lock);
+	spin_lock_init(&q->throtl_lock);
 
 	return q;
 }
Index: linux-2.6/block/blk-throttle.c
===================================================================
--- linux-2.6.orig/block/blk-throttle.c	2011-02-17 10:01:47.000000000 -0500
+++ linux-2.6/block/blk-throttle.c	2011-02-17 10:12:51.949128895 -0500
@@ -763,7 +763,7 @@ static int throtl_dispatch(struct reques
 	struct bio_list bio_list_on_stack;
 	struct bio *bio;
 
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->throtl_lock);
 
 	throtl_process_limit_change(td);
 
@@ -783,7 +783,7 @@ static int throtl_dispatch(struct reques
 
 	throtl_schedule_next_dispatch(td);
 out:
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->throtl_lock);
 
 	/*
 	 * If we dispatched some requests, unplug the queue to make sure
@@ -882,9 +882,9 @@ void throtl_unlink_blkio_group(void *key
 	unsigned long flags;
 	struct throtl_data *td = key;
 
-	spin_lock_irqsave(td->queue->queue_lock, flags);
+	spin_lock_irqsave(&td->queue->throtl_lock, flags);
 	throtl_destroy_tg(td, tg_of_blkg(blkg));
-	spin_unlock_irqrestore(td->queue->queue_lock, flags);
+	spin_unlock_irqrestore(&td->queue->throtl_lock, flags);
 }
 
 /*
@@ -991,7 +991,7 @@ int blk_throtl_bio(struct request_queue 
 		return 0;
 	}
 
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->throtl_lock);
 	tg = throtl_get_tg(td);
 
 	if (tg->nr_queued[rw]) {
@@ -1032,7 +1032,7 @@ queue_bio:
 	}
 
 out:
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->throtl_lock);
 	return 0;
 }
 
@@ -1093,14 +1093,14 @@ void blk_throtl_exit(struct request_queu
 
 	throtl_shutdown_timer_wq(q);
 
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->throtl_lock);
 	throtl_release_tgs(td);
 
 	/* If there are other groups */
 	if (td->nr_undestroyed_grps > 0)
 		wait = true;
 
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->throtl_lock);
 
 	/*
 	 * Wait for tg->blkg->key accessors to exit their grace periods.


* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-16  7:31 blk_throtl_exit taking q->queue_lock is problematic NeilBrown
  2011-02-16 15:53 ` Vivek Goyal
@ 2011-02-17 20:00 ` Vivek Goyal
  2011-02-18  1:57   ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-02-17 20:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote:
> 
> 
> Hi,
> 
>  I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
> is finally released.
> 
> This is a problem because by that time the queue_lock doesn't exist any
> more.  It is in a separate data structure controlled by the RAID personality,
> and by the time the block device is being destroyed the RAID personality
> has shut down and the data structure containing the lock has been freed.

Hi Neil,

I am having a look at queue allocation in md and have a few queries.

I was looking at md_alloc(), where we do

mddev->queue = blk_alloc_queue(GFP_KERNEL);
blk_queue_make_request(mddev->queue, md_make_request);

The call to blk_queue_make_request() makes sure queue_lock is initialized
to the internal __queue_lock.

Then I looked at raid0_run(), which is again setting the queue lock.

mddev->queue->queue_lock = &mddev->queue->__queue_lock;

I think this is redundant now, as we already did it during md_alloc().

Similar seems to be the case for linear.c and multipath.c

The following seem to be the cases where we override the default lock.

raid1.c, raid5.c, raid10.c

I was going through raid1.c, and I see that q->queue_lock has
been initialized to &conf->device_lock. Can we do the reverse: introduce a
spinlock pointer in conf and point it at queue->queue_lock? Anyway, you
mentioned that the personality's data structures are freed before the
request queue is cleaned up, so there should not be any lifetime issues.

Also, I was wondering how it helps to share the lock between the request
queue and some other data structures in the driver.

Thanks
Vivek


* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-17 20:00 ` Vivek Goyal
@ 2011-02-18  1:57   ` NeilBrown
  0 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2011-02-18  1:57 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel

On Thu, 17 Feb 2011 15:00:11 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote:
> > 
> > 
> > Hi,
> > 
> >  I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
> > is finally released.
> > 
> > This is a problem because by that time the queue_lock doesn't exist any
> > more.  It is in a separate data structure controlled by the RAID personality,
> > and by the time the block device is being destroyed the RAID personality
> > has shut down and the data structure containing the lock has been freed.
> 
> Hi Neil,
> 
> I am having a look at queue allocation in md and have a few queries.
> 
> I was looking at md_alloc(), where we do
> 
> mddev->queue = blk_alloc_queue(GFP_KERNEL);
> blk_queue_make_request(mddev->queue, md_make_request);
> 
> The call to blk_queue_make_request() makes sure queue_lock is initialized
> to the internal __queue_lock.

That is a relatively recent change: commit a4e7d46407d, in July 2009 by the
look of it.

> 
> Then I looked at raid0_run(), which is again setting the queue lock.
> 
> mddev->queue->queue_lock = &mddev->queue->__queue_lock;
> 
> I think this is redundant now, as we already did it during md_alloc().

Yep, it is now redundant.

> 
> Similar seems to be the case for linear.c and multipath.c
> 
> The following seem to be the cases where we override the default lock.
> 
> raid1.c, raid5.c, raid10.c
> 
> I was going through raid1.c, and I see that q->queue_lock has
> been initialized to &conf->device_lock. Can we do the reverse: introduce a
> spinlock pointer in conf and point it at queue->queue_lock? Anyway, you
> mentioned that the personality's data structures are freed before the
> request queue is cleaned up, so there should not be any lifetime issues.

The reason it is this way is largely historical.  I don't recall the details
exactly, but the md data structures were always quite independent of the
request_queue.

The only reason I set ->queue_lock at all is because some blk functions
started checking that it was held ... something to do with plugging and
stacking limits, I think.  See commit e7e72bf641b1 (May 2008) for more details.


> 
> Also, I was wondering how it helps to share the lock between the request
> queue and some other data structures in the driver.

All it does is silence some warnings.

NeilBrown

> 
> Thanks
> Vivek
> --



* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-17 16:59         ` Vivek Goyal
@ 2011-02-18  2:40           ` NeilBrown
  2011-02-18  3:19             ` Mike Snitzer
  2011-02-18 15:05             ` Vivek Goyal
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2011-02-18  2:40 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel

On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:

> On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote:
> > On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > So is it possible to keep the spinlock intact when md is calling up
> > > blk_cleanup_queue()?
> > > 
> > 
> > It would be possible, yes - but messy.  I would probably end up just making
> > ->queue_lock always point to __queue_lock, and then only take it at the
> > places where I call 'block' code which wants to test to see if it is
> > currently held (like the plugging routines).
> > 
> > The queue lock (and most of the request queue) is simply irrelevant for md.
> > I would prefer to get away from having to touch it at all...
> 
> If the queue lock is mostly irrelevant for md, then why should md provide an
> external lock instead of using the queue's internal spinlock?

See other email - historical reasons mostly.

> 
> > 
> > I'll see how messy it would be to stop using it completely and it can just be
> > __queue_lock.
> > 
> > Though for me - it would be much easier if you just used __queue_lock .....
> 
> Ok, here is the simple patch which splits the queue lock and uses
> throtl_lock for throttling logic. I booted and it seems to be working.
> 
> Having said that, this now introduces the possibility of races for any
> services I take from the request queue. I need to check whether I need to
> take the queue lock, and that makes me a little uncomfortable.
> 
> I am using kblockd_workqueue to queue throtl work. Looks like I don't
> need queue lock for that. I am also using block tracing infrastructure
> and my understanding is that I don't need queue lock for that as well.
> 
> So if we do this change for performance reasons, it still makes sense; but
> doing it just because md provided a q->queue_lock and then took that lock
> away without notifying the block layer is not the right reason, IMHO.

Well...I like that patch, as it makes my life easier....

But I agree that md is doing something wrong.  Now that ->queue_lock is
always initialised, it is wrong to leave it in a state where it is not defined.

So maybe I'll apply this (after testing it a bit).  The only reason for taking
queue_lock in a couple of places is to silence some warnings.

Thanks,
NeilBrown


diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 8a2f767..0ed7f6b 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -216,7 +216,6 @@ static int linear_run (mddev_t *mddev)
 
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
-	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 	conf = linear_conf(mddev, mddev->raid_disks);
 
 	if (!conf)
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 6d7ddf3..3a62d44 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -435,7 +435,6 @@ static int multipath_run (mddev_t *mddev)
 	 * bookkeeping area. [whatever we allocate in multipath_run(),
 	 * should be freed in multipath_stop()]
 	 */
-	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 
 	conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL);
 	mddev->private = conf;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 75671df..c0ac457 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -361,7 +361,6 @@ static int raid0_run(mddev_t *mddev)
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
 	blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
-	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 
 	/* if private is not null, we are here after takeover */
 	if (mddev->private == NULL) {
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a23ffa3..909282d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -593,7 +593,9 @@ static int flush_pending_writes(conf_t *conf)
 	if (conf->pending_bio_list.head) {
 		struct bio *bio;
 		bio = bio_list_get(&conf->pending_bio_list);
+		spin_lock(conf->mddev->queue->queue_lock);
 		blk_remove_plug(conf->mddev->queue);
+		spin_unlock(conf->mddev->queue->queue_lock);
 		spin_unlock_irq(&conf->device_lock);
 		/* flush any pending bitmap writes to
 		 * disk before proceeding w/ I/O */
@@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		atomic_inc(&r1_bio->remaining);
 		spin_lock_irqsave(&conf->device_lock, flags);
 		bio_list_add(&conf->pending_bio_list, mbio);
+		spin_lock(mddev->queue->queue_lock);
 		blk_plug_device(mddev->queue);
+		spin_unlock(mddev->queue->queue_lock);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
 	r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
@@ -2021,7 +2025,6 @@ static int run(mddev_t *mddev)
 	if (IS_ERR(conf))
 		return PTR_ERR(conf);
 
-	mddev->queue->queue_lock = &conf->device_lock;
 	list_for_each_entry(rdev, &mddev->disks, same_set) {
 		disk_stack_limits(mddev->gendisk, rdev->bdev,
 				  rdev->data_offset << 9);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3b607b2..60e6cb1 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -662,7 +662,9 @@ static int flush_pending_writes(conf_t *conf)
 	if (conf->pending_bio_list.head) {
 		struct bio *bio;
 		bio = bio_list_get(&conf->pending_bio_list);
+		spin_lock(conf->mddev->queue->queue_lock);
 		blk_remove_plug(conf->mddev->queue);
+		spin_unlock(conf->mddev->queue->queue_lock);
 		spin_unlock_irq(&conf->device_lock);
 		/* flush any pending bitmap writes to disk
 		 * before proceeding w/ I/O */
@@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 
 		atomic_inc(&r10_bio->remaining);
 		spin_lock_irqsave(&conf->device_lock, flags);
+		spin_lock(mddev->queue->queue_lock);
 		bio_list_add(&conf->pending_bio_list, mbio);
 		blk_plug_device(mddev->queue);
+		spin_unlock(mddev->queue->queue_lock);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
 
@@ -2304,8 +2308,6 @@ static int run(mddev_t *mddev)
 	if (!conf)
 		goto out;
 
-	mddev->queue->queue_lock = &conf->device_lock;
-
 	mddev->thread = conf->thread;
 	conf->thread = NULL;
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7028128..78536fd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5204,7 +5204,6 @@ static int run(mddev_t *mddev)
 
 		mddev->queue->backing_dev_info.congested_data = mddev;
 		mddev->queue->backing_dev_info.congested_fn = raid5_congested;
-		mddev->queue->queue_lock = &conf->device_lock;
 		mddev->queue->unplug_fn = raid5_unplug_queue;
 
 		chunk_size = mddev->chunk_sectors << 9;



* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18  2:40           ` NeilBrown
@ 2011-02-18  3:19             ` Mike Snitzer
  2011-02-18  3:33               ` NeilBrown
  2011-02-18 15:05             ` Vivek Goyal
  1 sibling, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2011-02-18  3:19 UTC (permalink / raw)
  To: NeilBrown; +Cc: Vivek Goyal, Jens Axboe, linux-kernel

On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> >> So if we make this change for performance reasons, it still makes
> >> sense; but making it only because md provided a q->queue_lock and then
> >> took that lock away without notifying the block layer is still not the
> >> right reason, IMHO.
>
> Well...I like that patch, as it makes my life easier....
>
> But I agree that md is doing something wrong.  Now that ->queue_lock is
> always initialised, it is wrong to leave it in a state where it is not
> defined.
> 
> So maybe I'll apply this (after testing it a bit).  The only reason for
> taking the queue_lock in a couple of places is to silence some warnings.
>
> Thanks,
> NeilBrown
>
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index a23ffa3..909282d 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
>                atomic_inc(&r1_bio->remaining);
>                spin_lock_irqsave(&conf->device_lock, flags);
>                bio_list_add(&conf->pending_bio_list, mbio);
> +               spin_lock(mddev->queue->queue_lock);
>                blk_plug_device(mddev->queue);
> +               spin_unlock(mddev->queue->queue_lock);
>                spin_unlock_irqrestore(&conf->device_lock, flags);
>        }
>        r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);

Noticed an inconsistency, raid10.c's additional locking also protects
the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
protection in raid10 isn't needed?

> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3b607b2..60e6cb1 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio)
>
>                atomic_inc(&r10_bio->remaining);
>                spin_lock_irqsave(&conf->device_lock, flags);
> +               spin_lock(mddev->queue->queue_lock);
>                bio_list_add(&conf->pending_bio_list, mbio);
>                blk_plug_device(mddev->queue);
> +               spin_unlock(mddev->queue->queue_lock);
>                spin_unlock_irqrestore(&conf->device_lock, flags);
>        }
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18  3:19             ` Mike Snitzer
@ 2011-02-18  3:33               ` NeilBrown
  2011-02-18 14:04                 ` Mike Snitzer
  2011-02-18 15:04                 ` Vivek Goyal
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2011-02-18  3:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Vivek Goyal, Jens Axboe, linux-kernel

On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote:
> > On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> >> So if we make this change for performance reasons, it still makes
> >> sense; but making it only because md provided a q->queue_lock and then
> >> took that lock away without notifying the block layer is still not the
> >> right reason, IMHO.
> >
> > Well...I like that patch, as it makes my life easier....
> >
> > But I agree that md is doing something wrong.  Now that ->queue_lock is
> > always initialised, it is wrong to leave it in a state where it is not
> > defined.
> >
> > So maybe I'll apply this (after testing it a bit).  The only reason for
> > taking the queue_lock in a couple of places is to silence some warnings.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index a23ffa3..909282d 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> >                atomic_inc(&r1_bio->remaining);
> >                spin_lock_irqsave(&conf->device_lock, flags);
> >                bio_list_add(&conf->pending_bio_list, mbio);
> > +               spin_lock(mddev->queue->queue_lock);
> >                blk_plug_device(mddev->queue);
> > +               spin_unlock(mddev->queue->queue_lock);
> >                spin_unlock_irqrestore(&conf->device_lock, flags);
> >        }
> >        r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
> 
> Noticed an inconsistency, raid10.c's additional locking also protects
> the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> protection in raid10 isn't needed?

Correct - not needed at all.
I put it there because it felt a little cleaner keeping the two 'lock's
together like the two 'unlock's.  Probably confusing though...

My other thought is to stop using the block-layer plugging altogether, like I
have in RAID5 (which I needed to do to make it work with DM).  Then I
wouldn't need to touch queue_lock at all - very tempting.


Thanks for the review.

NeilBrown


> 
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 3b607b2..60e6cb1 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> >
> >                atomic_inc(&r10_bio->remaining);
> >                spin_lock_irqsave(&conf->device_lock, flags);
> > +               spin_lock(mddev->queue->queue_lock);
> >                bio_list_add(&conf->pending_bio_list, mbio);
> >                blk_plug_device(mddev->queue);
> > +               spin_unlock(mddev->queue->queue_lock);
> >                spin_unlock_irqrestore(&conf->device_lock, flags);
> >        }
> >


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18  3:33               ` NeilBrown
@ 2011-02-18 14:04                 ` Mike Snitzer
  2011-02-18 15:04                 ` Vivek Goyal
  1 sibling, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2011-02-18 14:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: Vivek Goyal, Jens Axboe, linux-kernel

On Thu, Feb 17 2011 at 10:33pm -0500,
NeilBrown <neilb@suse.de> wrote:

> On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote:
> > > On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > >> So if we make this change for performance reasons, it still makes
> > >> sense; but making it only because md provided a q->queue_lock and then
> > >> took that lock away without notifying the block layer is still not the
> > >> right reason, IMHO.
> > >
> > > Well...I like that patch, as it makes my life easier....
> > >
> > > But I agree that md is doing something wrong.  Now that ->queue_lock is
> > > always initialised, it is wrong to leave it in a state where it is not
> > > defined.
> > >
> > > So maybe I'll apply this (after testing it a bit).  The only reason for
> > > taking the queue_lock in a couple of places is to silence some warnings.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index a23ffa3..909282d 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> > >                atomic_inc(&r1_bio->remaining);
> > >                spin_lock_irqsave(&conf->device_lock, flags);
> > >                bio_list_add(&conf->pending_bio_list, mbio);
> > > +               spin_lock(mddev->queue->queue_lock);
> > >                blk_plug_device(mddev->queue);
> > > +               spin_unlock(mddev->queue->queue_lock);
> > >                spin_unlock_irqrestore(&conf->device_lock, flags);
> > >        }
> > >        r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
> > 
> > Noticed an inconsistency, raid10.c's additional locking also protects
> > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > protection in raid10 isn't needed?
> 
> Correct - not needed at all.
> I put it there because it felt a little cleaner keeping the two 'lock's
> together like the two 'unlock's.  Probably confusing though...
> 
> My other thought is to stop using the block-layer plugging altogether, like I
> have in RAID5 (which I needed to do to make it work with DM).  Then I
> wouldn't need to touch queue_lock at all - very tempting.

FYI, Jens has a considerable plugging overhaul staged in his tree for 2.6.39:
http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-2.6.39/stack-plug

So if you do wean MD off of the block-layer's plugging, Jens will need to
adapt his patches that touch MD... not a big deal.

Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18  3:33               ` NeilBrown
  2011-02-18 14:04                 ` Mike Snitzer
@ 2011-02-18 15:04                 ` Vivek Goyal
  2011-02-21  7:24                   ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-02-18 15:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mike Snitzer, Jens Axboe, linux-kernel

On Fri, Feb 18, 2011 at 02:33:25PM +1100, NeilBrown wrote:
> On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote:
> > > On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > >> So if we make this change for performance reasons, it still makes
> > >> sense; but making it only because md provided a q->queue_lock and then
> > >> took that lock away without notifying the block layer is still not the
> > >> right reason, IMHO.
> > >
> > > Well...I like that patch, as it makes my life easier....
> > >
> > > But I agree that md is doing something wrong.  Now that ->queue_lock is
> > > always initialised, it is wrong to leave it in a state where it is not
> > > defined.
> > >
> > > So maybe I'll apply this (after testing it a bit).  The only reason for
> > > taking the queue_lock in a couple of places is to silence some warnings.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index a23ffa3..909282d 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> > >                atomic_inc(&r1_bio->remaining);
> > >                spin_lock_irqsave(&conf->device_lock, flags);
> > >                bio_list_add(&conf->pending_bio_list, mbio);
> > > +               spin_lock(mddev->queue->queue_lock);
> > >                blk_plug_device(mddev->queue);
> > > +               spin_unlock(mddev->queue->queue_lock);
> > >                spin_unlock_irqrestore(&conf->device_lock, flags);
> > >        }
> > >        r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
> > 
> > Noticed an inconsistency, raid10.c's additional locking also protects
> > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > protection in raid10 isn't needed?
> 
> Correct - not needed at all.
> I put it there because it felt a little cleaner keeping the two 'lock's
> together like the two 'unlock's.  Probably confusing though...

I guess you could use blk_plug_device_unlocked() to get rid of the ugliness;
this routine will take care of taking the queue lock.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: blk_throtl_exit  taking q->queue_lock is problematic
  2011-02-18  2:40           ` NeilBrown
  2011-02-18  3:19             ` Mike Snitzer
@ 2011-02-18 15:05             ` Vivek Goyal
  1 sibling, 0 replies; 17+ messages in thread
From: Vivek Goyal @ 2011-02-18 15:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Fri, Feb 18, 2011 at 01:40:25PM +1100, NeilBrown wrote:
> On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote:
> > > On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > > 
> > > > So is it possible to keep the spinlock intact when md is calling up
> > > > blk_cleanup_queue()?
> > > > 
> > > 
> > > It would be possible, yes - but messy.  I would probably end up just making
> > > ->queue_lock always point to __queue_lock, and then only take it at the
> > > places where I call 'block' code which wants to test to see if it is
> > > currently held (like the plugging routines).
> > > 
> > > The queue lock (and most of the request queue) is simply irrelevant for md.
> > > I would prefer to get away from having to touch it at all...
> > 
> > If the queue lock is mostly irrelevant for md, then why should md provide
> > an external lock instead of using the queue's internal spinlock?
> 
> See other email - historical reasons mostly.
> 
> > 
> > > 
> > > I'll see how messy it would be to stop using it completely and it can just be
> > > __queue_lock.
> > > 
> > > Though for me - it would be much easier if you just used __queue_lock .....
> > 
> > Ok, here is the simple patch which splits the queue lock and uses
> > throtl_lock for throttling logic. I booted and it seems to be working.
> > 
> > Having said that, this now introduces the possibility of races for any
> > services I take from the request queue. I need to see if I need to take
> > the queue lock, and that makes me a little uncomfortable.
> > 
> > I am using kblockd_workqueue to queue throtl work. Looks like I don't
> > need queue lock for that. I am also using block tracing infrastructure
> > and my understanding is that I don't need queue lock for that as well.
> > 
> > So if we make this change for performance reasons, it still makes
> > sense; but making it only because md provided a q->queue_lock and then
> > took that lock away without notifying the block layer is still not the
> > right reason, IMHO.
> 
> Well...I like that patch, as it makes my life easier....
> 
> But I agree that md is doing something wrong.  Now that ->queue_lock is
> always initialised, it is wrong to leave it in a state where it is not
> defined.
> 
> So maybe I'll apply this (after testing it a bit).  The only reason for
> taking the queue_lock in a couple of places is to silence some warnings.
> 
> Thanks,
> NeilBrown

Thanks Neil. This looks much better in the sense that now we know that
queue_lock is still valid at blk_cleanup_queue() time and did not
vanish unexpectedly.

Thanks
Vivek

> 
> 
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index 8a2f767..0ed7f6b 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -216,7 +216,6 @@ static int linear_run (mddev_t *mddev)
>  
>  	if (md_check_no_bitmap(mddev))
>  		return -EINVAL;
> -	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>  	conf = linear_conf(mddev, mddev->raid_disks);
>  
>  	if (!conf)
> diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
> index 6d7ddf3..3a62d44 100644
> --- a/drivers/md/multipath.c
> +++ b/drivers/md/multipath.c
> @@ -435,7 +435,6 @@ static int multipath_run (mddev_t *mddev)
>  	 * bookkeeping area. [whatever we allocate in multipath_run(),
>  	 * should be freed in multipath_stop()]
>  	 */
> -	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>  
>  	conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL);
>  	mddev->private = conf;
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 75671df..c0ac457 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -361,7 +361,6 @@ static int raid0_run(mddev_t *mddev)
>  	if (md_check_no_bitmap(mddev))
>  		return -EINVAL;
>  	blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
> -	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>  
>  	/* if private is not null, we are here after takeover */
>  	if (mddev->private == NULL) {
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index a23ffa3..909282d 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -593,7 +593,9 @@ static int flush_pending_writes(conf_t *conf)
>  	if (conf->pending_bio_list.head) {
>  		struct bio *bio;
>  		bio = bio_list_get(&conf->pending_bio_list);
> +		spin_lock(conf->mddev->queue->queue_lock);
>  		blk_remove_plug(conf->mddev->queue);
> +		spin_unlock(conf->mddev->queue->queue_lock);
>  		spin_unlock_irq(&conf->device_lock);
>  		/* flush any pending bitmap writes to
>  		 * disk before proceeding w/ I/O */
> @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
>  		atomic_inc(&r1_bio->remaining);
>  		spin_lock_irqsave(&conf->device_lock, flags);
>  		bio_list_add(&conf->pending_bio_list, mbio);
> +		spin_lock(mddev->queue->queue_lock);
>  		blk_plug_device(mddev->queue);
> +		spin_unlock(mddev->queue->queue_lock);
>  		spin_unlock_irqrestore(&conf->device_lock, flags);
>  	}
>  	r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
> @@ -2021,7 +2025,6 @@ static int run(mddev_t *mddev)
>  	if (IS_ERR(conf))
>  		return PTR_ERR(conf);
>  
> -	mddev->queue->queue_lock = &conf->device_lock;
>  	list_for_each_entry(rdev, &mddev->disks, same_set) {
>  		disk_stack_limits(mddev->gendisk, rdev->bdev,
>  				  rdev->data_offset << 9);
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3b607b2..60e6cb1 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -662,7 +662,9 @@ static int flush_pending_writes(conf_t *conf)
>  	if (conf->pending_bio_list.head) {
>  		struct bio *bio;
>  		bio = bio_list_get(&conf->pending_bio_list);
> +		spin_lock(conf->mddev->queue->queue_lock);
>  		blk_remove_plug(conf->mddev->queue);
> +		spin_unlock(conf->mddev->queue->queue_lock);
>  		spin_unlock_irq(&conf->device_lock);
>  		/* flush any pending bitmap writes to disk
>  		 * before proceeding w/ I/O */
> @@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio)
>  
>  		atomic_inc(&r10_bio->remaining);
>  		spin_lock_irqsave(&conf->device_lock, flags);
> +		spin_lock(mddev->queue->queue_lock);
>  		bio_list_add(&conf->pending_bio_list, mbio);
>  		blk_plug_device(mddev->queue);
> +		spin_unlock(mddev->queue->queue_lock);
>  		spin_unlock_irqrestore(&conf->device_lock, flags);
>  	}
>  
> @@ -2304,8 +2308,6 @@ static int run(mddev_t *mddev)
>  	if (!conf)
>  		goto out;
>  
> -	mddev->queue->queue_lock = &conf->device_lock;
> -
>  	mddev->thread = conf->thread;
>  	conf->thread = NULL;
>  
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 7028128..78536fd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5204,7 +5204,6 @@ static int run(mddev_t *mddev)
>  
>  		mddev->queue->backing_dev_info.congested_data = mddev;
>  		mddev->queue->backing_dev_info.congested_fn = raid5_congested;
> -		mddev->queue->queue_lock = &conf->device_lock;
>  		mddev->queue->unplug_fn = raid5_unplug_queue;
>  
>  		chunk_size = mddev->chunk_sectors << 9;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18 15:04                 ` Vivek Goyal
@ 2011-02-21  7:24                   ` NeilBrown
  2011-02-21 14:42                     ` Vivek Goyal
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2011-02-21  7:24 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Mike Snitzer, Jens Axboe, linux-kernel

On Fri, 18 Feb 2011 10:04:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:

> On Fri, Feb 18, 2011 at 02:33:25PM +1100, NeilBrown wrote:
> > On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:

> > > Noticed an inconsistency, raid10.c's additional locking also protects
> > > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > > protection in raid10 isn't needed?
> > 
> > Correct - not needed at all.
> > I put it there because it felt a little cleaner keeping the two 'lock's
> > together like the two 'unlock's.  Probably confusing though...
> 
> I guess you could use blk_plug_device_unlocked() to get rid of the ugliness;
> this routine will take care of taking the queue lock.
> 

Yep, that gets rid of some ugliness.
I've made that change and will submit it in due course.
So blk_throtl doesn't need any change to avoid the problem with md - the
changes are made in md instead.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-21  7:24                   ` NeilBrown
@ 2011-02-21 14:42                     ` Vivek Goyal
  0 siblings, 0 replies; 17+ messages in thread
From: Vivek Goyal @ 2011-02-21 14:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mike Snitzer, Jens Axboe, linux-kernel

On Mon, Feb 21, 2011 at 06:24:19PM +1100, NeilBrown wrote:
> On Fri, 18 Feb 2011 10:04:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Fri, Feb 18, 2011 at 02:33:25PM +1100, NeilBrown wrote:
> > > On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > > > Noticed an inconsistency, raid10.c's additional locking also protects
> > > > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > > > protection in raid10 isn't needed?
> > > 
> > > Correct - not needed at all.
> > > I put it there because it felt a little cleaner keeping the two 'lock's
> > > together like the two 'unlock's.  Probably confusing though...
> > 
> > I guess you could use blk_plug_device_unlocked() to get rid of the ugliness;
> > this routine will take care of taking the queue lock.
> > 
> 
> Yep, that gets rid of some ugliness.
> I've made that change and will submit it in due course.
> So blk_throtl doesn't need any change to avoid the problem with md - that
> changes are made in md instead.

Thanks Neil. I might still end up moving blk_throtl_exit() to
blk_cleanup_queue() once I have sorted out blk_sync_queue(), because
at the end of blk_cleanup_queue() the driver is free to release the spin
lock, and there are no guarantees that the lock is still there in
blk_release_queue().

Thanks
Vivek

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-02-21 14:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-16  7:31 blk_throtl_exit taking q->queue_lock is problematic NeilBrown
2011-02-16 15:53 ` Vivek Goyal
2011-02-17  0:35   ` NeilBrown
2011-02-17  1:10     ` Vivek Goyal
2011-02-17  5:55       ` NeilBrown
2011-02-17 15:01         ` Vivek Goyal
2011-02-17 16:59         ` Vivek Goyal
2011-02-18  2:40           ` NeilBrown
2011-02-18  3:19             ` Mike Snitzer
2011-02-18  3:33               ` NeilBrown
2011-02-18 14:04                 ` Mike Snitzer
2011-02-18 15:04                 ` Vivek Goyal
2011-02-21  7:24                   ` NeilBrown
2011-02-21 14:42                     ` Vivek Goyal
2011-02-18 15:05             ` Vivek Goyal
2011-02-17 20:00 ` Vivek Goyal
2011-02-18  1:57   ` NeilBrown
