LKML Archive on lore.kernel.org
* [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
@ 2021-10-13 8:40 John Garry
2021-10-13 9:22 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2021-10-13 8:40 UTC (permalink / raw)
To: axboe
Cc: ming.lei, linux-block, linux-kernel, kashyap.desai, hare, John Garry
Since it is now possible for a tagset to share a single set of tags, the
iter function should not re-iter the tags for the count of #hw queues in
that case. Rather it should just iter once.
Fixes: e0fdf846c7bb ("blk-mq: Use shared tags for shared sbitmap support")
Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 72a2724a4eee..c943b6529619 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -378,9 +378,12 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv)
{
- int i;
+ unsigned int flags = tagset->flags;
+ int i, nr_tags;
+
+ nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;
- for (i = 0; i < tagset->nr_hw_queues; i++) {
+ for (i = 0; i < nr_tags; i++) {
if (tagset->tags && tagset->tags[i])
__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
BT_TAG_ITER_STARTED);
--
2.26.2
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-13 8:40 [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags John Garry
@ 2021-10-13 9:22 ` Ming Lei
2021-10-13 10:01 ` John Garry
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-10-13 9:22 UTC (permalink / raw)
To: John Garry; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
On Wed, Oct 13, 2021 at 04:40:59PM +0800, John Garry wrote:
> Since it is now possible for a tagset to share a single set of tags, the
> iter function should not re-iter the tags for the count of #hw queues in
> that case. Rather it should just iter once.
>
> Fixes: e0fdf846c7bb ("blk-mq: Use shared tags for shared sbitmap support")
> Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
> Signed-off-by: John Garry <john.garry@huawei.com>
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 72a2724a4eee..c943b6529619 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -378,9 +378,12 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
> void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
> busy_tag_iter_fn *fn, void *priv)
> {
> - int i;
> + unsigned int flags = tagset->flags;
> + int i, nr_tags;
> +
> + nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;
>
> - for (i = 0; i < tagset->nr_hw_queues; i++) {
> + for (i = 0; i < nr_tags; i++) {
> if (tagset->tags && tagset->tags[i])
> __blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
> BT_TAG_ITER_STARTED);
blk_mq_queue_tag_busy_iter() needn't such change?
Thanks,
Ming
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-13 9:22 ` Ming Lei
@ 2021-10-13 10:01 ` John Garry
2021-10-13 10:20 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2021-10-13 10:01 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
On 13/10/2021 10:22, Ming Lei wrote:
> On Wed, Oct 13, 2021 at 04:40:59PM +0800, John Garry wrote:
>> Since it is now possible for a tagset to share a single set of tags, the
>> iter function should not re-iter the tags for the count of #hw queues in
>> that case. Rather it should just iter once.
>>
>> Fixes: e0fdf846c7bb ("blk-mq: Use shared tags for shared sbitmap support")
>> Reported-by: Kashyap Desai<kashyap.desai@broadcom.com>
>> Signed-off-by: John Garry<john.garry@huawei.com>
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 72a2724a4eee..c943b6529619 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -378,9 +378,12 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
>> void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
>> busy_tag_iter_fn *fn, void *priv)
>> {
>> - int i;
>> + unsigned int flags = tagset->flags;
>> + int i, nr_tags;
>> +
>> + nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;
>>
>> - for (i = 0; i < tagset->nr_hw_queues; i++) {
>> + for (i = 0; i < nr_tags; i++) {
>> if (tagset->tags && tagset->tags[i])
>> __blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
>> BT_TAG_ITER_STARTED);
> blk_mq_queue_tag_busy_iter() needn't such change?
I didn't think so.
blk_mq_queue_tag_busy_iter() will indeed re-iter the tags per hctx.
However in bt_iter(), we check rq->mq_hctx == hctx for calling the iter
callback:
static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
...
if (rq->q == hctx->queue && rq->mq_hctx == hctx)
ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
And this would only pass for the correct hctx which we're iter'ing for.
Indeed, it would be nice not to iter excessive times, but I didn't see a
straightforward way to change that.
There is also blk_mq_all_tag_iter():
void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
void *priv)
{
__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
}
But then the only user is blk_mq_hctx_has_requests():
static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_tags *tags = hctx->sched_tags ?
hctx->sched_tags : hctx->tags;
struct rq_iter_data data = {
.hctx = hctx,
};
blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
return data.has_rq;
}
But, again like bt_iter(), blk_mq_has_request() will check the hctx matches:
static bool blk_mq_has_request(struct request *rq, void *data, bool
reserved)
{
struct rq_iter_data *iter_data = data;
if (rq->mq_hctx != iter_data->hctx)
return true;
iter_data->has_rq = true;
return false;
}
Thanks,
John
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-13 10:01 ` John Garry
@ 2021-10-13 10:20 ` Ming Lei
2021-10-13 11:11 ` John Garry
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-10-13 10:20 UTC (permalink / raw)
To: John Garry; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
On Wed, Oct 13, 2021 at 11:01:11AM +0100, John Garry wrote:
> On 13/10/2021 10:22, Ming Lei wrote:
> > On Wed, Oct 13, 2021 at 04:40:59PM +0800, John Garry wrote:
> > > Since it is now possible for a tagset to share a single set of tags, the
> > > iter function should not re-iter the tags for the count of #hw queues in
> > > that case. Rather it should just iter once.
> > >
> > > Fixes: e0fdf846c7bb ("blk-mq: Use shared tags for shared sbitmap support")
> > > Reported-by: Kashyap Desai<kashyap.desai@broadcom.com>
> > > Signed-off-by: John Garry<john.garry@huawei.com>
> > >
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 72a2724a4eee..c943b6529619 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -378,9 +378,12 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
> > > void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
> > > busy_tag_iter_fn *fn, void *priv)
> > > {
> > > - int i;
> > > + unsigned int flags = tagset->flags;
> > > + int i, nr_tags;
> > > +
> > > + nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;
> > > - for (i = 0; i < tagset->nr_hw_queues; i++) {
> > > + for (i = 0; i < nr_tags; i++) {
> > > if (tagset->tags && tagset->tags[i])
> > > __blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
> > > BT_TAG_ITER_STARTED);
> > blk_mq_queue_tag_busy_iter() needn't such change?
>
> I didn't think so.
>
> blk_mq_queue_tag_busy_iter() will indeed re-iter the tags per hctx. However
> in bt_iter(), we check rq->mq_hctx == hctx for calling the iter callback:
>
> static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> {
> ...
>
> if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
>
> And this would only pass for the correct hctx which we're iter'ing for.
It is true for both shared and non-shared sbitmap since we don't share
hctx, so what does matter? With single shared tags, you can iterate over
all requests originated from all hw queues, right?
> Indeed, it would be nice not to iter excessive times, but I didn't see a
> straightforward way to change that.
In Kashyap's report, the lock contention is actually from
blk_mq_queue_tag_busy_iter(), see:
https://lore.kernel.org/linux-block/8867352d-2107-1f8a-0f1c-ef73450bf256@huawei.com/
>
> There is also blk_mq_all_tag_iter():
>
> void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
> void *priv)
> {
> __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
> }
>
> But then the only user is blk_mq_hctx_has_requests():
>
> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
> {
> struct blk_mq_tags *tags = hctx->sched_tags ?
> hctx->sched_tags : hctx->tags;
> struct rq_iter_data data = {
> .hctx = hctx,
> };
>
> blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
> return data.has_rq;
> }
This above one only iterates over the specified hctx/tags, it won't be
affected.
>
> But, again like bt_iter(), blk_mq_has_request() will check the hctx matches:
Not see what matters wrt. checking hctx.
Thanks,
Ming
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-13 10:20 ` Ming Lei
@ 2021-10-13 11:11 ` John Garry
2021-10-13 14:29 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2021-10-13 11:11 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
>>> blk_mq_queue_tag_busy_iter() needn't such change?
>> I didn't think so.
>>
>> blk_mq_queue_tag_busy_iter() will indeed re-iter the tags per hctx. However
>> in bt_iter(), we check rq->mq_hctx == hctx for calling the iter callback:
>>
>> static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>> {
>> ...
>>
>> if (rq->q == hctx->queue && rq->mq_hctx == hctx)
>> ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
>>
>> And this would only pass for the correct hctx which we're iter'ing for.
> It is true for both shared and non-shared sbitmap since we don't share
> hctx, so what does matter?
It matters that we are doing the right thing for shared tags. My point
is we iter but don't call the callback unless the correct hctx.
As I see, this has not changed in transitioning from shared sbitmap to
shared tags.
> With single shared tags, you can iterate over
> all requests originated from all hw queues, right?
>
Right, for the same request queue, we should do that.
>> Indeed, it would be nice not to iter excessive times, but I didn't see a
>> straightforward way to change that.
> In Kashyap's report, the lock contention is actually from
> blk_mq_queue_tag_busy_iter(), see:
>
> https://lore.kernel.org/linux-block/8867352d-2107-1f8a-0f1c-ef73450bf256@huawei.com/
>
As I understand, Kashyap mentioned no throughput regression with my
series, but just higher cpu usage in blk_mq_find_and_get_req().
I'll see if I can see such a thing in my setup.
But could it be that since we only have a single sets of requests per
tagset, and not a set of requests per HW queue, there is more contention
on the common set of requests in the refcount_inc_not_zero() call ***,
below:
static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
unsigned int bitnr)
{
...
rq = tags->rqs[bitnr];
if (... || !refcount_inc_not_zero(&rq->ref)) ***
...
}
But I wonder why this function is even called often...
>> There is also blk_mq_all_tag_iter():
>>
>> void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
>> void *priv)
>> {
>> __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
>> }
>>
>> But then the only user is blk_mq_hctx_has_requests():
>>
>> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
>> {
>> struct blk_mq_tags *tags = hctx->sched_tags ?
>> hctx->sched_tags : hctx->tags;
>> struct rq_iter_data data = {
>> .hctx = hctx,
>> };
>>
>> blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
>> return data.has_rq;
>> }
> This above one only iterates over the specified hctx/tags, it won't be
> affected.
>
>> But, again like bt_iter(), blk_mq_has_request() will check the hctx matches:
> Not see what matters wrt. checking hctx.
I'm just saying that something like the following would be broken for
shared tags:
static bool blk_mq_has_request(struct request *rq, void *data, bool
reserved)
{
struct rq_iter_data *iter_data = data;
iter_data->has_rq = true;
return true;
}
static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
{
struct rq_iter_data data = {
};
blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
return data.has_rq;
}
As it ignores that we want to check for a specific hctx.
Thanks,
John
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-13 11:11 ` John Garry
@ 2021-10-13 14:29 ` Ming Lei
2021-10-13 15:13 ` John Garry
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-10-13 14:29 UTC (permalink / raw)
To: John Garry
Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare, ming.lei
On Wed, Oct 13, 2021 at 12:11:12PM +0100, John Garry wrote:
> > > > blk_mq_queue_tag_busy_iter() needn't such change?
> > > I didn't think so.
> > >
> > > blk_mq_queue_tag_busy_iter() will indeed re-iter the tags per hctx. However
> > > in bt_iter(), we check rq->mq_hctx == hctx for calling the iter callback:
> > >
> > > static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> > > {
> > > ...
> > >
> > > if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> > > ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> > >
> > > And this would only pass for the correct hctx which we're iter'ing for.
> > It is true for both shared and non-shared sbitmap since we don't share
> > hctx, so what does matter?
>
> It matters that we are doing the right thing for shared tags. My point is we
> iter but don't call the callback unless the correct hctx.
>
> As I see, this has not changed in transitioning from shared sbitmap to
> shared tags.
>
> > With single shared tags, you can iterate over
> > all requests originated from all hw queues, right?
> >
> Right, for the same request queue, we should do that.
>
> > > Indeed, it would be nice not to iter excessive times, but I didn't see a
> > > straightforward way to change that.
>
>
> > In Kashyap's report, the lock contention is actually from
> > blk_mq_queue_tag_busy_iter(), see:
> >
> > https://lore.kernel.org/linux-block/8867352d-2107-1f8a-0f1c-ef73450bf256@huawei.com/
> >
>
> As I understand, Kashyap mentioned no throughput regression with my series,
> but just higher cpu usage in blk_mq_find_and_get_req().
>
> I'll see if I can see such a thing in my setup.
>
> But could it be that since we only have a single sets of requests per
> tagset, and not a set of requests per HW queue, there is more contention on
> the common set of requests in the refcount_inc_not_zero() call ***, below:
>
> static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
> unsigned int bitnr)
> {
> ...
>
> rq = tags->rqs[bitnr];
> if (... || !refcount_inc_not_zero(&rq->ref)) ***
> ...
> }
Kashyap's log shows that contention on tags->lock is increased, that
should be caused by nr_hw_queues iterating. blk_mq_find_and_get_req()
will be run nr_hw_queue times compared with pre-shared-sbitmap, since it
is done before checking rq->mq_hctx.
>
> But I wonder why this function is even called often...
>
> > > There is also blk_mq_all_tag_iter():
> > >
> > > void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
> > > void *priv)
> > > {
> > > __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
> > > }
> > >
> > > But then the only user is blk_mq_hctx_has_requests():
> > >
> > > static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
> > > {
> > > struct blk_mq_tags *tags = hctx->sched_tags ?
> > > hctx->sched_tags : hctx->tags;
> > > struct rq_iter_data data = {
> > > .hctx = hctx,
> > > };
> > >
> > > blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
> > > return data.has_rq;
> > > }
> > This above one only iterates over the specified hctx/tags, it won't be
> > affected.
> >
> > > But, again like bt_iter(), blk_mq_has_request() will check the hctx matches:
> > Not see what matters wrt. checking hctx.
>
> I'm just saying that something like the following would be broken for shared
> tags:
>
> static bool blk_mq_has_request(struct request *rq, void *data, bool
> reserved)
> {
> struct rq_iter_data *iter_data = data;
>
> iter_data->has_rq = true;
> return true;
> }
>
> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
> {
> struct rq_iter_data data = {
> };
>
> blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
> return data.has_rq;
> }
>
> As it ignores that we want to check for a specific hctx.
No, that isn't what I meant, follows the change I suggested:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 72a2724a4eee..2a2ad6dfcc33 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
if (!rq)
return true;
- if (rq->q == hctx->queue && rq->mq_hctx == hctx)
- ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+ if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
+ blk_mq_is_shared_tags(hctx->flags)))
+ ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
blk_mq_put_rq_ref(rq);
return ret;
}
@@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
if (tags->nr_reserved_tags)
bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+
+ if (blk_mq_is_shared_tags(hctx->flags))
+ break;
}
blk_queue_exit(q);
}
Thanks,
Ming
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-13 14:29 ` Ming Lei
@ 2021-10-13 15:13 ` John Garry
2021-10-18 8:08 ` John Garry
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2021-10-13 15:13 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
On 13/10/2021 15:29, Ming Lei wrote:
>> As I understand, Kashyap mentioned no throughput regression with my series,
>> but just higher cpu usage in blk_mq_find_and_get_req().
>>
>> I'll see if I can see such a thing in my setup.
>>
>> But could it be that since we only have a single sets of requests per
>> tagset, and not a set of requests per HW queue, there is more contention on
>> the common set of requests in the refcount_inc_not_zero() call ***, below:
>>
>> static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>> unsigned int bitnr)
>> {
>> ...
>>
>> rq = tags->rqs[bitnr];
>> if (... || !refcount_inc_not_zero(&rq->ref)) ***
>> ...
>> }
> Kashyap's log shows that contention on tags->lock is increased, that
> should be caused by nr_hw_queues iterating.
If the lock contention increases on tags->lock then I am not totally
surprised. For shared sbitmap, each HW queue had its own tags (and tags
lock). Now with shared tags, we have a single lock over the tagset, and
so we would have more contention. That's on the basis that we have many
parallel callers to blk_mq_queue_tag_busy_iter().
> blk_mq_find_and_get_req()
> will be run nr_hw_queue times compared with pre-shared-sbitmap, since it
> is done before checking rq->mq_hctx.
Isn't shared sbitmap older than blk_mq_find_and_get_req()?
Anyway, for 5.14 shared sbitmap support, we iter nr_hw_queue times. And
now, for shared tags, we still do that. I don't see what's changed in
that regard.
>
>> But I wonder why this function is even called often...
>>
>>>> There is also blk_mq_all_tag_iter():
>>>>
>>>> void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
>>>> void *priv)
>>>> {
>>>> __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
>>>> }
>>>>
>>>> But then the only user is blk_mq_hctx_has_requests():
>>>>
>>>> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
>>>> {
>>>> struct blk_mq_tags *tags = hctx->sched_tags ?
>>>> hctx->sched_tags : hctx->tags;
>>>> struct rq_iter_data data = {
>>>> .hctx = hctx,
>>>> };
>>>>
>>>> blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
>>>> return data.has_rq;
>>>> }
>>> This above one only iterates over the specified hctx/tags, it won't be
>>> affected.
>>>
>>>> But, again like bt_iter(), blk_mq_has_request() will check the hctx matches:
>>> Not see what matters wrt. checking hctx.
>> I'm just saying that something like the following would be broken for shared
>> tags:
>>
>> static bool blk_mq_has_request(struct request *rq, void *data, bool
>> reserved)
>> {
>> struct rq_iter_data *iter_data = data;
>>
>> iter_data->has_rq = true;
>> return true;
>> }
>>
>> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
>> {
>> struct rq_iter_data data = {
>> };
>>
>> blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
>> return data.has_rq;
>> }
>>
>> As it ignores that we want to check for a specific hctx.
> No, that isn't what I meant, follows the change I suggested:
I didn't mean that this was your suggestion. I am just saying that we
need to be careful iter'ing tags for shared tags now, as in that example.
>
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 72a2724a4eee..2a2ad6dfcc33 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> if (!rq)
> return true;
>
> - if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> - ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> + if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
> + blk_mq_is_shared_tags(hctx->flags)))
> + ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
> blk_mq_put_rq_ref(rq);
> return ret;
> }
> @@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
> if (tags->nr_reserved_tags)
> bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
> bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
> +
> + if (blk_mq_is_shared_tags(hctx->flags))
> + break;
> }
> blk_queue_exit(q);
> }
>
I suppose that is ok, and means that we iter once.
However, I have to ask, where is the big user of
blk_mq_queue_tag_busy_iter() coming from? I saw this from Kashyap's mail:
> 1.31% 1.31% kworker/57:1H-k [kernel.vmlinux]
> native_queued_spin_lock_slowpath
> ret_from_fork
> kthread
> worker_thread
> process_one_work
> blk_mq_timeout_work
> blk_mq_queue_tag_busy_iter
> bt_iter
> blk_mq_find_and_get_req
> _raw_spin_lock_irqsave
> native_queued_spin_lock_slowpath
How or why blk_mq_timeout_work()?
Thanks,
john
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-13 15:13 ` John Garry
@ 2021-10-18 8:08 ` John Garry
2021-10-18 9:07 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2021-10-18 8:08 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
On 13/10/2021 16:13, John Garry wrote:
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 72a2724a4eee..2a2ad6dfcc33 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap,
>> unsigned int bitnr, void *data)
>> if (!rq)
>> return true;
>> - if (rq->q == hctx->queue && rq->mq_hctx == hctx)
>> - ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
>> + if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
>> + blk_mq_is_shared_tags(hctx->flags)))
>> + ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
>> blk_mq_put_rq_ref(rq);
>> return ret;
>> }
>> @@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct
>> request_queue *q, busy_iter_fn *fn,
>> if (tags->nr_reserved_tags)
>> bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>> bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>> +
>> + if (blk_mq_is_shared_tags(hctx->flags))
>> + break;
>> }
>> blk_queue_exit(q);
>> }
>>
>
> I suppose that is ok, and means that we iter once.
>
> However, I have to ask, where is the big user of
> blk_mq_queue_tag_busy_iter() coming from? I saw this from Kashyap's mail:
>
> > 1.31% 1.31% kworker/57:1H-k [kernel.vmlinux]
> > native_queued_spin_lock_slowpath
> > ret_from_fork
> > kthread
> > worker_thread
> > process_one_work
> > blk_mq_timeout_work
> > blk_mq_queue_tag_busy_iter
> > bt_iter
> > blk_mq_find_and_get_req
> > _raw_spin_lock_irqsave
> > native_queued_spin_lock_slowpath
>
> How or why blk_mq_timeout_work()?
Just some update: I tried hisi_sas with 10x SAS SSDs, megaraid sas with
1x SATA HDD (that's all I have), and null blk with lots of devices, and
I still can't see high usage of blk_mq_queue_tag_busy_iter().
So how about we get this patch processed (to fix
blk_mq_tagset_busy_iter()), as it is independent of
blk_mq_queue_tag_busy_iter()? And then wait for some update or some more
info from Kashyap regarding blk_mq_queue_tag_busy_iter()
Thanks,
John
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-18 8:08 ` John Garry
@ 2021-10-18 9:07 ` Ming Lei
2021-10-18 9:33 ` John Garry
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-10-18 9:07 UTC (permalink / raw)
To: John Garry; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
On Mon, Oct 18, 2021 at 09:08:57AM +0100, John Garry wrote:
> On 13/10/2021 16:13, John Garry wrote:
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 72a2724a4eee..2a2ad6dfcc33 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap,
> > > unsigned int bitnr, void *data)
> > > if (!rq)
> > > return true;
> > > - if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> > > - ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> > > + if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
> > > + blk_mq_is_shared_tags(hctx->flags)))
> > > + ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
> > > blk_mq_put_rq_ref(rq);
> > > return ret;
> > > }
> > > @@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct
> > > request_queue *q, busy_iter_fn *fn,
> > > if (tags->nr_reserved_tags)
> > > bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
> > > bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
> > > +
> > > + if (blk_mq_is_shared_tags(hctx->flags))
> > > + break;
> > > }
> > > blk_queue_exit(q);
> > > }
> > >
> >
> > I suppose that is ok, and means that we iter once.
> >
> > However, I have to ask, where is the big user of
> > blk_mq_queue_tag_busy_iter() coming from? I saw this from Kashyap's
> > mail:
> >
> > > 1.31% 1.31% kworker/57:1H-k [kernel.vmlinux]
> > > native_queued_spin_lock_slowpath
> > > ret_from_fork
> > > kthread
> > > worker_thread
> > > process_one_work
> > > blk_mq_timeout_work
> > > blk_mq_queue_tag_busy_iter
> > > bt_iter
> > > blk_mq_find_and_get_req
> > > _raw_spin_lock_irqsave
> > > native_queued_spin_lock_slowpath
> >
> > How or why blk_mq_timeout_work()?
>
> Just some update: I tried hisi_sas with 10x SAS SSDs, megaraid sas with 1x
> SATA HDD (that's all I have), and null blk with lots of devices, and I still
> can't see high usage of blk_mq_queue_tag_busy_iter().
It should be triggered easily in case of heavy io accounting:
while true; do cat /proc/diskstats; done
> So how about we get this patch processed (to fix blk_mq_tagset_busy_iter()),
> as it is independent of blk_mq_queue_tag_busy_iter()? And then wait for some
> update or some more info from Kashyap regarding blk_mq_queue_tag_busy_iter()
Looks fine:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
* Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
2021-10-18 9:07 ` Ming Lei
@ 2021-10-18 9:33 ` John Garry
0 siblings, 0 replies; 10+ messages in thread
From: John Garry @ 2021-10-18 9:33 UTC (permalink / raw)
To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, kashyap.desai, hare
On 18/10/2021 10:07, Ming Lei wrote:
> On Mon, Oct 18, 2021 at 09:08:57AM +0100, John Garry wrote:
>> On 13/10/2021 16:13, John Garry wrote:
>>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>>> index 72a2724a4eee..2a2ad6dfcc33 100644
>>>> --- a/block/blk-mq-tag.c
>>>> +++ b/block/blk-mq-tag.c
>>>> @@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap,
>>>> unsigned int bitnr, void *data)
>>>> if (!rq)
>>>> return true;
>>>> - if (rq->q == hctx->queue && rq->mq_hctx == hctx)
>>>> - ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
>>>> + if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
>>>> + blk_mq_is_shared_tags(hctx->flags)))
>>>> + ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
>>>> blk_mq_put_rq_ref(rq);
>>>> return ret;
>>>> }
>>>> @@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct
>>>> request_queue *q, busy_iter_fn *fn,
>>>> if (tags->nr_reserved_tags)
>>>> bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>>>> bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>>>> +
>>>> + if (blk_mq_is_shared_tags(hctx->flags))
>>>> + break;
>>>> }
>>>> blk_queue_exit(q);
>>>> }
>>>>
>>> I suppose that is ok, and means that we iter once.
>>>
>>> However, I have to ask, where is the big user of
>>> blk_mq_queue_tag_busy_iter() coming from? I saw this from Kashyap's
>>> mail:
>>>
>>> > 1.31% 1.31% kworker/57:1H-k [kernel.vmlinux]
>>> > native_queued_spin_lock_slowpath
>>> > ret_from_fork
>>> > kthread
>>> > worker_thread
>>> > process_one_work
>>> > blk_mq_timeout_work
>>> > blk_mq_queue_tag_busy_iter
>>> > bt_iter
>>> > blk_mq_find_and_get_req
>>> > _raw_spin_lock_irqsave
>>> > native_queued_spin_lock_slowpath
>>>
>>> How or why blk_mq_timeout_work()?
>> Just some update: I tried hisi_sas with 10x SAS SSDs, megaraid sas with 1x
>> SATA HDD (that's all I have), and null blk with lots of devices, and I still
>> can't see high usage of blk_mq_queue_tag_busy_iter().
> It should be triggered easily in case of heavy io accounting:
>
> while true; do cat /proc/diskstats; done
>
Let me check that.
>
>> So how about we get this patch processed (to fix blk_mq_tagset_busy_iter()),
>> as it is independent of blk_mq_queue_tag_busy_iter()? And then wait for some
>> update or some more info from Kashyap regarding blk_mq_queue_tag_busy_iter()
> Looks fine:
>
> Reviewed-by: Ming Lei<ming.lei@redhat.com>
Thanks, I'll just send a v2 with your tag for clarity, as there has been
much discussion here.
John