LKML Archive on lore.kernel.org
* [PATCH v5 0/6] handle unexpected message from server
@ 2021-09-09 14:12 Yu Kuai
  2021-09-09 14:12 ` [PATCH v5 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Yu Kuai @ 2021-09-09 14:12 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

This patch set tries to fix a client oops that can happen when the nbd
server sends an unexpected message to the client. For example, our
syzkaller reported a use-after-free (uaf) in nbd_read_stat():

Call trace:
 dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x144/0x1b4 lib/dump_stack.c:118
 print_address_description+0x68/0x2d0 mm/kasan/report.c:253
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x134/0x2f0 mm/kasan/report.c:409
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
 __read_once_size include/linux/compiler.h:193 [inline]
 blk_mq_rq_state block/blk-mq.h:106 [inline]
 blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
 nbd_read_stat drivers/block/nbd.c:670 [inline]
 recv_work+0x1bc/0x890 drivers/block/nbd.c:749
 process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2147
 worker_thread+0x80/0x9d0 kernel/workqueue.c:2302
 kthread+0x1d8/0x1e0 kernel/kthread.c:255
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174

1) At first, a normal io is submitted and completed with a scheduler:

internal_tag = blk_mq_get_tag -> get tag from sched_tags
 blk_mq_rq_ctx_init
  sched_tags->rq[internal_tag] = sched_tags->static_rq[internal_tag]
...
blk_mq_get_driver_tag
 __blk_mq_get_driver_tag -> get tag from tags
 tags->rq[tag] = sched_tags->static_rq[internal_tag]

So both tags->rq[tag] and sched_tags->rq[internal_tag] point to the
request sched_tags->static_rq[internal_tag], and they keep pointing to
it even after the io has finished.

2) The nbd server sends a reply with a random tag directly:

recv_work
 nbd_read_stat
  blk_mq_tag_to_rq(tags, tag)
   rq = tags->rq[tag]

3) If sched_tags->static_rq is freed:

blk_mq_sched_free_requests
 blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
  -> step 2) accesses rq before the rq mapping is cleared
  blk_mq_clear_rq_mapping(set, tags, hctx_idx);
  __free_pages() -> rq is freed here

4) Then, nbd continues to use the freed request in nbd_read_stat().
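
The lifetime problem described in steps 1) to 4) can be modelled in
userspace. What follows is a minimal single-threaded sketch, not kernel
code: toy_tags, toy_submit(), toy_tag_to_rq() and toy_free_requests() are
hypothetical names standing in for the tag table, the submission path,
blk_mq_tag_to_rq() and the free path. It only illustrates that the tag
mapping outlives completion, and that clearing the mapping before freeing
makes a later lookup return NULL. It does not reproduce the race itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_TAGS 4

struct toy_request {
	unsigned int tag;
};

struct toy_tags {
	struct toy_request *rq[NR_TAGS];  /* tag -> request lookup table */
	struct toy_request *static_rq;    /* backing storage, can be freed */
};

/* Step 1: map a tag to its slot in the backing array, as a normal io does. */
static int toy_submit(struct toy_tags *t, unsigned int tag)
{
	if (tag >= NR_TAGS)
		return -1;
	if (!t->static_rq) {
		t->static_rq = calloc(NR_TAGS, sizeof(*t->static_rq));
		if (!t->static_rq)
			return -1;
	}
	t->static_rq[tag].tag = tag;
	t->rq[tag] = &t->static_rq[tag];  /* stays set after the io completes */
	return 0;
}

/* Step 2: look up a request by a tag chosen by the peer. */
static struct toy_request *toy_tag_to_rq(struct toy_tags *t, unsigned int tag)
{
	return tag < NR_TAGS ? t->rq[tag] : NULL;
}

/* Step 3, in the safe order: clear every mapping, then free the storage. */
static void toy_free_requests(struct toy_tags *t)
{
	for (int i = 0; i < NR_TAGS; i++)
		t->rq[i] = NULL;
	free(t->static_rq);
	t->static_rq = NULL;
}
```

In this model a lookup that runs between the start of toy_free_requests()
and the end of its loop is the window that step 3) describes.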

Changes in v5:
 - move patches 1 & 2 of v4 (now patches 4 & 5 in v5) later in the
 series
 - add some comments in patch 5
Changes in v4:
 - change the name of the patchset, since uaf is not the only problem
 if the server sends an unexpected reply message.
 - instead of adding a new interface, use blk_mq_find_and_get_req().
 - add patch 5 to this series
Changes in v3:
 - v2 can't fix the problem thoroughly, so add patches 3-4 to this series.
 - modify descriptions.
 - patch 5 is just a cleanup
Changes in v2:
 - as Bart suggested, add a new helper function for drivers to get a
 request by tag.
Yu Kuai (6):
  nbd: don't handle response without a corresponding request message
  nbd: make sure request completion won't concurrent
  nbd: check sock index in nbd_read_stat()
  blk-mq: export two symbols to get request by tag
  nbd: convert to use blk_mq_find_and_get_req()
  nbd: don't start request if nbd_queue_rq() failed

 block/blk-mq-tag.c     |  5 +++--
 block/blk-mq.c         |  1 +
 drivers/block/nbd.c    | 51 ++++++++++++++++++++++++++++++++++++------
 include/linux/blk-mq.h |  3 +++
 4 files changed, 51 insertions(+), 9 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5 1/6] nbd: don't handle response without a corresponding request message
  2021-09-09 14:12 [PATCH v5 0/6] handle unexpected message from server Yu Kuai
@ 2021-09-09 14:12 ` Yu Kuai
  2021-09-14  0:54   ` Ming Lei
  2021-09-09 14:12 ` [PATCH v5 2/6] nbd: make sure request completion won't concurrent Yu Kuai
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2021-09-09 14:12 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

While handling a response message from the server, nbd_read_stat() gets
the request by tag and then completes it. However, this is problematic
if nbd hasn't sent a corresponding request message yet:

t1                      t2
                        submit_bio
                         nbd_queue_rq
                          blk_mq_start_request
recv_work
 nbd_read_stat
  blk_mq_tag_to_rq
 blk_mq_complete_request
                          nbd_send_cmd

Thus add a new cmd flag 'NBD_CMD_INFLIGHT'. It is set in
nbd_handle_cmd() once nbd_send_cmd() has succeeded, and checked in
nbd_read_stat().

Note that this patch can't fix the case where blk_mq_tag_to_rq()
returns a freed request; that will be fixed in the following
patches.
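
The flag handshake described above can be sketched in userspace. This is
only an illustration under stated assumptions: toy_cmd, toy_handle_cmd()
and toy_read_reply() are hypothetical names, a pthread mutex stands in for
cmd->lock, a bool stands in for NBD_CMD_INFLIGHT, and the send result is
passed in as a parameter instead of coming from a real socket.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_cmd {
	pthread_mutex_t lock;  /* stands in for cmd->lock */
	bool inflight;         /* stands in for NBD_CMD_INFLIGHT */
};

/* Submission side: mark the command inflight only if the send succeeded. */
static int toy_handle_cmd(struct toy_cmd *cmd, int send_ret)
{
	pthread_mutex_lock(&cmd->lock);
	if (send_ret == 0)
		cmd->inflight = true;
	pthread_mutex_unlock(&cmd->lock);
	return send_ret;
}

/* Reply side: accept a reply only for a sent command, and consume the flag. */
static int toy_read_reply(struct toy_cmd *cmd)
{
	int ret = 0;

	pthread_mutex_lock(&cmd->lock);
	if (!cmd->inflight)
		ret = -ENOENT;   /* reply without a corresponding request */
	else
		cmd->inflight = false;
	pthread_mutex_unlock(&cmd->lock);
	return ret;
}
```

A reply that arrives before the send, after a failed send, or a second
time for the same command is rejected with -ENOENT in this model.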

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..04861b585b62 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -126,6 +126,12 @@ struct nbd_device {
 };
 
 #define NBD_CMD_REQUEUED	1
+/*
+ * This flag will be set if nbd_queue_rq() succeeds, and will be checked and
+ * cleared in completion. Both setting and clearing of the flag are protected
+ * by cmd->lock.
+ */
+#define NBD_CMD_INFLIGHT	2
 
 struct nbd_cmd {
 	struct nbd_device *nbd;
@@ -400,6 +406,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 	if (!mutex_trylock(&cmd->lock))
 		return BLK_EH_RESET_TIMER;
 
+	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		cmd->status = BLK_STS_TIMEOUT;
 		mutex_unlock(&cmd->lock);
@@ -729,6 +736,12 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	cmd = blk_mq_rq_to_pdu(req);
 
 	mutex_lock(&cmd->lock);
+	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+		dev_err(disk_to_dev(nbd->disk), "Suspicious reply %d (status %u flags %lu)\n",
+			tag, cmd->status, cmd->flags);
+		ret = -ENOENT;
+		goto out;
+	}
 	if (cmd->cmd_cookie != nbd_handle_to_cookie(handle)) {
 		dev_err(disk_to_dev(nbd->disk), "Double reply on req %p, cmd_cookie %u, handle cookie %u\n",
 			req, cmd->cmd_cookie, nbd_handle_to_cookie(handle));
@@ -829,6 +842,7 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
 
 	mutex_lock(&cmd->lock);
 	cmd->status = BLK_STS_IOERR;
+	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
 	mutex_unlock(&cmd->lock);
 
 	blk_mq_complete_request(req);
@@ -964,7 +978,13 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	 * returns EAGAIN can be retried on a different socket.
 	 */
 	ret = nbd_send_cmd(nbd, cmd, index);
-	if (ret == -EAGAIN) {
+	/*
+	 * Access to this flag is protected by cmd->lock, thus it's safe to set
+	 * the flag after nbd_send_cmd() succeeds in sending the request to server.
+	 */
+	if (!ret)
+		__set_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+	else if (ret == -EAGAIN) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Request send failed, requeueing\n");
 		nbd_mark_nsock_dead(nbd, nsock, 1);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5 2/6] nbd: make sure request completion won't concurrent
  2021-09-09 14:12 [PATCH v5 0/6] handle unexpected message from server Yu Kuai
  2021-09-09 14:12 ` [PATCH v5 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
@ 2021-09-09 14:12 ` Yu Kuai
  2021-09-14  0:57   ` Ming Lei
  2021-09-09 14:12 ` [PATCH v5 3/6] nbd: check sock index in nbd_read_stat() Yu Kuai
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2021-09-09 14:12 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

commit cddce0116058 ("nbd: Aovid double completion of a request")
tried to fix the case where nbd_clear_que() and recv_work() can complete
a request concurrently. However, the problem still exists:

t1                    t2                     t3

nbd_disconnect_and_put
 flush_workqueue
                      recv_work
                       blk_mq_complete_request
                        blk_mq_complete_request_remote -> this is true
                         WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)
                          blk_mq_raise_softirq
                                             blk_done_softirq
                                              blk_complete_reqs
                                               nbd_complete_rq
                                                blk_mq_end_request
                                                 blk_mq_free_request
                                                  WRITE_ONCE(rq->state, MQ_RQ_IDLE)
  nbd_clear_que
   blk_mq_tagset_busy_iter
    nbd_clear_req
                                                   __blk_mq_free_request
                                                    blk_mq_put_tag
     blk_mq_complete_request -> complete again

There are three places where a request can be completed in nbd:
recv_work(), nbd_clear_que() and nbd_xmit_timeout(). Since they
all hold cmd->lock before completing the request, it's easy to
avoid the problem by setting and checking a cmd flag.
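
A rough userspace model of "only one path may complete" follows. The
names toy_cmd and toy_try_complete() are hypothetical, the mutex stands in
for cmd->lock, and the completions counter stands in for the call to
blk_mq_complete_request(). It is a sketch of the test-and-clear idea, not
the driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_cmd {
	pthread_mutex_t lock;  /* stands in for cmd->lock */
	bool inflight;         /* stands in for NBD_CMD_INFLIGHT */
	int completions;       /* counts simulated completions */
};

/*
 * Called by each of the three completion paths. Test-and-clear under the
 * lock picks exactly one owner; only the owner goes on to complete.
 */
static bool toy_try_complete(struct toy_cmd *cmd)
{
	bool owner;

	pthread_mutex_lock(&cmd->lock);
	owner = cmd->inflight;
	cmd->inflight = false;
	pthread_mutex_unlock(&cmd->lock);

	if (owner)
		cmd->completions++;  /* the real driver would complete the request here */
	return owner;
}
```

Whichever of the receive, clear and timeout paths calls the helper first
wins; the other two see the flag already cleared and back off.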

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 04861b585b62..550c8dc438ac 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -406,7 +406,11 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 	if (!mutex_trylock(&cmd->lock))
 		return BLK_EH_RESET_TIMER;
 
-	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+		mutex_unlock(&cmd->lock);
+		return BLK_EH_DONE;
+	}
+
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		cmd->status = BLK_STS_TIMEOUT;
 		mutex_unlock(&cmd->lock);
@@ -842,7 +846,10 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
 
 	mutex_lock(&cmd->lock);
 	cmd->status = BLK_STS_IOERR;
-	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+		mutex_unlock(&cmd->lock);
+		return true;
+	}
 	mutex_unlock(&cmd->lock);
 
 	blk_mq_complete_request(req);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5 3/6] nbd: check sock index in nbd_read_stat()
  2021-09-09 14:12 [PATCH v5 0/6] handle unexpected message from server Yu Kuai
  2021-09-09 14:12 ` [PATCH v5 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
  2021-09-09 14:12 ` [PATCH v5 2/6] nbd: make sure request completion won't concurrent Yu Kuai
@ 2021-09-09 14:12 ` Yu Kuai
  2021-09-09 14:12 ` [PATCH v5 4/6] blk-mq: export two symbols to get request by tag Yu Kuai
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2021-09-09 14:12 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

The sock on which the client sends a request in nbd_send_cmd() and the
sock on which it receives the reply in nbd_read_stat() should be the same.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 550c8dc438ac..6d8cbf8be231 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -746,6 +746,10 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		ret = -ENOENT;
 		goto out;
 	}
+	if (cmd->index != index) {
+		dev_err(disk_to_dev(nbd->disk), "Unexpected reply %d from different sock %d (expected %d)\n",
+			tag, index, cmd->index);
+	}
 	if (cmd->cmd_cookie != nbd_handle_to_cookie(handle)) {
 		dev_err(disk_to_dev(nbd->disk), "Double reply on req %p, cmd_cookie %u, handle cookie %u\n",
 			req, cmd->cmd_cookie, nbd_handle_to_cookie(handle));
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5 4/6] blk-mq: export two symbols to get request by tag
  2021-09-09 14:12 [PATCH v5 0/6] handle unexpected message from server Yu Kuai
                   ` (2 preceding siblings ...)
  2021-09-09 14:12 ` [PATCH v5 3/6] nbd: check sock index in nbd_read_stat() Yu Kuai
@ 2021-09-09 14:12 ` Yu Kuai
  2021-09-09 14:12 ` [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req() Yu Kuai
  2021-09-09 14:12 ` [PATCH v5 6/6] nbd: don't start request if nbd_queue_rq() failed Yu Kuai
  5 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2021-09-09 14:12 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

nbd has a defect: blk_mq_tag_to_rq() might return a freed request in
nbd_read_stat(). Fixing this inside the nbd driver would need a new
mechanism, which is rather complicated.

Thus export blk_mq_find_and_get_req() and blk_mq_put_rq_ref(), so that
nbd can use the former to replace blk_mq_tag_to_rq(). It makes sure the
returned request is not freed, and then we can do more checking while
'cmd->lock' is held.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-mq-tag.c     | 5 +++--
 block/blk-mq.c         | 1 +
 include/linux/blk-mq.h | 3 +++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 86f87346232a..b4f66b75b4d1 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -200,8 +200,8 @@ struct bt_iter_data {
 	bool reserved;
 };
 
-static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
-		unsigned int bitnr)
+struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
+					unsigned int bitnr)
 {
 	struct request *rq;
 	unsigned long flags;
@@ -213,6 +213,7 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
 	spin_unlock_irqrestore(&tags->lock, flags);
 	return rq;
 }
+EXPORT_SYMBOL(blk_mq_find_and_get_req);
 
 static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 08626cb0534c..5113aa3788a2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -916,6 +916,7 @@ void blk_mq_put_rq_ref(struct request *rq)
 	else if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
 }
+EXPORT_SYMBOL(blk_mq_put_rq_ref);
 
 static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 13ba1861e688..03e02990609d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -637,4 +637,7 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio);
 void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx,
 		struct lock_class_key *key);
 
+void blk_mq_put_rq_ref(struct request *rq);
+struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
+					unsigned int bitnr);
 #endif
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-09 14:12 [PATCH v5 0/6] handle unexpected message from server Yu Kuai
                   ` (3 preceding siblings ...)
  2021-09-09 14:12 ` [PATCH v5 4/6] blk-mq: export two symbols to get request by tag Yu Kuai
@ 2021-09-09 14:12 ` Yu Kuai
  2021-09-14  1:11   ` Ming Lei
  2021-09-09 14:12 ` [PATCH v5 6/6] nbd: don't start request if nbd_queue_rq() failed Yu Kuai
  5 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2021-09-09 14:12 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

blk_mq_tag_to_rq() is only guaranteed to return a valid request in the
following situation:

1) the client sends a request message to the server first
submit_bio
...
 blk_mq_get_tag
 ...
 blk_mq_get_driver_tag
 ...
 nbd_queue_rq
  nbd_handle_cmd
   nbd_send_cmd

2) the client receives the response message from the server
recv_work
 nbd_read_stat
  blk_mq_tag_to_rq

If step 1) is missing, blk_mq_tag_to_rq() will return a stale
request, which might be freed. Thus convert to use
blk_mq_find_and_get_req() to make sure the returned request is not
freed.
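
The difference between a plain lookup and a "find and get" lookup can be
sketched with C11 atomics. Everything here is hypothetical and simplified:
toy_request, toy_tags, toy_find_and_get_req() and toy_put_rq_ref() only
mirror the shape of the block layer helpers, a reference count of zero is
assumed to mean "not in use", and the tags->lock that the real helper
takes around the lookup is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS 4

struct toy_request {
	atomic_int ref;  /* 0 is assumed to mean "not in use" */
};

struct toy_tags {
	struct toy_request *rq[NR_TAGS];
};

/* Take a reference only while the count is still non-zero. */
static bool toy_ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Return the request for 'tag' with a reference held, or NULL. */
static struct toy_request *toy_find_and_get_req(struct toy_tags *t,
						unsigned int tag)
{
	struct toy_request *rq;

	if (tag >= NR_TAGS)
		return NULL;
	rq = t->rq[tag];
	if (rq && !toy_ref_inc_not_zero(&rq->ref))
		rq = NULL;
	return rq;
}

/* Drop a reference; returns true if it was the last one. */
static bool toy_put_rq_ref(struct toy_request *rq)
{
	return atomic_fetch_sub(&rq->ref, 1) == 1;
}
```

A caller pairs every successful toy_find_and_get_req() with one
toy_put_rq_ref(), which is the same pairing the patch adds at the
put_req label.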

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 6d8cbf8be231..d298e2b9e6ee 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -729,12 +729,13 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	tag = nbd_handle_to_tag(handle);
 	hwq = blk_mq_unique_tag_to_hwq(tag);
 	if (hwq < nbd->tag_set.nr_hw_queues)
-		req = blk_mq_tag_to_rq(nbd->tag_set.tags[hwq],
-				       blk_mq_unique_tag_to_tag(tag));
+		req = blk_mq_find_and_get_req(nbd->tag_set.tags[hwq],
+					      blk_mq_unique_tag_to_tag(tag));
 	if (!req || !blk_mq_request_started(req)) {
 		dev_err(disk_to_dev(nbd->disk), "Unexpected reply (%d) %p\n",
 			tag, req);
-		return ERR_PTR(-ENOENT);
+		ret = -ENOENT;
+		goto put_req;
 	}
 	trace_nbd_header_received(req, handle);
 	cmd = blk_mq_rq_to_pdu(req);
@@ -806,6 +807,14 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 out:
 	trace_nbd_payload_received(req, handle);
 	mutex_unlock(&cmd->lock);
+put_req:
+	/*
+	 * It's safe to drop refcnt here because request completion won't be
+	 * concurrent, thus if nbd_read_stat() succeeded, the request refcnt
+	 * won't drop to zero here.
+	 */
+	if (req)
+		blk_mq_put_rq_ref(req);
 	return ret ? ERR_PTR(ret) : cmd;
 }
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5 6/6] nbd: don't start request if nbd_queue_rq() failed
  2021-09-09 14:12 [PATCH v5 0/6] handle unexpected message from server Yu Kuai
                   ` (4 preceding siblings ...)
  2021-09-09 14:12 ` [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req() Yu Kuai
@ 2021-09-09 14:12 ` Yu Kuai
  5 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2021-09-09 14:12 UTC (permalink / raw)
  To: axboe, josef, ming.lei, hch
  Cc: linux-block, linux-kernel, nbd, yukuai3, yi.zhang

Currently, blk_mq_end_request() will be called if nbd_queue_rq()
fails, thus starting the request in such a situation is useless.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/nbd.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index d298e2b9e6ee..7a963c4ec0d1 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -943,7 +943,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Socks array is empty\n");
-		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	config = nbd->config;
@@ -952,7 +951,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
 		nbd_config_put(nbd);
-		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	cmd->status = BLK_STS_OK;
@@ -976,7 +974,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 			 */
 			sock_shutdown(nbd);
 			nbd_config_put(nbd);
-			blk_mq_start_request(req);
 			return -EIO;
 		}
 		goto again;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 1/6] nbd: don't handle response without a corresponding request message
  2021-09-09 14:12 ` [PATCH v5 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
@ 2021-09-14  0:54   ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2021-09-14  0:54 UTC (permalink / raw)
  To: Yu Kuai; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Thu, Sep 09, 2021 at 10:12:51PM +0800, Yu Kuai wrote:
> While handling a response message from server, nbd_read_stat() will
> try to get request by tag, and then complete the request. However,
> this is problematic if nbd haven't sent a corresponding request
> message:
> 
> t1                      t2
>                         submit_bio
>                          nbd_queue_rq
>                           blk_mq_start_request
> recv_work
>  nbd_read_stat
>   blk_mq_tag_to_rq
>  blk_mq_complete_request
>                           nbd_send_cmd
> 
> Thus add a new cmd flag 'NBD_CMD_INFLIGHT', it will be set in
> nbd_send_cmd() and checked in nbd_read_stat().
> 
> Noted that this patch can't fix that blk_mq_tag_to_rq() might
> return a freed request, and this will be fixed in following
> patches.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 2/6] nbd: make sure request completion won't concurrent
  2021-09-09 14:12 ` [PATCH v5 2/6] nbd: make sure request completion won't concurrent Yu Kuai
@ 2021-09-14  0:57   ` Ming Lei
  2021-09-14  3:11     ` yukuai (C)
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2021-09-14  0:57 UTC (permalink / raw)
  To: Yu Kuai; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Thu, Sep 09, 2021 at 10:12:52PM +0800, Yu Kuai wrote:
> commit cddce0116058 ("nbd: Aovid double completion of a request")
> try to fix that nbd_clear_que() and recv_work() can complete a
> request concurrently. However, the problem still exists:
> 
> t1                    t2                     t3
> 
> nbd_disconnect_and_put
>  flush_workqueue
>                       recv_work
>                        blk_mq_complete_request
>                         blk_mq_complete_request_remote -> this is true
>                          WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)
>                           blk_mq_raise_softirq
>                                              blk_done_softirq
>                                               blk_complete_reqs
>                                                nbd_complete_rq
>                                                 blk_mq_end_request
>                                                  blk_mq_free_request
>                                                   WRITE_ONCE(rq->state, MQ_RQ_IDLE)
>   nbd_clear_que
>    blk_mq_tagset_busy_iter
>     nbd_clear_req
>                                                    __blk_mq_free_request
>                                                     blk_mq_put_tag
>      blk_mq_complete_request -> complete again
> 
> There are three places where request can be completed in nbd:
> recv_work(), nbd_clear_que() and nbd_xmit_timeout(). Since they
> all hold cmd->lock before completing the request, it's easy to
> avoid the problem by setting and checking a cmd flag.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/block/nbd.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 04861b585b62..550c8dc438ac 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -406,7 +406,11 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
>  	if (!mutex_trylock(&cmd->lock))
>  		return BLK_EH_RESET_TIMER;
>  
> -	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
> +	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
> +		mutex_unlock(&cmd->lock);
> +		return BLK_EH_DONE;
> +	}
> +
>  	if (!refcount_inc_not_zero(&nbd->config_refs)) {
>  		cmd->status = BLK_STS_TIMEOUT;
>  		mutex_unlock(&cmd->lock);
> @@ -842,7 +846,10 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
>  
>  	mutex_lock(&cmd->lock);
>  	cmd->status = BLK_STS_IOERR;
> -	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
> +	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
> +		mutex_unlock(&cmd->lock);
> +		return true;
> +	}
>  	mutex_unlock(&cmd->lock);

If this request has already been completed from another code path,
->status shouldn't be updated here, since it may have completed
successfully.

-- 
Ming


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-09 14:12 ` [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req() Yu Kuai
@ 2021-09-14  1:11   ` Ming Lei
  2021-09-14  3:11     ` yukuai (C)
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2021-09-14  1:11 UTC (permalink / raw)
  To: Yu Kuai; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> blk_mq_tag_to_rq() can only ensure to return valid request in
> following situation:
> 
> 1) client send request message to server first
> submit_bio
> ...
>  blk_mq_get_tag
>  ...
>  blk_mq_get_driver_tag
>  ...
>  nbd_queue_rq
>   nbd_handle_cmd
>    nbd_send_cmd
> 
> 2) client receive respond message from server
> recv_work
>  nbd_read_stat
>   blk_mq_tag_to_rq
> 
> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> request, which might be freed. Thus convert to use
> blk_mq_find_and_get_req() to make sure the returned request is not
> freed.

But NBD_CMD_INFLIGHT has been added to check whether the reply is
expected, so do we still need blk_mq_find_and_get_req() to cover
this issue? BTW, the request and its payload are pre-allocated, so
there isn't a real use-after-free.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  1:11   ` Ming Lei
@ 2021-09-14  3:11     ` yukuai (C)
  2021-09-14  6:44       ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: yukuai (C) @ 2021-09-14  3:11 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/14 9:11, Ming Lei wrote:
> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>> blk_mq_tag_to_rq() can only ensure to return valid request in
>> following situation:
>>
>> 1) client send request message to server first
>> submit_bio
>> ...
>>   blk_mq_get_tag
>>   ...
>>   blk_mq_get_driver_tag
>>   ...
>>   nbd_queue_rq
>>    nbd_handle_cmd
>>     nbd_send_cmd
>>
>> 2) client receive respond message from server
>> recv_work
>>   nbd_read_stat
>>    blk_mq_tag_to_rq
>>
>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>> request, which might be freed. Thus convert to use
>> blk_mq_find_and_get_req() to make sure the returned request is not
>> freed.
> 
> But NBD_CMD_INFLIGHT has been added for checking if the reply is
> expected, do we still need blk_mq_find_and_get_req() for covering
> this issue? BTW, request and its payload is pre-allocated, so there
> isn't real use-after-free.

Hi, Ming

Checking NBD_CMD_INFLIGHT relies on the request found by tag being valid,
not the other way round.

nbd_read_stat
  req = blk_mq_tag_to_rq()
  cmd = blk_mq_rq_to_pdu(req)
  mutex_lock(cmd->lock)
  checking NBD_CMD_INFLIGHT

The checking doesn't have any effect on blk_mq_tag_to_rq().

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 2/6] nbd: make sure request completion won't concurrent
  2021-09-14  0:57   ` Ming Lei
@ 2021-09-14  3:11     ` yukuai (C)
  0 siblings, 0 replies; 24+ messages in thread
From: yukuai (C) @ 2021-09-14  3:11 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/14 8:57, Ming Lei wrote:
> On Thu, Sep 09, 2021 at 10:12:52PM +0800, Yu Kuai wrote:
>> commit cddce0116058 ("nbd: Aovid double completion of a request")
>> try to fix that nbd_clear_que() and recv_work() can complete a
>> request concurrently. However, the problem still exists:
>>
>> t1                    t2                     t3
>>
>> nbd_disconnect_and_put
>>   flush_workqueue
>>                        recv_work
>>                         blk_mq_complete_request
>>                          blk_mq_complete_request_remote -> this is true
>>                           WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)
>>                            blk_mq_raise_softirq
>>                                               blk_done_softirq
>>                                                blk_complete_reqs
>>                                                 nbd_complete_rq
>>                                                  blk_mq_end_request
>>                                                   blk_mq_free_request
>>                                                    WRITE_ONCE(rq->state, MQ_RQ_IDLE)
>>    nbd_clear_que
>>     blk_mq_tagset_busy_iter
>>      nbd_clear_req
>>                                                     __blk_mq_free_request
>>                                                      blk_mq_put_tag
>>       blk_mq_complete_request -> complete again
>>
>> There are three places where request can be completed in nbd:
>> recv_work(), nbd_clear_que() and nbd_xmit_timeout(). Since they
>> all hold cmd->lock before completing the request, it's easy to
>> avoid the problem by setting and checking a cmd flag.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/block/nbd.c | 11 +++++++++--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index 04861b585b62..550c8dc438ac 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -406,7 +406,11 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
>>   	if (!mutex_trylock(&cmd->lock))
>>   		return BLK_EH_RESET_TIMER;
>>   
>> -	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
>> +	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>> +		mutex_unlock(&cmd->lock);
>> +		return BLK_EH_DONE;
>> +	}
>> +
>>   	if (!refcount_inc_not_zero(&nbd->config_refs)) {
>>   		cmd->status = BLK_STS_TIMEOUT;
>>   		mutex_unlock(&cmd->lock);
>> @@ -842,7 +846,10 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
>>   
>>   	mutex_lock(&cmd->lock);
>>   	cmd->status = BLK_STS_IOERR;
>> -	__clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
>> +	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>> +		mutex_unlock(&cmd->lock);
>> +		return true;
>> +	}
>>   	mutex_unlock(&cmd->lock);
> 
> If this request has completed from other code paths, ->status shouldn't be
> updated here, maybe it is done successfully.

Hi, Ming

Will change this in next iteration.

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  3:11     ` yukuai (C)
@ 2021-09-14  6:44       ` Ming Lei
  2021-09-14  7:13         ` yukuai (C)
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2021-09-14  6:44 UTC (permalink / raw)
  To: yukuai (C); +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
> On 2021/09/14 9:11, Ming Lei wrote:
> > On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> > > blk_mq_tag_to_rq() can only ensure to return valid request in
> > > following situation:
> > > 
> > > 1) client send request message to server first
> > > submit_bio
> > > ...
> > >   blk_mq_get_tag
> > >   ...
> > >   blk_mq_get_driver_tag
> > >   ...
> > >   nbd_queue_rq
> > >    nbd_handle_cmd
> > >     nbd_send_cmd
> > > 
> > > 2) client receive respond message from server
> > > recv_work
> > >   nbd_read_stat
> > >    blk_mq_tag_to_rq
> > > 
> > > If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> > > request, which might be freed. Thus convert to use
> > > blk_mq_find_and_get_req() to make sure the returned request is not
> > > freed.
> > 
> > But NBD_CMD_INFLIGHT has been added for checking if the reply is
> > expected, do we still need blk_mq_find_and_get_req() for covering
> > this issue? BTW, request and its payload is pre-allocated, so there
> > isn't real use-after-free.
> 
> Hi, Ming
> 
> Checking NBD_CMD_INFLIGHT relies on the request found by tag being valid,
> not the other way round.
> 
> nbd_read_stat
>  req = blk_mq_tag_to_rq()
>  cmd = blk_mq_rq_to_pdu(req)
>  mutex_lock(cmd->lock)
>  checking NBD_CMD_INFLIGHT

The request and its payload are pre-allocated, and either req->ref or cmd->lock can
serve the same purpose here. Once cmd->lock is held, you can check whether the cmd is
inflight. If it isn't inflight, just return -ENOENT. Is there any
problem with handling it this way?


Thanks,
Ming
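
A rough userspace sketch of the check described above, under the assumption
Ming states: commands live in a pre-allocated pool, so a tag always maps to
memory that exists. Whether that assumption holds across an elevator switch
is exactly what the rest of the thread disputes. All sketch_* names are
hypothetical, and a pthread mutex stands in for cmd->lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_TAGS 4

struct sketch_cmd {
	pthread_mutex_t lock; /* stands in for cmd->lock */
	bool inflight;        /* stands in for NBD_CMD_INFLIGHT */
};

/* Pre-allocated pool: never freed in this sketch (an assumption). */
static struct sketch_cmd sketch_pool[SKETCH_NR_TAGS];

static void sketch_pool_init(void)
{
	for (size_t i = 0; i < SKETCH_NR_TAGS; i++) {
		pthread_mutex_init(&sketch_pool[i].lock, NULL);
		sketch_pool[i].inflight = false;
	}
}

/* Models sending a request to the server: mark the command inflight. */
static void sketch_send(unsigned int tag)
{
	assert(tag < SKETCH_NR_TAGS);
	pthread_mutex_lock(&sketch_pool[tag].lock);
	sketch_pool[tag].inflight = true;
	pthread_mutex_unlock(&sketch_pool[tag].lock);
}

/* Models handling a reply: 0 if it was expected, -ENOENT otherwise. */
static int sketch_handle_reply(unsigned int tag)
{
	struct sketch_cmd *cmd;
	int ret = 0;

	if (tag >= SKETCH_NR_TAGS)
		return -ENOENT;

	cmd = &sketch_pool[tag];
	pthread_mutex_lock(&cmd->lock);
	if (!cmd->inflight)
		ret = -ENOENT;         /* reply without a matching request */
	else
		cmd->inflight = false; /* consume the expected reply */
	pthread_mutex_unlock(&cmd->lock);
	return ret;
}
```

A reply for a tag that was never sent, or a second reply for the same tag, is
rejected with -ENOENT in this model.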


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  6:44       ` Ming Lei
@ 2021-09-14  7:13         ` yukuai (C)
  2021-09-14  7:46           ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: yukuai (C) @ 2021-09-14  7:13 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/14 14:44, Ming Lei wrote:
> On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
>> On 2021/09/14 9:11, Ming Lei wrote:
>>> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>>>> blk_mq_tag_to_rq() can only ensure to return valid request in
>>>> following situation:
>>>>
>>>> 1) client send request message to server first
>>>> submit_bio
>>>> ...
>>>>    blk_mq_get_tag
>>>>    ...
>>>>    blk_mq_get_driver_tag
>>>>    ...
>>>>    nbd_queue_rq
>>>>     nbd_handle_cmd
>>>>      nbd_send_cmd
>>>>
>>>> 2) client receive respond message from server
>>>> recv_work
>>>>    nbd_read_stat
>>>>     blk_mq_tag_to_rq
>>>>
>>>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>>>> request, which might be freed. Thus convert to use
>>>> blk_mq_find_and_get_req() to make sure the returned request is not
>>>> freed.
>>>
>>> But NBD_CMD_INFLIGHT has been added for checking if the reply is
>>> expected, do we still need blk_mq_find_and_get_req() for covering
>>> this issue? BTW, request and its payload is pre-allocated, so there
>>> isn't real use-after-free.
>>
>> Hi, Ming
>>
>> Checking NBD_CMD_INFLIGHT relies on the request found by tag being valid,
>> not the other way round.
>>
>> nbd_read_stat
>>   req = blk_mq_tag_to_rq()
>>   cmd = blk_mq_rq_to_pdu(req)
>>   mutex_lock(cmd->lock)
>>   checking NBD_CMD_INFLIGHT
> 
> Request and its payload is pre-allocated, and either req->ref or cmd->lock can
> serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
> inflight or not. If it isn't inflight, just return -ENOENT. Is there any
> problem to handle in this way?

Hi, Ming

in nbd_read_stat:

1) get a request by tag first
2) get nbd_cmd by the request
3) hold cmd->lock and check if cmd is inflight

If we want to check whether the cmd is inflight in step 3), we have to do
steps 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
can't guarantee that the returned request has not been freed:

nbd_read_stat
			blk_mq_sched_free_requests
			 blk_mq_free_rqs
   blk_mq_tag_to_rq
   -> get rq before clear mapping
			  blk_mq_clear_rq_mapping
			  __free_pages -> rq is freed
   blk_mq_request_started -> UAF


Thanks,
Kuai





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  7:13         ` yukuai (C)
@ 2021-09-14  7:46           ` Ming Lei
  2021-09-14  9:08             ` yukuai (C)
  2021-09-14  9:19             ` yukuai (C)
  0 siblings, 2 replies; 24+ messages in thread
From: Ming Lei @ 2021-09-14  7:46 UTC (permalink / raw)
  To: yukuai (C); +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
> On 2021/09/14 14:44, Ming Lei wrote:
> > On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
> > > On 2021/09/14 9:11, Ming Lei wrote:
> > > > On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> > > > > blk_mq_tag_to_rq() can only ensure to return valid request in
> > > > > following situation:
> > > > > 
> > > > > 1) client send request message to server first
> > > > > submit_bio
> > > > > ...
> > > > >    blk_mq_get_tag
> > > > >    ...
> > > > >    blk_mq_get_driver_tag
> > > > >    ...
> > > > >    nbd_queue_rq
> > > > >     nbd_handle_cmd
> > > > >      nbd_send_cmd
> > > > > 
> > > > > 2) client receive respond message from server
> > > > > recv_work
> > > > >    nbd_read_stat
> > > > >     blk_mq_tag_to_rq
> > > > > 
> > > > > If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> > > > > request, which might be freed. Thus convert to use
> > > > > blk_mq_find_and_get_req() to make sure the returned request is not
> > > > > freed.
> > > > 
> > > > But NBD_CMD_INFLIGHT has been added for checking if the reply is
> > > > expected, do we still need blk_mq_find_and_get_req() for covering
> > > > this issue? BTW, request and its payload is pre-allocated, so there
> > > > isn't real use-after-free.
> > > 
> > > Hi, Ming
> > > 
> > > Checking NBD_CMD_INFLIGHT relies on the request found by tag being valid,
> > > not the other way round.
> > > 
> > > nbd_read_stat
> > >   req = blk_mq_tag_to_rq()
> > >   cmd = blk_mq_rq_to_pdu(req)
> > >   mutex_lock(cmd->lock)
> > >   checking NBD_CMD_INFLIGHT
> > 
> > Request and its payload is pre-allocated, and either req->ref or cmd->lock can
> > serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
> > inflight or not. If it isn't inflight, just return -ENOENT. Is there any
> > problem to handle in this way?
> 
> Hi, Ming
> 
> in nbd_read_stat:
> 
> 1) get a request by tag first
> 2) get nbd_cmd by the request
> 3) hold cmd->lock and check if cmd is inflight
> 
> If we want to check if the cmd is inflight in step 3), we have to do
> steps 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
> can't make sure the returned request is not freed:
> 
> nbd_read_stat
> 			blk_mq_sched_free_requests
> 			 blk_mq_free_rqs
>   blk_mq_tag_to_rq
>   -> get rq before clear mapping
> 			  blk_mq_clear_rq_mapping
> 			  __free_pages -> rq is freed
>   blk_mq_request_started -> UAF

If the above can happen, blk_mq_find_and_get_req() may not fix it either. I am just
wondering why not take the following simpler approach to avoid the UAF?

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..dfa5cce71f66 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
 						     work);
 	struct nbd_device *nbd = args->nbd;
 	struct nbd_config *config = nbd->config;
+	struct request_queue *q = nbd->disk->queue;
 	struct nbd_cmd *cmd;
 	struct request *rq;
 
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return;
+
 	while (1) {
 		cmd = nbd_read_stat(nbd, args->index);
 		if (IS_ERR(cmd)) {
@@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
 		if (likely(!blk_should_fake_timeout(rq->q)))
 			blk_mq_complete_request(rq);
 	}
+	blk_queue_exit(q);
 	nbd_config_put(nbd);
 	atomic_dec(&config->recv_threads);
 	wake_up(&config->recv_wq);

Thanks,
Ming


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  7:46           ` Ming Lei
@ 2021-09-14  9:08             ` yukuai (C)
  2021-09-14  9:12               ` yukuai (C)
  2021-09-14 14:33               ` Ming Lei
  2021-09-14  9:19             ` yukuai (C)
  1 sibling, 2 replies; 24+ messages in thread
From: yukuai (C) @ 2021-09-14  9:08 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/14 15:46, Ming Lei wrote:
> On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
>> On 2021/09/14 14:44, Ming Lei wrote:
>>> On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
>>>> On 2021/09/14 9:11, Ming Lei wrote:
>>>>> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>>>>>> blk_mq_tag_to_rq() can only ensure to return valid request in
>>>>>> following situation:
>>>>>>
>>>>>> 1) client send request message to server first
>>>>>> submit_bio
>>>>>> ...
>>>>>>     blk_mq_get_tag
>>>>>>     ...
>>>>>>     blk_mq_get_driver_tag
>>>>>>     ...
>>>>>>     nbd_queue_rq
>>>>>>      nbd_handle_cmd
>>>>>>       nbd_send_cmd
>>>>>>
>>>>>> 2) client receive respond message from server
>>>>>> recv_work
>>>>>>     nbd_read_stat
>>>>>>      blk_mq_tag_to_rq
>>>>>>
>>>>>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>>>>>> request, which might be freed. Thus convert to use
>>>>>> blk_mq_find_and_get_req() to make sure the returned request is not
>>>>>> freed.
>>>>>
>>>>> But NBD_CMD_INFLIGHT has been added for checking if the reply is
>>>>> expected, do we still need blk_mq_find_and_get_req() for covering
>>>>> this issue? BTW, request and its payload is pre-allocated, so there
>>>>> isn't real use-after-free.
>>>>
>>>> Hi, Ming
>>>>
>>>> Checking NBD_CMD_INFLIGHT relies on the request found by tag being valid,
>>>> not the other way round.
>>>>
>>>> nbd_read_stat
>>>>    req = blk_mq_tag_to_rq()
>>>>    cmd = blk_mq_rq_to_pdu(req)
>>>>    mutex_lock(cmd->lock)
>>>>    checking NBD_CMD_INFLIGHT
>>>
>>> Request and its payload is pre-allocated, and either req->ref or cmd->lock can
>>> serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
>>> inflight or not. If it isn't inflight, just return -ENOENT. Is there any
>>> problem to handle in this way?
>>
>> Hi, Ming
>>
>> in nbd_read_stat:
>>
>> 1) get a request by tag first
>> 2) get nbd_cmd by the request
>> 3) hold cmd->lock and check if cmd is inflight
>>
>> If we want to check if the cmd is inflight in step 3), we have to do
>> steps 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
>> can't make sure the returned request is not freed:
>>
>> nbd_read_stat
>> 			blk_mq_sched_free_requests
>> 			 blk_mq_free_rqs
>>    blk_mq_tag_to_rq
>>    -> get rq before clear mapping
>> 			  blk_mq_clear_rq_mapping
>> 			  __free_pages -> rq is freed
>>    blk_mq_request_started -> UAF
> 
> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just

Hi, Ming

Why can't blk_mq_find_and_get_req() fix it? I can't currently think of any
scenario where it would still have a problem.

> wondering why not take the following simpler way for avoiding the UAF?
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5170a630778d..dfa5cce71f66 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>   						     work);
>   	struct nbd_device *nbd = args->nbd;
>   	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
>   	struct nbd_cmd *cmd;
>   	struct request *rq;
>   
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +                return;
> +

We can't make sure freeze_queue is called before this, so this approach
can't fix the problem, right?
  nbd_read_stat
     blk_mq_tag_to_rq
			elevator_switch
			 blk_mq_freeze_queue(q);
			 elevator_switch_mq
			  elevator_exit
			   blk_mq_sched_free_requests
     blk_mq_request_started -> UAF

Thanks,
Kuai

>   	while (1) {
>   		cmd = nbd_read_stat(nbd, args->index);
>   		if (IS_ERR(cmd)) {
> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>   		if (likely(!blk_should_fake_timeout(rq->q)))
>   			blk_mq_complete_request(rq);
>   	}
> +	blk_queue_exit(q);
>   	nbd_config_put(nbd);
>   	atomic_dec(&config->recv_threads);
>   	wake_up(&config->recv_wq);
> 
> Thanks,
> Ming
> 
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  9:08             ` yukuai (C)
@ 2021-09-14  9:12               ` yukuai (C)
  2021-09-14 14:33               ` Ming Lei
  1 sibling, 0 replies; 24+ messages in thread
From: yukuai (C) @ 2021-09-14  9:12 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/14 17:08, yukuai (C) wrote:
> On 2021/09/14 15:46, Ming Lei wrote:
>> On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
>>> On 2021/09/14 14:44, Ming Lei wrote:
>>>> On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
>>>>> On 2021/09/14 9:11, Ming Lei wrote:
>>>>>> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>>>>>>> blk_mq_tag_to_rq() can only ensure to return valid request in
>>>>>>> following situation:
>>>>>>>
>>>>>>> 1) client send request message to server first
>>>>>>> submit_bio
>>>>>>> ...
>>>>>>>     blk_mq_get_tag
>>>>>>>     ...
>>>>>>>     blk_mq_get_driver_tag
>>>>>>>     ...
>>>>>>>     nbd_queue_rq
>>>>>>>      nbd_handle_cmd
>>>>>>>       nbd_send_cmd
>>>>>>>
>>>>>>> 2) client receive respond message from server
>>>>>>> recv_work
>>>>>>>     nbd_read_stat
>>>>>>>      blk_mq_tag_to_rq
>>>>>>>
>>>>>>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>>>>>>> request, which might be freed. Thus convert to use
>>>>>>> blk_mq_find_and_get_req() to make sure the returned request is not
>>>>>>> freed.
>>>>>>
>>>>>> But NBD_CMD_INFLIGHT has been added for checking if the reply is
>>>>>> expected, do we still need blk_mq_find_and_get_req() for covering
>>>>>> this issue? BTW, request and its payload is pre-allocated, so there
>>>>>> isn't real use-after-free.
>>>>>
>>>>> Hi, Ming
>>>>>
>>>>> Checking NBD_CMD_INFLIGHT relies on the request found by tag being
>>>>> valid,
>>>>> not the other way round.
>>>>>
>>>>> nbd_read_stat
>>>>>    req = blk_mq_tag_to_rq()
>>>>>    cmd = blk_mq_rq_to_pdu(req)
>>>>>    mutex_lock(cmd->lock)
>>>>>    checking NBD_CMD_INFLIGHT
>>>>
>>>> Request and its payload is pre-allocated, and either req->ref or 
>>>> cmd->lock can
>>>> serve the same purpose here. Once cmd->lock is held, you can check 
>>>> if the cmd is
>>>> inflight or not. If it isn't inflight, just return -ENOENT. Is there 
>>>> any
>>>> problem to handle in this way?
>>>
>>> Hi, Ming
>>>
>>> in nbd_read_stat:
>>>
>>> 1) get a request by tag first
>>> 2) get nbd_cmd by the request
>>> 3) hold cmd->lock and check if cmd is inflight
>>>
>>> If we want to check if the cmd is inflight in step 3), we have to do
>>> steps 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
>>> can't make sure the returned request is not freed:
>>>
>>> nbd_read_stat
>>>             blk_mq_sched_free_requests
>>>              blk_mq_free_rqs
>>>    blk_mq_tag_to_rq
>>>    -> get rq before clear mapping
>>>               blk_mq_clear_rq_mapping
>>>               __free_pages -> rq is freed
>>>    blk_mq_request_started -> UAF
>>
>> If the above can happen, blk_mq_find_and_get_req() may not fix it too, 
>> just
> 
> Hi, Ming
> 
> Why can't blk_mq_find_and_get_req() fix it? I can't think of any
> scenario that might have problem currently.
> 
>> wondering why not take the following simpler way for avoiding the UAF?
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index 5170a630778d..dfa5cce71f66 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>>                                work);
>>       struct nbd_device *nbd = args->nbd;
>>       struct nbd_config *config = nbd->config;
>> +    struct request_queue *q = nbd->disk->queue;
>>       struct nbd_cmd *cmd;
>>       struct request *rq;
>> +    if (!percpu_ref_tryget(&q->q_usage_counter))
>> +                return;
>> +
> 
> We can't make sure freeze_queue is called before this, thus this approach
> can't fix the problem, right?
>   nbd_read_stat
>      blk_mq_tag_to_rq
>              elevator_switch
>               blk_mq_freeze_queue(q);
>               elevator_switch_mq
>                elevator_exit
>                 blk_mq_sched_free_requests
>      blk_mq_request_started -> UAF

Hi, Ming

I forgot that if percpu_ref_tryget() succeeds here, blk_mq_freeze_queue()
will block until blk_queue_exit() in nbd_read_stat().

Thanks,
Kuai
> 
> Thanks,
> Kuai
> 
>>       while (1) {
>>           cmd = nbd_read_stat(nbd, args->index);
>>           if (IS_ERR(cmd)) {
>> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>>           if (likely(!blk_should_fake_timeout(rq->q)))
>>               blk_mq_complete_request(rq);
>>       }
>> +    blk_queue_exit(q);
>>       nbd_config_put(nbd);
>>       atomic_dec(&config->recv_threads);
>>       wake_up(&config->recv_wq);
>>
>> Thanks,
>> Ming
>>
>> .
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  7:46           ` Ming Lei
  2021-09-14  9:08             ` yukuai (C)
@ 2021-09-14  9:19             ` yukuai (C)
  2021-09-14 14:37               ` Ming Lei
  1 sibling, 1 reply; 24+ messages in thread
From: yukuai (C) @ 2021-09-14  9:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/14 15:46, Ming Lei wrote:

> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> wondering why not take the following simpler way for avoiding the UAF?
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5170a630778d..dfa5cce71f66 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>   						     work);
>   	struct nbd_device *nbd = args->nbd;
>   	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
>   	struct nbd_cmd *cmd;
>   	struct request *rq;
>   
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +                return;
> +
>   	while (1) {
>   		cmd = nbd_read_stat(nbd, args->index);
>   		if (IS_ERR(cmd)) {
> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>   		if (likely(!blk_should_fake_timeout(rq->q)))
>   			blk_mq_complete_request(rq);
>   	}
> +	blk_queue_exit(q);
>   	nbd_config_put(nbd);
>   	atomic_dec(&config->recv_threads);
>   	wake_up(&config->recv_wq);
> 

Hi, Ming

This approach is wrong.

If blk_mq_freeze_queue() is called and nbd is waiting for all
requests to complete, percpu_ref_tryget() will fail here, and a deadlock
will occur because the requests can't be completed in recv_work().

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  9:08             ` yukuai (C)
  2021-09-14  9:12               ` yukuai (C)
@ 2021-09-14 14:33               ` Ming Lei
  1 sibling, 0 replies; 24+ messages in thread
From: Ming Lei @ 2021-09-14 14:33 UTC (permalink / raw)
  To: yukuai (C); +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Tue, Sep 14, 2021 at 05:08:00PM +0800, yukuai (C) wrote:
> On 2021/09/14 15:46, Ming Lei wrote:
> > On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
> > > On 2021/09/14 14:44, Ming Lei wrote:
> > > > On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
> > > > > On 2021/09/14 9:11, Ming Lei wrote:
> > > > > > On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> > > > > > > blk_mq_tag_to_rq() can only ensure to return valid request in
> > > > > > > following situation:
> > > > > > > 
> > > > > > > 1) client send request message to server first
> > > > > > > submit_bio
> > > > > > > ...
> > > > > > >     blk_mq_get_tag
> > > > > > >     ...
> > > > > > >     blk_mq_get_driver_tag
> > > > > > >     ...
> > > > > > >     nbd_queue_rq
> > > > > > >      nbd_handle_cmd
> > > > > > >       nbd_send_cmd
> > > > > > > 
> > > > > > > 2) client receive respond message from server
> > > > > > > recv_work
> > > > > > >     nbd_read_stat
> > > > > > >      blk_mq_tag_to_rq
> > > > > > > 
> > > > > > > If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> > > > > > > request, which might be freed. Thus convert to use
> > > > > > > blk_mq_find_and_get_req() to make sure the returned request is not
> > > > > > > freed.
> > > > > > 
> > > > > > But NBD_CMD_INFLIGHT has been added for checking if the reply is
> > > > > > expected, do we still need blk_mq_find_and_get_req() for covering
> > > > > > this issue? BTW, request and its payload is pre-allocated, so there
> > > > > > isn't real use-after-free.
> > > > > 
> > > > > Hi, Ming
> > > > > 
> > > > > Checking NBD_CMD_INFLIGHT relies on the request found by tag being valid,
> > > > > not the other way round.
> > > > > 
> > > > > nbd_read_stat
> > > > >    req = blk_mq_tag_to_rq()
> > > > >    cmd = blk_mq_rq_to_pdu(req)
> > > > >    mutex_lock(cmd->lock)
> > > > >    checking NBD_CMD_INFLIGHT
> > > > 
> > > > Request and its payload is pre-allocated, and either req->ref or cmd->lock can
> > > > serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
> > > > inflight or not. If it isn't inflight, just return -ENOENT. Is there any
> > > > problem to handle in this way?
> > > 
> > > Hi, Ming
> > > 
> > > in nbd_read_stat:
> > > 
> > > 1) get a request by tag first
> > > 2) get nbd_cmd by the request
> > > 3) hold cmd->lock and check if cmd is inflight
> > > 
> > > If we want to check if the cmd is inflight in step 3), we have to do
> > > steps 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
> > > can't make sure the returned request is not freed:
> > > 
> > > nbd_read_stat
> > > 			blk_mq_sched_free_requests
> > > 			 blk_mq_free_rqs
> > >    blk_mq_tag_to_rq
> > >    -> get rq before clear mapping
> > > 			  blk_mq_clear_rq_mapping
> > > 			  __free_pages -> rq is freed
> > >    blk_mq_request_started -> UAF
> > 
> > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> 
> Hi, Ming
> 
> Why can't blk_mq_find_and_get_req() fix it? I can't think of any
> scenario that might have problem currently.

The principle behind blk_mq_find_and_get_req() is that if a request's
ref is grabbed, the queue's usage counter is guaranteed to be held as well,
and this approach isn't straightforward.

Yeah, it can fix the issue, but I don't think it is good to call it in
the fast path because tags->lock is required.

> 
> > wondering why not take the following simpler way for avoiding the UAF?
> > 
> > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > index 5170a630778d..dfa5cce71f66 100644
> > --- a/drivers/block/nbd.c
> > +++ b/drivers/block/nbd.c
> > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> >   						     work);
> >   	struct nbd_device *nbd = args->nbd;
> >   	struct nbd_config *config = nbd->config;
> > +	struct request_queue *q = nbd->disk->queue;
> >   	struct nbd_cmd *cmd;
> >   	struct request *rq;
> > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > +                return;
> > +
> 
> We can't make sure freeze_queue is called before this, thus this approach
> can't fix the problem, right?
>  nbd_read_stat
>     blk_mq_tag_to_rq
> 			elevator_switch
> 			 blk_mq_freeze_queue(q);
> 			 elevator_switch_mq
> 			  elevator_exit
> 			   blk_mq_sched_free_requests
>     blk_mq_request_started -> UAF

No, blk_mq_freeze_queue() waits until ->q_usage_counter becomes zero, so
there won't be any concurrent nbd_read_stat() while the elevator is being
switched if ->q_usage_counter is grabbed in recv_work().

Thanks,
Ming


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14  9:19             ` yukuai (C)
@ 2021-09-14 14:37               ` Ming Lei
  2021-09-15  1:54                 ` yukuai (C)
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2021-09-14 14:37 UTC (permalink / raw)
  To: yukuai (C); +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
> On 2021/09/14 15:46, Ming Lei wrote:
> 
> > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> > wondering why not take the following simpler way for avoiding the UAF?
> > 
> > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > index 5170a630778d..dfa5cce71f66 100644
> > --- a/drivers/block/nbd.c
> > +++ b/drivers/block/nbd.c
> > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> >   						     work);
> >   	struct nbd_device *nbd = args->nbd;
> >   	struct nbd_config *config = nbd->config;
> > +	struct request_queue *q = nbd->disk->queue;
> >   	struct nbd_cmd *cmd;
> >   	struct request *rq;
> > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > +                return;
> > +
> >   	while (1) {
> >   		cmd = nbd_read_stat(nbd, args->index);
> >   		if (IS_ERR(cmd)) {
> > @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
> >   		if (likely(!blk_should_fake_timeout(rq->q)))
> >   			blk_mq_complete_request(rq);
> >   	}
> > +	blk_queue_exit(q);
> >   	nbd_config_put(nbd);
> >   	atomic_dec(&config->recv_threads);
> >   	wake_up(&config->recv_wq);
> > 
> 
> Hi, Ming
> 
> This approach is wrong.
> 
> If blk_mq_freeze_queue() is called, and nbd is waiting for all
> request to complete. percpu_ref_tryget() will fail here, and deadlock
> will occur because request can't complete in recv_work().

No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, at which
point it is perfectly fine to do nothing in recv_work().

Thanks,
Ming
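
The counter behaviour Ming describes can be sketched in userspace as follows.
This is a simplified single-counter model, assuming an initial reference that
the freezer drops; the kernel's percpu_ref is per-CPU and considerably more
involved, and every sketch_* name here is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of q_usage_counter: one atomic counter, not per-CPU. */
struct sketch_ref {
	atomic_long count; /* starts at 1: the initial reference */
};

static void sketch_ref_init(struct sketch_ref *ref)
{
	atomic_init(&ref->count, 1);
}

/* Take a reference unless the count has already dropped to zero. */
static bool sketch_ref_tryget(struct sketch_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool sketch_ref_put(struct sketch_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Start a freeze by dropping the initial reference. Returns true if the
 * counter is already zero; otherwise the freezer would wait for the
 * remaining holders to call sketch_ref_put().
 */
static bool sketch_freeze_start(struct sketch_ref *ref)
{
	return sketch_ref_put(ref);
}
```

In this model a tryget issued after the freeze has started still succeeds as
long as another holder keeps the count above zero, and fails only once the
count has reached zero.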


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-14 14:37               ` Ming Lei
@ 2021-09-15  1:54                 ` yukuai (C)
  2021-09-15  3:16                   ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: yukuai (C) @ 2021-09-15  1:54 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/14 22:37, Ming Lei wrote:
> On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
>> On 2021/09/14 15:46, Ming Lei wrote:
>>
>>> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
>>> wondering why not take the following simpler way for avoiding the UAF?
>>>
>>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>>> index 5170a630778d..dfa5cce71f66 100644
>>> --- a/drivers/block/nbd.c
>>> +++ b/drivers/block/nbd.c
>>> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>>>    						     work);
>>>    	struct nbd_device *nbd = args->nbd;
>>>    	struct nbd_config *config = nbd->config;
>>> +	struct request_queue *q = nbd->disk->queue;
>>>    	struct nbd_cmd *cmd;
>>>    	struct request *rq;
>>> +	if (!percpu_ref_tryget(&q->q_usage_counter))
>>> +                return;
>>> +
>>>    	while (1) {
>>>    		cmd = nbd_read_stat(nbd, args->index);
>>>    		if (IS_ERR(cmd)) {
>>> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>>>    		if (likely(!blk_should_fake_timeout(rq->q)))
>>>    			blk_mq_complete_request(rq);
>>>    	}
>>> +	blk_queue_exit(q);
>>>    	nbd_config_put(nbd);
>>>    	atomic_dec(&config->recv_threads);
>>>    	wake_up(&config->recv_wq);
>>>
>>
>> Hi, Ming
>>
>> This approach is wrong.
>>
>> If blk_mq_freeze_queue() is called, and nbd is waiting for all
>> request to complete. percpu_ref_tryget() will fail here, and deadlock
>> will occur because request can't complete in recv_work().
> 
> No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
> it is perfectly fine to do nothing in recv_work().
> 

Hi, Ming

This approach is a good idea; however, we should not get q_usage_counter
in recv_work(), because it will block the queue freeze.

How about getting q_usage_counter in nbd_read_stat() and putting it in the
error path or after request completion?

Thanks
Kuai
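
The proposal above, holding the reference per reply rather than for the whole
receive loop, might look roughly like the following userspace sketch. It is
single-threaded and uses a plain counter; sketch_queue, sketch_sock and the
convention that a negative tag models a reply whose handling fails are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_queue {
	long usage; /* models q_usage_counter; 0 means the queue is frozen */
};

static bool sketch_tryget(struct sketch_queue *q)
{
	if (q->usage == 0)
		return false;
	q->usage++;
	return true;
}

static void sketch_put(struct sketch_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* Stand-in for the socket: a fixed list of reply tags. */
struct sketch_sock {
	const int *tags;
	size_t len;
	size_t pos;
};

/* Models the blocking read; false once the "socket" is drained. */
static bool sketch_read_reply(struct sketch_sock *s, int *tag)
{
	if (s->pos == s->len)
		return false;
	*tag = s->tags[s->pos++];
	return true;
}

/* Returns the number of replies handled successfully. */
static size_t sketch_recv_loop(struct sketch_queue *q, struct sketch_sock *s)
{
	size_t handled = 0;
	int tag;

	for (;;) {
		/* No reference is held while waiting for the next reply. */
		if (!sketch_read_reply(s, &tag))
			break;
		if (!sketch_tryget(q))
			break;
		if (tag < 0) {
			sketch_put(q); /* error path: put before bailing out */
			break;
		}
		handled++;         /* request completion would happen here */
		sketch_put(q);     /* put after completion */
	}
	return handled;
}
```

Because the reference is dropped before each blocking read, a freeze in this
model never waits on an idle receive loop.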

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-15  1:54                 ` yukuai (C)
@ 2021-09-15  3:16                   ` Ming Lei
  2021-09-15  3:36                     ` yukuai (C)
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2021-09-15  3:16 UTC (permalink / raw)
  To: yukuai (C); +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Wed, Sep 15, 2021 at 09:54:09AM +0800, yukuai (C) wrote:
> On 2021/09/14 22:37, Ming Lei wrote:
> > On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
> > > On 2021/09/14 15:46, Ming Lei wrote:
> > > 
> > > > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> > > > wondering why not take the following simpler way for avoiding the UAF?
> > > > 
> > > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > > index 5170a630778d..dfa5cce71f66 100644
> > > > --- a/drivers/block/nbd.c
> > > > +++ b/drivers/block/nbd.c
> > > > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> > > >    						     work);
> > > >    	struct nbd_device *nbd = args->nbd;
> > > >    	struct nbd_config *config = nbd->config;
> > > > +	struct request_queue *q = nbd->disk->queue;
> > > >    	struct nbd_cmd *cmd;
> > > >    	struct request *rq;
> > > > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > > > +                return;
> > > > +
> > > >    	while (1) {
> > > >    		cmd = nbd_read_stat(nbd, args->index);
> > > >    		if (IS_ERR(cmd)) {
> > > > @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
> > > >    		if (likely(!blk_should_fake_timeout(rq->q)))
> > > >    			blk_mq_complete_request(rq);
> > > >    	}
> > > > +	blk_queue_exit(q);
> > > >    	nbd_config_put(nbd);
> > > >    	atomic_dec(&config->recv_threads);
> > > >    	wake_up(&config->recv_wq);
> > > > 
> > > 
> > > Hi, Ming
> > > 
> > > This approach is wrong.
> > > 
> > > If blk_mq_freeze_queue() is called, and nbd is waiting for all
> > > request to complete. percpu_ref_tryget() will fail here, and deadlock
> > > will occur because request can't complete in recv_work().
> > 
> > No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
> > it is perfectly fine to do nothing in recv_work().
> > 
> 
> Hi Ming
> 
> This approach is a good idea, however we should not get q_usage_counter
> in recv_work(), because it will block freeze queue.
> 
> How about get q_usage_counter in nbd_read_stat(), and put in error path
> or after request completion?

OK, it looks like I missed that nbd_read_stat() needs to wait for the incoming
reply first, so how about the following change, which splits nbd_read_stat()
into nbd_read_reply() and nbd_handle_reply()?

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..477fe057fc93 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -683,38 +683,47 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	return 0;
 }
 
-/* NULL returned = something went wrong, inform userspace */
-static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
+static int nbd_read_reply(struct nbd_device *nbd, int index,
+		struct nbd_reply *reply)
 {
-	struct nbd_config *config = nbd->config;
 	int result;
-	struct nbd_reply reply;
-	struct nbd_cmd *cmd;
-	struct request *req = NULL;
-	u64 handle;
-	u16 hwq;
-	u32 tag;
-	struct kvec iov = {.iov_base = &reply, .iov_len = sizeof(reply)};
+	struct kvec iov = {.iov_base = reply, .iov_len = sizeof(*reply)};
 	struct iov_iter to;
-	int ret = 0;
 
-	reply.magic = 0;
+	reply->magic = 0;
-	iov_iter_kvec(&to, READ, &iov, 1, sizeof(reply));
+	iov_iter_kvec(&to, READ, &iov, 1, sizeof(*reply));
 	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
-	if (result <= 0) {
-		if (!nbd_disconnected(config))
+	if (result < 0) {
+		if (!nbd_disconnected(nbd->config))
 			dev_err(disk_to_dev(nbd->disk),
 				"Receive control failed (result %d)\n", result);
-		return ERR_PTR(result);
+		return result;
 	}
 
-	if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
+	if (ntohl(reply->magic) != NBD_REPLY_MAGIC) {
 		dev_err(disk_to_dev(nbd->disk), "Wrong magic (0x%lx)\n",
-				(unsigned long)ntohl(reply.magic));
-		return ERR_PTR(-EPROTO);
+				(unsigned long)ntohl(reply->magic));
+		return -EPROTO;
 	}
 
-	memcpy(&handle, reply.handle, sizeof(handle));
+	return 0;
+}
+
+/* NULL returned = something went wrong, inform userspace */
+static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
+		struct nbd_reply *reply)
+{
+	struct nbd_config *config = nbd->config;
+	int result;
+	struct nbd_cmd *cmd;
+	struct request *req = NULL;
+	u64 handle;
+	u16 hwq;
+	u32 tag;
+	struct iov_iter to;
+	int ret = 0;
+
+	memcpy(&handle, reply->handle, sizeof(handle));
 	tag = nbd_handle_to_tag(handle);
 	hwq = blk_mq_unique_tag_to_hwq(tag);
 	if (hwq < nbd->tag_set.nr_hw_queues)
@@ -747,9 +756,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		ret = -ENOENT;
 		goto out;
 	}
-	if (ntohl(reply.error)) {
+	if (ntohl(reply->error)) {
 		dev_err(disk_to_dev(nbd->disk), "Other side returned error (%d)\n",
-			ntohl(reply.error));
+			ntohl(reply->error));
 		cmd->status = BLK_STS_IOERR;
 		goto out;
 	}
@@ -795,24 +804,36 @@ static void recv_work(struct work_struct *work)
 						     work);
 	struct nbd_device *nbd = args->nbd;
 	struct nbd_config *config = nbd->config;
+	struct request_queue *q = nbd->disk->queue;
+	struct nbd_sock *nsock;
 	struct nbd_cmd *cmd;
 	struct request *rq;
 
 	while (1) {
-		cmd = nbd_read_stat(nbd, args->index);
-		if (IS_ERR(cmd)) {
-			struct nbd_sock *nsock = config->socks[args->index];
+		struct nbd_reply reply;
 
-			mutex_lock(&nsock->tx_lock);
-			nbd_mark_nsock_dead(nbd, nsock, 1);
-			mutex_unlock(&nsock->tx_lock);
+		if (nbd_read_reply(nbd, args->index, &reply))
 			break;
-		}
 
+		if (!percpu_ref_tryget(&q->q_usage_counter))
+			break;
+
+		cmd = nbd_handle_reply(nbd, args->index, &reply);
+		if (IS_ERR(cmd)) {
+			blk_queue_exit(q);
+			break;
+		}
 		rq = blk_mq_rq_from_pdu(cmd);
 		if (likely(!blk_should_fake_timeout(rq->q)))
 			blk_mq_complete_request(rq);
+		blk_queue_exit(q);
 	}
+
+	nsock = config->socks[args->index];
+	mutex_lock(&nsock->tx_lock);
+	nbd_mark_nsock_dead(nbd, nsock, 1);
+	mutex_unlock(&nsock->tx_lock);
+
 	nbd_config_put(nbd);
 	atomic_dec(&config->recv_threads);
 	wake_up(&config->recv_wq);


Thanks,
Ming



* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-15  3:16                   ` Ming Lei
@ 2021-09-15  3:36                     ` yukuai (C)
  2021-09-15  3:46                       ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: yukuai (C) @ 2021-09-15  3:36 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On 2021/09/15 11:16, Ming Lei wrote:
> On Wed, Sep 15, 2021 at 09:54:09AM +0800, yukuai (C) wrote:
>> On 2021/09/14 22:37, Ming Lei wrote:
>>> On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
>>>> On 2021/09/14 15:46, Ming Lei wrote:
>>>>
>>>>> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
>>>>> wondering why not take the following simpler way for avoiding the UAF?
>>>>>
>>>>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>>>>> index 5170a630778d..dfa5cce71f66 100644
>>>>> --- a/drivers/block/nbd.c
>>>>> +++ b/drivers/block/nbd.c
>>>>> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>>>>>     						     work);
>>>>>     	struct nbd_device *nbd = args->nbd;
>>>>>     	struct nbd_config *config = nbd->config;
>>>>> +	struct request_queue *q = nbd->disk->queue;
>>>>>     	struct nbd_cmd *cmd;
>>>>>     	struct request *rq;
>>>>> +	if (!percpu_ref_tryget(&q->q_usage_counter))
>>>>> +		return;
>>>>> +
>>>>>     	while (1) {
>>>>>     		cmd = nbd_read_stat(nbd, args->index);
>>>>>     		if (IS_ERR(cmd)) {
>>>>> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>>>>>     		if (likely(!blk_should_fake_timeout(rq->q)))
>>>>>     			blk_mq_complete_request(rq);
>>>>>     	}
>>>>> +	blk_queue_exit(q);
>>>>>     	nbd_config_put(nbd);
>>>>>     	atomic_dec(&config->recv_threads);
>>>>>     	wake_up(&config->recv_wq);
>>>>>
>>>>
>>>> Hi, Ming
>>>>
>>>> This approach is wrong.
>>>>
>>>> If blk_mq_freeze_queue() is called and nbd is waiting for all
>>>> requests to complete, percpu_ref_tryget() will fail here, and a deadlock
>>>> will occur because requests can't complete in recv_work().
>>>
>>> No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
>>> it is perfectly fine to do nothing in recv_work().
>>>
>>
>> Hi Ming
>>
>> This approach is a good idea; however, we should not get q_usage_counter
>> in recv_work(), because it will block freezing the queue.
>>
>> How about getting q_usage_counter in nbd_read_stat(), and putting it in the
>> error path or after request completion?
> 
> OK, it looks like I missed that nbd_read_stat() needs to wait for the incoming
> reply first, so how about the following change, which splits nbd_read_stat()
> into nbd_read_reply() and nbd_handle_reply()?

Hi, Ming

The change looks good to me.

Do you want to send a patch to fix this?

Thanks,
Kuai
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5170a630778d..477fe057fc93 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -683,38 +683,47 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
>   	return 0;
>   }
>   
> -/* NULL returned = something went wrong, inform userspace */
> -static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
> +static int nbd_read_reply(struct nbd_device *nbd, int index,
> +		struct nbd_reply *reply)
>   {
> -	struct nbd_config *config = nbd->config;
>   	int result;
> -	struct nbd_reply reply;
> -	struct nbd_cmd *cmd;
> -	struct request *req = NULL;
> -	u64 handle;
> -	u16 hwq;
> -	u32 tag;
> -	struct kvec iov = {.iov_base = &reply, .iov_len = sizeof(reply)};
> +	struct kvec iov = {.iov_base = reply, .iov_len = sizeof(*reply)};
>   	struct iov_iter to;
> -	int ret = 0;
>   
> -	reply.magic = 0;
> +	reply->magic = 0;
> -	iov_iter_kvec(&to, READ, &iov, 1, sizeof(reply));
> +	iov_iter_kvec(&to, READ, &iov, 1, sizeof(*reply));
>   	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
> -	if (result <= 0) {
> -		if (!nbd_disconnected(config))
> +	if (result < 0) {
> +		if (!nbd_disconnected(nbd->config))
>   			dev_err(disk_to_dev(nbd->disk),
>   				"Receive control failed (result %d)\n", result);
> -		return ERR_PTR(result);
> +		return result;
>   	}
>   
> -	if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
> +	if (ntohl(reply->magic) != NBD_REPLY_MAGIC) {
>   		dev_err(disk_to_dev(nbd->disk), "Wrong magic (0x%lx)\n",
> -				(unsigned long)ntohl(reply.magic));
> -		return ERR_PTR(-EPROTO);
> +				(unsigned long)ntohl(reply->magic));
> +		return -EPROTO;
>   	}
>   
> -	memcpy(&handle, reply.handle, sizeof(handle));
> +	return 0;
> +}
> +
> +/* NULL returned = something went wrong, inform userspace */
> +static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
> +		struct nbd_reply *reply)
> +{
> +	struct nbd_config *config = nbd->config;
> +	int result;
> +	struct nbd_cmd *cmd;
> +	struct request *req = NULL;
> +	u64 handle;
> +	u16 hwq;
> +	u32 tag;
> +	struct iov_iter to;
> +	int ret = 0;
> +
> +	memcpy(&handle, reply->handle, sizeof(handle));
>   	tag = nbd_handle_to_tag(handle);
>   	hwq = blk_mq_unique_tag_to_hwq(tag);
>   	if (hwq < nbd->tag_set.nr_hw_queues)
> @@ -747,9 +756,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
>   		ret = -ENOENT;
>   		goto out;
>   	}
> -	if (ntohl(reply.error)) {
> +	if (ntohl(reply->error)) {
>   		dev_err(disk_to_dev(nbd->disk), "Other side returned error (%d)\n",
> -			ntohl(reply.error));
> +			ntohl(reply->error));
>   		cmd->status = BLK_STS_IOERR;
>   		goto out;
>   	}
> @@ -795,24 +804,36 @@ static void recv_work(struct work_struct *work)
>   						     work);
>   	struct nbd_device *nbd = args->nbd;
>   	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
> +	struct nbd_sock *nsock;
>   	struct nbd_cmd *cmd;
>   	struct request *rq;
>   
>   	while (1) {
> -		cmd = nbd_read_stat(nbd, args->index);
> -		if (IS_ERR(cmd)) {
> -			struct nbd_sock *nsock = config->socks[args->index];
> +		struct nbd_reply reply;
>   
> -			mutex_lock(&nsock->tx_lock);
> -			nbd_mark_nsock_dead(nbd, nsock, 1);
> -			mutex_unlock(&nsock->tx_lock);
> +		if (nbd_read_reply(nbd, args->index, &reply))
>   			break;
> -		}
>   
> +		if (!percpu_ref_tryget(&q->q_usage_counter))
> +			break;
> +
> +		cmd = nbd_handle_reply(nbd, args->index, &reply);
> +		if (IS_ERR(cmd)) {
> +			blk_queue_exit(q);
> +			break;
> +		}
>   		rq = blk_mq_rq_from_pdu(cmd);
>   		if (likely(!blk_should_fake_timeout(rq->q)))
>   			blk_mq_complete_request(rq);
> +		blk_queue_exit(q);
>   	}
> +
> +	nsock = config->socks[args->index];
> +	mutex_lock(&nsock->tx_lock);
> +	nbd_mark_nsock_dead(nbd, nsock, 1);
> +	mutex_unlock(&nsock->tx_lock);
> +
>   	nbd_config_put(nbd);
>   	atomic_dec(&config->recv_threads);
>   	wake_up(&config->recv_wq);
> 
> 
> Thanks,
> Ming
> 


* Re: [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req()
  2021-09-15  3:36                     ` yukuai (C)
@ 2021-09-15  3:46                       ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2021-09-15  3:46 UTC (permalink / raw)
  To: yukuai (C); +Cc: axboe, josef, hch, linux-block, linux-kernel, nbd, yi.zhang

On Wed, Sep 15, 2021 at 11:36:47AM +0800, yukuai (C) wrote:
> On 2021/09/15 11:16, Ming Lei wrote:
> > On Wed, Sep 15, 2021 at 09:54:09AM +0800, yukuai (C) wrote:
> > > On 2021/09/14 22:37, Ming Lei wrote:
> > > > On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
> > > > > On 2021/09/14 15:46, Ming Lei wrote:
> > > > > 
> > > > > > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> > > > > > wondering why not take the following simpler way for avoiding the UAF?
> > > > > > 
> > > > > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > > > > index 5170a630778d..dfa5cce71f66 100644
> > > > > > --- a/drivers/block/nbd.c
> > > > > > +++ b/drivers/block/nbd.c
> > > > > > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> > > > > >     						     work);
> > > > > >     	struct nbd_device *nbd = args->nbd;
> > > > > >     	struct nbd_config *config = nbd->config;
> > > > > > +	struct request_queue *q = nbd->disk->queue;
> > > > > >     	struct nbd_cmd *cmd;
> > > > > >     	struct request *rq;
> > > > > > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > > > > > +		return;
> > > > > > +
> > > > > >     	while (1) {
> > > > > >     		cmd = nbd_read_stat(nbd, args->index);
> > > > > >     		if (IS_ERR(cmd)) {
> > > > > > @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
> > > > > >     		if (likely(!blk_should_fake_timeout(rq->q)))
> > > > > >     			blk_mq_complete_request(rq);
> > > > > >     	}
> > > > > > +	blk_queue_exit(q);
> > > > > >     	nbd_config_put(nbd);
> > > > > >     	atomic_dec(&config->recv_threads);
> > > > > >     	wake_up(&config->recv_wq);
> > > > > > 
> > > > > 
> > > > > Hi, Ming
> > > > > 
> > > > > This approach is wrong.
> > > > > 
> > > > > If blk_mq_freeze_queue() is called and nbd is waiting for all
> > > > > requests to complete, percpu_ref_tryget() will fail here, and a deadlock
> > > > > will occur because requests can't complete in recv_work().
> > > > 
> > > > No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
> > > > it is perfectly fine to do nothing in recv_work().
> > > > 
> > > 
> > > Hi Ming
> > > 
> > > This approach is a good idea; however, we should not get q_usage_counter
> > > in recv_work(), because it will block freezing the queue.
> > > 
> > > How about getting q_usage_counter in nbd_read_stat(), and putting it in the
> > > error path or after request completion?
> > 
> > OK, it looks like I missed that nbd_read_stat() needs to wait for the incoming
> > reply first, so how about the following change, which splits nbd_read_stat()
> > into nbd_read_reply() and nbd_handle_reply()?
> 
> Hi, Ming
> 
> The change looks good to me.
> 
> Do you want to send a patch to fix this?

I guess you may add an inflight check or a similar change in nbd_read_stat(),
so feel free to fold it into your series.


Thanks,
Ming



end of thread, other threads:[~2021-09-15  3:46 UTC | newest]

Thread overview: 24+ messages
2021-09-09 14:12 [PATCH v5 0/6] handle unexpected message from server Yu Kuai
2021-09-09 14:12 ` [PATCH v5 1/6] nbd: don't handle response without a corresponding request message Yu Kuai
2021-09-14  0:54   ` Ming Lei
2021-09-09 14:12 ` [PATCH v5 2/6] nbd: make sure request completion won't concurrent Yu Kuai
2021-09-14  0:57   ` Ming Lei
2021-09-14  3:11     ` yukuai (C)
2021-09-09 14:12 ` [PATCH v5 3/6] nbd: check sock index in nbd_read_stat() Yu Kuai
2021-09-09 14:12 ` [PATCH v5 4/6] blk-mq: export two symbols to get request by tag Yu Kuai
2021-09-09 14:12 ` [PATCH v5 5/6] nbd: convert to use blk_mq_find_and_get_req() Yu Kuai
2021-09-14  1:11   ` Ming Lei
2021-09-14  3:11     ` yukuai (C)
2021-09-14  6:44       ` Ming Lei
2021-09-14  7:13         ` yukuai (C)
2021-09-14  7:46           ` Ming Lei
2021-09-14  9:08             ` yukuai (C)
2021-09-14  9:12               ` yukuai (C)
2021-09-14 14:33               ` Ming Lei
2021-09-14  9:19             ` yukuai (C)
2021-09-14 14:37               ` Ming Lei
2021-09-15  1:54                 ` yukuai (C)
2021-09-15  3:16                   ` Ming Lei
2021-09-15  3:36                     ` yukuai (C)
2021-09-15  3:46                       ` Ming Lei
2021-09-09 14:12 ` [PATCH v5 6/6] nbd: don't start request if nbd_queue_rq() failed Yu Kuai
