LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
@ 2015-01-23 23:57 Yan Liu
  2015-01-25 14:59 ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Yan Liu @ 2015-01-23 23:57 UTC (permalink / raw)
  To: Matthew Wilcox, linux-nvme; +Cc: linux-kernel, Yan Liu

When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
the namespace which is associated with that block device file descriptor. This patch makes such passthrough
command ignore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
block device descriptor.

Signed-off-by: Yan Liu <yan@purestorage.com>
---
 drivers/block/nvme-core.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index cb529e9..73c3c97 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1682,6 +1682,27 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	return status;
 }
 
+static int nvme_namespace_sel(struct nvme_dev *dev, struct nvme_ns **ns,
+			struct nvme_passthru_cmd __user *ucmd)
+{
+	struct nvme_ns *ns_ptr;
+	struct nvme_passthru_cmd cmd;
+
+	if (list_empty(&dev->namespaces))
+		return -ENOTTY;
+
+	if (copy_from_user(&cmd, ucmd, sizeof(cmd)))
+		return -EFAULT;
+
+	list_for_each_entry(ns_ptr, &dev->namespaces, list) {
+		if (ns_ptr->ns_id == cmd.nsid) {
+			*ns = ns_ptr;
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
 static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
 			struct nvme_passthru_cmd __user *ucmd)
 {
@@ -1699,7 +1720,7 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = cmd.opcode;
 	c.common.flags = cmd.flags;
-	c.common.nsid = cpu_to_le32(cmd.nsid);
+	c.common.nsid = ns ? cpu_to_le32(ns->ns_id) : cpu_to_le32(cmd.nsid);
 	c.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
 	c.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
 	c.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
@@ -2590,14 +2611,15 @@ static long nvme_dev_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 {
 	struct nvme_dev *dev = f->private_data;
 	struct nvme_ns *ns;
+	int ret;
 
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
 		return nvme_user_cmd(dev, NULL, (void __user *)arg);
 	case NVME_IOCTL_IO_CMD:
-		if (list_empty(&dev->namespaces))
-			return -ENOTTY;
-		ns = list_first_entry(&dev->namespaces, struct nvme_ns, list);
+		ret = nvme_namespace_sel(dev, &ns, (void __user *)arg);
+		if (ret)
+			return ret;
 		return nvme_user_cmd(dev, ns, (void __user *)arg);
 	default:
 		return -ENOTTY;
-- 
1.9.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-23 23:57 [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor Yan Liu
@ 2015-01-25 14:59 ` Christoph Hellwig
  2015-01-26 18:02   ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-01-25 14:59 UTC (permalink / raw)
  To: Yan Liu; +Cc: Matthew Wilcox, linux-nvme, linux-kernel

On Fri, Jan 23, 2015 at 03:57:06PM -0800, Yan Liu wrote:
> When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
> the namespace which is associated with that block device file descriptor. This patch makes such passthrough
> command ignore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
> block device descriptor.

This looks good to me.  If Keith can point to a use case for magic or
hidden nsids we'll have to find a bypass for them, but this certainly
is the right short term fix:

Reviewed-by: Christoph Hellwig <hch@lst.de>

>  {
>  	struct nvme_dev *dev = f->private_data;
>  	struct nvme_ns *ns;
> +	int ret;
>  
>  	switch (cmd) {
>  	case NVME_IOCTL_ADMIN_CMD:
>  		return nvme_user_cmd(dev, NULL, (void __user *)arg);
>  	case NVME_IOCTL_IO_CMD:
> -		if (list_empty(&dev->namespaces))
> -			return -ENOTTY;
> -		ns = list_first_entry(&dev->namespaces, struct nvme_ns, list);
> +		ret = nvme_namespace_sel(dev, &ns, (void __user *)arg);
> +		if (ret)
> +			return ret;
>  		return nvme_user_cmd(dev, ns, (void __user *)arg);
>  	default:

This function would benefit from a local variable ala:

	void __user *argp = (void __user *)arg;

but that might be done in a follow up patch, or postponed for now.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-25 14:59 ` Christoph Hellwig
@ 2015-01-26 18:02   ` Keith Busch
  0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2015-01-26 18:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Yan Liu, Matthew Wilcox, linux-kernel, linux-nvme

On Sun, 25 Jan 2015, Christoph Hellwig wrote:
> On Fri, Jan 23, 2015 at 03:57:06PM -0800, Yan Liu wrote:
>> When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
>> the namespace which is associated with that block device file descriptor. This patch makes such passthrough
>> command ignore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
>> block device descriptor.
>
> This looks good to me.  If Keith can point to a use case for magic or
> hidden nsids we'll have to find a bypass for them, but this certainly
> is the right short term fix:

'Flush' is the only command from the spec where specifying all namespaces
might be desirable. Otherwise, I can't disclose vendor specific behavior
to gain concensus on operating within spec, silly as this desire may
seem. Why empower the driver to make a feature unreachable?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-23 17:50       ` Keith Busch
@ 2015-01-25 14:41         ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2015-01-25 14:41 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Matthew Wilcox, linux-kernel, linux-nvme, Yan Liu

On Fri, Jan 23, 2015 at 05:50:33PM +0000, Keith Busch wrote:
> No argument against removing the hidden attribute handling, but there
> are unadvertised NSID's that have special meaning. Like NSID 0xffffffff
> means to apply a command to all namespaces. Vendor specific commands
> may have other special NSID meanings as well.

What is the practical use of those?  Just because something is
theoretically possible we don't really need to support it.

(and yes, it's a really bad design - I wish the NVME designers had spent a
little more time with existing designs instead of applying the full NIH
mantra)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-23 17:27     ` Christoph Hellwig
@ 2015-01-23 17:50       ` Keith Busch
  2015-01-25 14:41         ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-01-23 17:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Matthew Wilcox, linux-kernel, linux-nvme, Yan Liu

On Fri, 23 Jan 2015, Christoph Hellwig wrote:
> On Fri, Jan 23, 2015 at 04:22:02PM +0000, Keith Busch wrote:
>> The namespace id should be enforced on block devices, but is there a
>> problem allowing arbitrary commands through the management char device?
>> I have a need for a pure passthrough, but the proposed patch requires
>> a matching namespace id all the time.
>>
>> I wrote and tested the one below to override nsid on block devices,
>> but doesn't require a visible namespace through the management device.
>
> Allowing requests to differetn namespaces through the admin interface
> doesn't sound too horrible in general, but I still don't like your patch
> below.  Instead of allocating another queue that allows arbitrary nsids
> we should simply look up the namespace when sent through the admin device,
> and still reject it if the namespace isn't valid.  If a namespaces
> is marked hidden we should still create a device for it in Linux,
> as that whole concept of hiding a namespace is silly.

No argument against removing the hidden attribute handling, but there
are unadvertised NSID's that have special meaning. Like NSID 0xffffffff
means to apply a command to all namespaces. Vendor specific commands
may have other special NSID meanings as well.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-23 16:22   ` Keith Busch
@ 2015-01-23 17:27     ` Christoph Hellwig
  2015-01-23 17:50       ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-01-23 17:27 UTC (permalink / raw)
  To: Keith Busch; +Cc: Matthew Wilcox, linux-kernel, linux-nvme, Yan Liu

On Fri, Jan 23, 2015 at 04:22:02PM +0000, Keith Busch wrote:
> The namespace id should be enforced on block devices, but is there a
> problem allowing arbitrary commands through the management char device?
> I have a need for a pure passthrough, but the proposed patch requires
> a matching namespace id all the time.
> 
> I wrote and tested the one below to override nsid on block devices,
> but doesn't require a visible namespace through the management device.

Allowing requests to differetn namespaces through the admin interface
doesn't sound too horrible in general, but I still don't like your patch
below.  Instead of allocating another queue that allows arbitrary nsids
we should simply look up the namespace when sent through the admin device,
and still reject it if the namespace isn't valid.  If a namespaces
is marked hidden we should still create a device for it in Linux,
as that whole concept of hiding a namespace is silly.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-23  7:57 ` Christoph Hellwig
@ 2015-01-23 16:22   ` Keith Busch
  2015-01-23 17:27     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-01-23 16:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Yan Liu, Matthew Wilcox, linux-kernel, linux-nvme

On Thu, 22 Jan 2015, Christoph Hellwig wrote:
> On Thu, Jan 22, 2015 at 04:02:08PM -0800, Yan Liu wrote:
>> When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
>> the namespace which is associated with that block device file descriptor. This patch makes such passthrough
>> command ignore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
>> block device descriptor.
>>
>> Signed-off-by: Yan Liu <yan@purestorage.com>
>
> Please move the code to find the ns into the caller, or even better a
> seaprate helper used by the caller. instead of adding another argument to
> nvme_user_cmd.

The namespace id should be enforced on block devices, but is there a
problem allowing arbitrary commands through the management char device?
I have a need for a pure passthrough, but the proposed patch requires
a matching namespace id all the time.

I wrote and tested the one below to override nsid on block devices,
but doesn't require a visible namespace through the management device.

---
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index cb529e9..bdec1d7 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1682,7 +1682,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
  	return status;
  }

-static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
+static int nvme_user_cmd(struct nvme_dev *dev, struct request_queue *q,
  			struct nvme_passthru_cmd __user *ucmd)
  {
  	struct nvme_passthru_cmd cmd;
@@ -1690,6 +1690,8 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
  	int status, length;
  	struct nvme_iod *uninitialized_var(iod);
  	unsigned timeout;
+	struct request *req;
+	struct nvme_ns *ns = q->queuedata;

  	if (!capable(CAP_SYS_ADMIN))
  		return -EACCES;
@@ -1699,7 +1701,7 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
  	memset(&c, 0, sizeof(c));
  	c.common.opcode = cmd.opcode;
  	c.common.flags = cmd.flags;
-	c.common.nsid = cpu_to_le32(cmd.nsid);
+	c.common.nsid = ns ? cpu_to_le32(ns->ns_id) : cpu_to_le32(cmd.nsid);
  	c.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
  	c.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
  	c.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
@@ -1725,21 +1727,15 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,

  	if (length != cmd.data_len)
  		status = -ENOMEM;
-	else if (ns) {
-		struct request *req;
-
-		req = blk_mq_alloc_request(ns->queue, WRITE,
-						(GFP_KERNEL|__GFP_WAIT), false);
-		if (IS_ERR(req))
-			status = PTR_ERR(req);
-		else {
-			status = nvme_submit_sync_cmd(req, &c, &cmd.result,
-								timeout);
-			blk_mq_free_request(req);
-		}
-	} else
-		status = __nvme_submit_admin_cmd(dev, &c, &cmd.result, timeout);

+	req = blk_mq_alloc_request(q, WRITE, (GFP_KERNEL|__GFP_WAIT), false);
+	if (IS_ERR(req)) {
+		status = PTR_ERR(req);
+		goto out;
+	}
+	status = nvme_submit_sync_cmd(req, &c, &cmd.result, timeout);
+	blk_mq_free_request(req);
+ out:
  	if (cmd.data_len) {
  		nvme_unmap_user_pages(dev, cmd.opcode & 1, iod);
  		nvme_free_iod(dev, iod);
@@ -1762,9 +1758,9 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
  		force_successful_syscall_return();
  		return ns->ns_id;
  	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ns->dev, NULL, (void __user *)arg);
+		return nvme_user_cmd(ns->dev, ns->dev->admin_q, (void __user *)arg);
  	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->dev, ns, (void __user *)arg);
+		return nvme_user_cmd(ns->dev, ns->queue, (void __user *)arg);
  	case NVME_IOCTL_SUBMIT_IO:
  		return nvme_submit_io(ns, (void __user *)arg);
  	case SG_GET_VERSION_NUM:
@@ -2155,6 +2151,17 @@ static int nvme_dev_add(struct nvme_dev *dev)
  	if (blk_mq_alloc_tag_set(&dev->tagset))
  		goto out;

+	dev->io_q = blk_mq_init_queue(&dev->tagset);
+	if (IS_ERR(dev->io_q)) {
+		blk_mq_free_tag_set(&dev->tagset);
+		goto out;
+	}
+	if (!blk_get_queue(dev->io_q)) {
+		blk_cleanup_queue(dev->io_q);
+		blk_mq_free_tag_set(&dev->tagset);
+		goto out;
+	}
+
  	id_ns = mem;
  	for (i = 1; i <= nn; i++) {
  		res = nvme_identify(dev, i, 0, dma_addr);
@@ -2565,6 +2572,7 @@ static void nvme_free_dev(struct kref *kref)
  	nvme_release_instance(dev);
  	blk_mq_free_tag_set(&dev->tagset);
  	blk_put_queue(dev->admin_q);
+	blk_put_queue(dev->io_q);
  	kfree(dev->queues);
  	kfree(dev->entry);
  	kfree(dev);
@@ -2589,16 +2597,12 @@ static int nvme_dev_release(struct inode *inode, struct file *f)
  static long nvme_dev_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
  {
  	struct nvme_dev *dev = f->private_data;
-	struct nvme_ns *ns;

  	switch (cmd) {
  	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(dev, NULL, (void __user *)arg);
+		return nvme_user_cmd(dev, dev->admin_q, (void __user *)arg);
  	case NVME_IOCTL_IO_CMD:
-		if (list_empty(&dev->namespaces))
-			return -ENOTTY;
-		ns = list_first_entry(&dev->namespaces, struct nvme_ns, list);
-		return nvme_user_cmd(dev, ns, (void __user *)arg);
+		return nvme_user_cmd(dev, dev->io_q, (void __user *)arg);
  	default:
  		return -ENOTTY;
  	}
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 258945f..d3b467b 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -74,6 +74,7 @@ struct nvme_dev {
  	struct list_head node;
  	struct nvme_queue **queues;
  	struct request_queue *admin_q;
+	struct request_queue *io_q;
  	struct blk_mq_tag_set tagset;
  	struct blk_mq_tag_set admin_tagset;
  	u32 __iomem *dbs;
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-23  0:02 Yan Liu
@ 2015-01-23  7:57 ` Christoph Hellwig
  2015-01-23 16:22   ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-01-23  7:57 UTC (permalink / raw)
  To: Yan Liu; +Cc: Matthew Wilcox, linux-nvme, linux-kernel

On Thu, Jan 22, 2015 at 04:02:08PM -0800, Yan Liu wrote:
> When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
> the namespace which is associated with that block device file descriptor. This patch makes such passthrough
> command ignore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
> block device descriptor.
> 
> Signed-off-by: Yan Liu <yan@purestorage.com>

Please move the code to find the ns into the caller, or even better a
seaprate helper used by the caller. instead of adding another argument to
nvme_user_cmd.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
@ 2015-01-23  0:02 Yan Liu
  2015-01-23  7:57 ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Yan Liu @ 2015-01-23  0:02 UTC (permalink / raw)
  To: Matthew Wilcox, linux-nvme; +Cc: linux-kernel, Yan Liu

When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
the namespace which is associated with that block device file descriptor. This patch makes such passthrough
command ignore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
block device descriptor.

Signed-off-by: Yan Liu <yan@purestorage.com>
---
 drivers/block/nvme-core.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index cb529e9..8bb08ab 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1683,7 +1683,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 }
 
 static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
-			struct nvme_passthru_cmd __user *ucmd)
+			struct nvme_passthru_cmd __user *ucmd, bool ioq)
 {
 	struct nvme_passthru_cmd cmd;
 	struct nvme_command c;
@@ -1699,7 +1699,7 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = cmd.opcode;
 	c.common.flags = cmd.flags;
-	c.common.nsid = cpu_to_le32(cmd.nsid);
+	c.common.nsid = ns ? cpu_to_le32(ns->ns_id) : cpu_to_le32(cmd.nsid);
 	c.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
 	c.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
 	c.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
@@ -1725,9 +1725,25 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
 
 	if (length != cmd.data_len)
 		status = -ENOMEM;
-	else if (ns) {
+	else if (ioq) {
 		struct request *req;
 
+		if (!ns) {
+			if (list_empty(&dev->namespaces)) {
+				return -ENOTTY;
+
+			struct nvme_ns *ns_ptr;
+
+			list_for_each_entry(ns_ptr, &dev->namespaces, list) {
+				if (ns_ptr->ns_id == cmd.nsid) {
+					ns = ns_ptr;
+					break;
+				}
+			}
+			if (!ns)
+				return -EINVAL;
+		}
+
 		req = blk_mq_alloc_request(ns->queue, WRITE,
 						(GFP_KERNEL|__GFP_WAIT), false);
 		if (IS_ERR(req))
@@ -1762,9 +1778,9 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
 		force_successful_syscall_return();
 		return ns->ns_id;
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ns->dev, NULL, (void __user *)arg);
+		return nvme_user_cmd(ns->dev, NULL, (void __user *)arg, false);
 	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->dev, ns, (void __user *)arg);
+		return nvme_user_cmd(ns->dev, ns, (void __user *)arg, true);
 	case NVME_IOCTL_SUBMIT_IO:
 		return nvme_submit_io(ns, (void __user *)arg);
 	case SG_GET_VERSION_NUM:
@@ -2589,16 +2605,12 @@ static int nvme_dev_release(struct inode *inode, struct file *f)
 static long nvme_dev_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 {
 	struct nvme_dev *dev = f->private_data;
-	struct nvme_ns *ns;
 
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(dev, NULL, (void __user *)arg);
+		return nvme_user_cmd(dev, NULL, (void __user *)arg, false);
 	case NVME_IOCTL_IO_CMD:
-		if (list_empty(&dev->namespaces))
-			return -ENOTTY;
-		ns = list_first_entry(&dev->namespaces, struct nvme_ns, list);
-		return nvme_user_cmd(dev, ns, (void __user *)arg);
+		return nvme_user_cmd(dev, NULL, (void __user *)arg, true);
 	default:
 		return -ENOTTY;
 	}
-- 
1.9.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-22 15:49       ` Christoph Hellwig
@ 2015-01-22 16:58         ` Keith Busch
  0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2015-01-22 16:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Yan Liu, Matthew Wilcox, linux-kernel, linux-nvme

On Thu, 22 Jan 2015, Christoph Hellwig wrote:
> On Thu, Jan 22, 2015 at 03:21:28PM +0000, Keith Busch wrote:
>> But if you really need to restrict namespace access, shouldn't that be
>> enforced on the target side with reservations or similar mechanism?
>
> Think for example about containers where we give eah container access
> to a single nvme namespace, including container root access.  Here you
> don't really want container A to be able to submit I/O for another
> container.  A similar case exists for virtualization where we had
> problems with SCSI passthrough from guests.


Okay, that's a great point.

Yan, we should apply this if you can submit a patch for the linux-block
tree.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-22 15:21     ` Keith Busch
@ 2015-01-22 15:49       ` Christoph Hellwig
  2015-01-22 16:58         ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-01-22 15:49 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Yan Liu, Matthew Wilcox, linux-kernel, linux-nvme

On Thu, Jan 22, 2015 at 03:21:28PM +0000, Keith Busch wrote:
> The case I considered was the "hidden" attribute in the NVMe LBA Range
> Type feature. It only indicates the storage should be hidden from the OS
> for general use, but the host may still use it for special purposes. In
> truth, the driver doesn't handle the hidden attribute very well and it
> doesn't seem like a well thought out feature in the spec anyway.

At least for Linux we should simply ignore that attribute.

> But if you really need to restrict namespace access, shouldn't that be
> enforced on the target side with reservations or similar mechanism?

Think for example about containers where we give eah container access
to a single nvme namespace, including container root access.  Here you
don't really want container A to be able to submit I/O for another
container.  A similar case exists for virtualization where we had
problems with SCSI passthrough from guests.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-22  8:45   ` Christoph Hellwig
@ 2015-01-22 15:21     ` Keith Busch
  2015-01-22 15:49       ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-01-22 15:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Yan Liu, Matthew Wilcox, linux-kernel, linux-nvme

On Thu, 22 Jan 2015, Christoph Hellwig wrote:
> On Thu, Jan 22, 2015 at 12:47:24AM +0000, Keith Busch wrote:
>> The IOCTL's purpose was to let someone submit completely arbitrary
>> commands on IO queues. This technically shouldn't even need a namespace
>> handle, but we don't have a request_queue associated to IO queues without
>> one like the admin queue has. In fact, we ought to fix that so we can
>> issue IO commands without namespaces.
>
> Honestly, this sounds like a horrible idea.  As namespaces aren't really
> any different from SCSI LUNs they should only be accessible through
> the device associated with the namespaces, and admin commands should
> only be allowed through the character device (if at all).

The case I considered was the "hidden" attribute in the NVMe LBA Range
Type feature. It only indicates the storage should be hidden from the OS
for general use, but the host may still use it for special purposes. In
truth, the driver doesn't handle the hidden attribute very well and it
doesn't seem like a well thought out feature in the spec anyway.

But if you really need to restrict namespace access, shouldn't that be
enforced on the target side with reservations or similar mechanism?

I agree on your last point. Admin commands through namespaces carried over
from before the management device existed, but removing it now will break
some customer tooling. There's probably a responsible way to migrate.

> For these security and usability reasons we did get rid of the
> SG_FLAG_LUN_INHIBIT flag in the SCSI passthrough interface, which
> allowed for similar horrible things in the distant past.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
       [not found]   ` <CADMsRTZjajAj682a5FH-AmpphoQ4vw5QxqnJiGEQ+Jg_f7TvoA@mail.gmail.com>
@ 2015-01-22 14:22     ` Keith Busch
  0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2015-01-22 14:22 UTC (permalink / raw)
  To: Yan Liu; +Cc: Keith Busch, Matthew Wilcox, linux-nvme, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 976 bytes --]

On Wed, 21 Jan 2015, Yan Liu wrote:
> For IO passthrough command, it uses an IO queue associated with the device. Actually, this patch does not modify that part.
> This patch is not really focused on io queues; instead, it is more about namespace protection from other namespace's user ios. The patch here doesn't prevent a user submitting completely arbitrary commands on IO queues. One still can
> do it through a char dev file descriptor. However, when a user explicitly chooses a namespace's file descriptor, it is unlikely that she/he tries to issue an io command to a different namespace. Maybe in that case someone should to
> use a new admin command.

Oh, I just realized your patch is not for the blk-mq version of this
driver. The mainline kernel uses blk-mq and is a little different in
this path. My mistake, you're right the command works the same as before
using the character device here. In that case, this seems very reasonable,
but doesn't merge upstream.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-22  0:47 ` Keith Busch
@ 2015-01-22  8:45   ` Christoph Hellwig
  2015-01-22 15:21     ` Keith Busch
       [not found]   ` <CADMsRTZjajAj682a5FH-AmpphoQ4vw5QxqnJiGEQ+Jg_f7TvoA@mail.gmail.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-01-22  8:45 UTC (permalink / raw)
  To: Keith Busch; +Cc: Yan Liu, Matthew Wilcox, linux-kernel, linux-nvme

On Thu, Jan 22, 2015 at 12:47:24AM +0000, Keith Busch wrote:
> The IOCTL's purpose was to let someone submit completely arbitrary
> commands on IO queues. This technically shouldn't even need a namespace
> handle, but we don't have a request_queue associated to IO queues without
> one like the admin queue has. In fact, we ought to fix that so we can
> issue IO commands without namespaces.

Honestly, this sounds like a horrible idea.  As namespaces aren't really
any different from SCSI LUNs they should only be accessible through
the device associated with the namespaces, and admin commands should
only be allowed through the character device (if at all).

For these security and usability reasons we did get rid of the
SG_FLAG_LUN_INHIBIT flag in the SCSI passthrough interface, which
allowed for similar horrible things in the distant past.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
  2015-01-22  0:28 Yan Liu
@ 2015-01-22  0:47 ` Keith Busch
  2015-01-22  8:45   ` Christoph Hellwig
       [not found]   ` <CADMsRTZjajAj682a5FH-AmpphoQ4vw5QxqnJiGEQ+Jg_f7TvoA@mail.gmail.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Keith Busch @ 2015-01-22  0:47 UTC (permalink / raw)
  To: Yan Liu; +Cc: Matthew Wilcox, linux-nvme, linux-kernel

On Wed, 21 Jan 2015, Yan Liu wrote:
> When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
> the namespace which is associated with that block device file descriptor. This patch makes such passthrough
> command ingore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
> block device descriptor.
>
> Signed-off-by: Yan Liu <yan@purestorage.com>

Oh it doesn't look like you tested this through the character
handle. You've got it set to use the admin queue's request_queue for IO
passthrough commands, so that can't be right.

The IOCTL's purpose was to let someone submit completely arbitrary
commands on IO queues. This technically shouldn't even need a namespace
handle, but we don't have a request_queue associated to IO queues without
one like the admin queue has. In fact, we ought to fix that so we can
issue IO commands without namespaces.

Anyway, namespaces may be hidden or some vendor special NSID trickery,
who knows. Point is we don't want to tie the passthrough command's NSID
down to the namespace providing the request_queue. If it is your intention
to use that NSID, you can get the NSID using the NVME_IOCTL_ID prior to
setting up the passthrough command.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor
@ 2015-01-22  0:28 Yan Liu
  2015-01-22  0:47 ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Yan Liu @ 2015-01-22  0:28 UTC (permalink / raw)
  To: Matthew Wilcox, linux-nvme; +Cc: linux-kernel, Yan Liu

When a passthrough IO command is issued with a specific block device file descriptor. It should be applied at
the namespace which is associated with that block device file descriptor. This patch makes such passthrough
command ingore nsid in nvme_passthru_cmd structure. Instead it takes the namespace ID asscoiated with the
block device descriptor.

Signed-off-by: Yan Liu <yan@purestorage.com>
---
 drivers/block/nvme-core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 00fa5d2..865baa6 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1734,7 +1734,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	return status;
 }
 
-static int nvme_user_cmd(struct nvme_dev *dev,
+static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
 			struct nvme_passthru_cmd __user *ucmd, bool ioq)
 {
 	struct nvme_passthru_cmd cmd;
@@ -1751,7 +1751,7 @@ static int nvme_user_cmd(struct nvme_dev *dev,
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = cmd.opcode;
 	c.common.flags = cmd.flags;
-	c.common.nsid = cpu_to_le32(cmd.nsid);
+	c.common.nsid = ns ? cpu_to_le32(ns->ns_id) : cpu_to_le32(cmd.nsid);
 	c.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
 	c.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
 	c.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
@@ -1804,9 +1804,9 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
 		force_successful_syscall_return();
 		return ns->ns_id;
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ns->dev, (void __user *)arg, false);
+		return nvme_user_cmd(ns->dev, NULL, (void __user *)arg, false);
 	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->dev, (void __user *)arg, true);
+		return nvme_user_cmd(ns->dev, ns, (void __user *)arg, true);
 	case NVME_IOCTL_SUBMIT_IO:
 		return nvme_submit_io(ns, (void __user *)arg);
 	case SG_GET_VERSION_NUM:
@@ -2762,9 +2762,9 @@ static long nvme_dev_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 	struct nvme_dev *dev = f->private_data;
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(dev, (void __user *)arg, false);
+		return nvme_user_cmd(dev, NULL, (void __user *)arg, false);
 	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(dev, (void __user *)arg, true);
+		return nvme_user_cmd(dev, NULL, (void __user *)arg, true);
 	default:
 		return -ENOTTY;
 	}
-- 
1.9.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-01-26 18:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-23 23:57 [PATCH 1/1] NVMe: Do not take nsid while a passthrough IO command is being issued via a block device file descriptor Yan Liu
2015-01-25 14:59 ` Christoph Hellwig
2015-01-26 18:02   ` Keith Busch
  -- strict thread matches above, loose matches on Subject: below --
2015-01-23  0:02 Yan Liu
2015-01-23  7:57 ` Christoph Hellwig
2015-01-23 16:22   ` Keith Busch
2015-01-23 17:27     ` Christoph Hellwig
2015-01-23 17:50       ` Keith Busch
2015-01-25 14:41         ` Christoph Hellwig
2015-01-22  0:28 Yan Liu
2015-01-22  0:47 ` Keith Busch
2015-01-22  8:45   ` Christoph Hellwig
2015-01-22 15:21     ` Keith Busch
2015-01-22 15:49       ` Christoph Hellwig
2015-01-22 16:58         ` Keith Busch
     [not found]   ` <CADMsRTZjajAj682a5FH-AmpphoQ4vw5QxqnJiGEQ+Jg_f7TvoA@mail.gmail.com>
2015-01-22 14:22     ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).