LKML Archive on lore.kernel.org
* [PATCH] bsg : Add support for io vectors in bsg
@ 2008-01-04 16:17 Deepak Colluru
  2008-01-05  5:01 ` FUJITA Tomonori
  0 siblings, 1 reply; 12+ messages in thread
From: Deepak Colluru @ 2008-01-04 16:17 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel

From: Deepak Colluru <deepakrc@gmail.com>

Add support for io vectors in bsg.

Signed-off-by: Deepak Colluru <deepakrc@gmail.com>
---
  bsg.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++---
  1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 8e181ab..78f47ed 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -245,7 +245,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
  	struct request_queue *q = bd->queue;
  	struct request *rq, *next_rq = NULL;
  	int ret, rw;
-	unsigned int dxfer_len;
+	unsigned int dxfer_len, iovec_count = 0;
  	void *dxferp = NULL;

  	dprintk("map hdr %llx/%u %llx/%u\n", (unsigned long long) hdr->dout_xferp,
@@ -281,7 +281,31 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
  		rq->next_rq = next_rq;

  		dxferp = (void*)(unsigned long)hdr->din_xferp;
-		ret =  blk_rq_map_user(q, next_rq, dxferp, hdr->din_xfer_len);
+		iovec_count = hdr->din_iovec_count;
+		dxfer_len = hdr->din_xfer_len;
+
+		if (iovec_count) {
+			const int size = sizeof(struct sg_iovec) * iovec_count;
+			struct sg_iovec *iov;
+
+			iov = kmalloc(size, GFP_KERNEL);
+			if (!iov) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			if (copy_from_user(iov, dxferp, size)) {
+				kfree(iov);
+				ret = -EFAULT;
+				goto out;
+			}
+			ret = blk_rq_map_user_iov(q, next_rq, iov, iovec_count,
+								dxfer_len);
+			kfree(iov);
+		} else {
+			ret = blk_rq_map_user(q, next_rq, dxferp, dxfer_len);
+		}
+
  		if (ret)
  			goto out;
  	}
@@ -289,14 +313,36 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
  	if (hdr->dout_xfer_len) {
  		dxfer_len = hdr->dout_xfer_len;
  		dxferp = (void*)(unsigned long)hdr->dout_xferp;
+		iovec_count = hdr->dout_iovec_count;
  	} else if (hdr->din_xfer_len) {
  		dxfer_len = hdr->din_xfer_len;
  		dxferp = (void*)(unsigned long)hdr->din_xferp;
+		iovec_count = hdr->din_iovec_count;
  	} else
  		dxfer_len = 0;

  	if (dxfer_len) {
-		ret = blk_rq_map_user(q, rq, dxferp, dxfer_len);
+		if (iovec_count) {
+			const int size = sizeof(struct sg_iovec) * iovec_count;
+			struct sg_iovec *iov;
+
+			iov = kmalloc(size, GFP_KERNEL);
+			if (!iov) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			if (copy_from_user(iov, dxferp, size)) {
+				kfree(iov);
+				ret = -EFAULT;
+				goto out;
+			}
+			ret = blk_rq_map_user_iov(q, rq, iov, iovec_count,
+								dxfer_len);
+			kfree(iov);
+		} else {
+			ret = blk_rq_map_user(q, rq, dxferp, dxfer_len);
+		}
  		if (ret)
  			goto out;
  	}
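
Both hunks compute `size` as `sizeof(struct sg_iovec) * iovec_count` into an `int`, with `iovec_count` read from the userspace-supplied header. A minimal userspace sketch of bounding that count before multiplying is given here. The names `sg_iovec_model`, `MAX_IOVEC_COUNT` and `iov_array_bytes` are hypothetical and not part of the patch; the cap of 1024 is an assumption chosen to match the value of the kernel's UIO_MAXIOV.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct sg_iovec (pointer plus length). */
struct sg_iovec_model {
	void *iov_base;
	size_t iov_len;
};

/* Hypothetical cap on entries; 1024 mirrors the value of UIO_MAXIOV. */
#define MAX_IOVEC_COUNT 1024u

/*
 * Compute the byte size of an array of `count` iovec entries.
 * Returns 0 and stores the size in *bytes on success, or -EINVAL when
 * count is zero or above the cap, so the product cannot overflow.
 */
int iov_array_bytes(uint32_t count, size_t *bytes)
{
	if (count == 0 || count > MAX_IOVEC_COUNT)
		return -EINVAL;
	*bytes = (size_t)count * sizeof(struct sg_iovec_model);
	return 0;
}
```

With a check of this kind in front of the allocation, the later `kmalloc()` and `copy_from_user()` calls would always see a small, positive size.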


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-04 16:17 [PATCH] bsg : Add support for io vectors in bsg Deepak Colluru
@ 2008-01-05  5:01 ` FUJITA Tomonori
  2008-01-08 22:09   ` Pete Wyckoff
  0 siblings, 1 reply; 12+ messages in thread
From: FUJITA Tomonori @ 2008-01-05  5:01 UTC (permalink / raw)
  To: deepakrc; +Cc: linux-scsi, linux-kernel, fujita.tomonori

From: Deepak Colluru <deepakrc@gmail.com>
Subject: [PATCH] bsg : Add support for io vectors in bsg
Date: Fri, 4 Jan 2008 21:47:34 +0530 (IST)

> From: Deepak Colluru <deepakrc@gmail.com>
> 
> Add support for io vectors in bsg.
> 
> Signed-off-by: Deepak Colluru <deepakrc@gmail.com>
> ---
>   bsg.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 49 insertions(+), 3 deletions(-)

Thanks, but I have to NACK this.

You can find the discussion about bsg io vector support and a similar
patch in linux-scsi archive. I have no plan to support it since it
needs the compat hack.


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-05  5:01 ` FUJITA Tomonori
@ 2008-01-08 22:09   ` Pete Wyckoff
  2008-01-09  0:11     ` FUJITA Tomonori
  0 siblings, 1 reply; 12+ messages in thread
From: Pete Wyckoff @ 2008-01-08 22:09 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: deepakrc, linux-scsi, linux-kernel, fujita.tomonori

tomof@acm.org wrote on Sat, 05 Jan 2008 14:01 +0900:
> From: Deepak Colluru <deepakrc@gmail.com>
> Subject: [PATCH] bsg : Add support for io vectors in bsg
> Date: Fri, 4 Jan 2008 21:47:34 +0530 (IST)
> 
> > From: Deepak Colluru <deepakrc@gmail.com>
> > 
> > Add support for io vectors in bsg.
> > 
> > Signed-off-by: Deepak Colluru <deepakrc@gmail.com>
> > ---
> >   bsg.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++---
> >   1 file changed, 49 insertions(+), 3 deletions(-)
> 
> Thanks, but I have to NACK this.
> 
> You can find the discussion about bsg io vector support and a similar
> patch in linux-scsi archive. I have no plan to support it since it
> needs the compat hack.

You may recall this is one of the patches I need to use bsg with OSD
devices.  OSDs overload the SCSI buffer model to put multiple fields
in dataout and datain.  Some is user data, but some is more
logically created by a library.  Memcpying in userspace to wedge all
the segments into a single buffer is painful, and is required both
on outgoing and incoming data buffers.

There are two approaches to add iovec to bsg.

1.  Define a new sg_iovec_v4 that uses constant width types.  Both
    32- and 64-bit userspace would hand arrays of this to the kernel.

    struct sg_v4_iovec {
	    __u64 iov_base;
	    __u32 iov_len;
	    __u32 __pad1;
    };

    Old patch here:  http://article.gmane.org/gmane.linux.scsi/30461/


2.  Do as Deepak has done, using the existing sg_iovec, but then
    also work around the compat issue.  Old v3 sg_iovec is:

    typedef struct sg_iovec /* same structure as used by readv() Linux system */
    {                       /* call. It defines one scatter-gather element. */
	void __user *iov_base;      /* Starting address  */
	size_t iov_len;             /* Length in bytes  */
    } sg_iovec_t;

    Old patch here:  http://article.gmane.org/gmane.linux.scsi/30460/
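
The size difference behind the compat issue can be sketched in userspace C. The fixed-width struct follows option 1 above; `sg_iovec_32`, `sg_iovec_64` and `sg_iovec_widen` are hypothetical models of how a 32-bit and a 64-bit process would lay out the existing sg_iovec, assuming 4-byte and 8-byte pointers and size_t respectively. They are not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option 1: fixed-width layout, identical for 32- and 64-bit userspace. */
struct sg_v4_iovec {
	uint64_t iov_base;	/* user address carried as a 64-bit integer */
	uint32_t iov_len;
	uint32_t __pad1;
};

/* Option 2 as seen by a 32-bit process (model: 4-byte pointer and size_t). */
struct sg_iovec_32 {
	uint32_t iov_base;
	uint32_t iov_len;
};

/* Option 2 as seen by a 64-bit process (model: 8-byte pointer and size_t). */
struct sg_iovec_64 {
	uint64_t iov_base;
	uint64_t iov_len;
};

_Static_assert(sizeof(struct sg_v4_iovec) == 16, "fixed-width entry is 16 bytes");
_Static_assert(sizeof(struct sg_iovec_32) == 8, "32-bit entry is 8 bytes");
_Static_assert(sizeof(struct sg_iovec_64) == 16, "64-bit entry is 16 bytes");

/* The per-entry conversion a compat layer would have to perform. */
struct sg_iovec_64 sg_iovec_widen(struct sg_iovec_32 in)
{
	struct sg_iovec_64 out = { .iov_base = in.iov_base, .iov_len = in.iov_len };
	return out;
}
```

Under these assumptions an array built by a 32-bit process is half the size the 64-bit kernel expects, which is why option 2 needs a conversion step somewhere and option 1 does not.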

I took another look at the compat approach, to see if it is feasible
to keep the compat handling somewhere else, without the use of #ifdef
CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
The use of iovec is within a write operation on a char device.  It's
not amenable to a compat_sys_ or a .compat_ioctl approach.

I'm partial to #1 because the use of architecture-independent fields
matches the rest of struct sg_io_v4.  But if you don't want to have
another iovec type in the kernel, could we do #2 but just return
-EINVAL if the need for compat is detected?  I.e. change
dout_iovec_count to dout_iovec_length and do the math?
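
A rough userspace sketch of "do the math", assuming a 64-bit kernel with 16-byte native entries. `native_sg_iovec` and `iovec_count_from_length` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the native sg_iovec layout on a 64-bit kernel. */
struct native_sg_iovec {
	uint64_t iov_base;
	uint64_t iov_len;
};

/*
 * Turn a byte length into an entry count. Lengths that are zero or not a
 * whole number of native entries are rejected with -EINVAL.
 */
int iovec_count_from_length(uint32_t iovec_length, uint32_t *count)
{
	const size_t entry = sizeof(struct native_sg_iovec);

	if (iovec_length == 0 || iovec_length % entry != 0)
		return -EINVAL;
	*count = (uint32_t)(iovec_length / entry);
	return 0;
}
```

In this model a length of 24 (three 8-byte entries from a 32-bit caller) is rejected, but a length of 16 is accepted whether it came from one native entry or two 32-bit ones, so the length check alone catches only some mismatches.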

		-- Pete


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-08 22:09   ` Pete Wyckoff
@ 2008-01-09  0:11     ` FUJITA Tomonori
  2008-01-10 20:43       ` Pete Wyckoff
  0 siblings, 1 reply; 12+ messages in thread
From: FUJITA Tomonori @ 2008-01-09  0:11 UTC (permalink / raw)
  To: pw; +Cc: tomof, deepakrc, linux-scsi, linux-kernel, fujita.tomonori

On Tue, 8 Jan 2008 17:09:18 -0500
Pete Wyckoff <pw@osc.edu> wrote:

> tomof@acm.org wrote on Sat, 05 Jan 2008 14:01 +0900:
> > From: Deepak Colluru <deepakrc@gmail.com>
> > Subject: [PATCH] bsg : Add support for io vectors in bsg
> > Date: Fri, 4 Jan 2008 21:47:34 +0530 (IST)
> > 
> > > From: Deepak Colluru <deepakrc@gmail.com>
> > > 
> > > Add support for io vectors in bsg.
> > > 
> > > Signed-off-by: Deepak Colluru <deepakrc@gmail.com>
> > > ---
> > >   bsg.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++---
> > >   1 file changed, 49 insertions(+), 3 deletions(-)
> > 
> > Thanks, but I have to NACK this.
> > 
> > You can find the discussion about bsg io vector support and a similar
> > patch in linux-scsi archive. I have no plan to support it since it
> > needs the compat hack.
> 
> You may recall this is one of the patches I need to use bsg with OSD
> devices.  OSDs overload the SCSI buffer model to put multiple fields
> in dataout and datain.  Some is user data, but some is more
> logically created by a library.  Memcpying in userspace to wedge all
> the segments into a single buffer is painful, and is required both
> on outgoing and incoming data buffers.
> 
> There are two approaches to add iovec to bsg.
> 
> 1.  Define a new sg_iovec_v4 that uses constant width types.  Both
>     32- and 64-bit userspace would hand arrays of this to the kernel.
> 
>     struct sg_v4_iovec {
> 	    __u64 iov_base;
> 	    __u32 iov_len;
> 	    __u32 __pad1;
>     };
> 
>     Old patch here:  http://article.gmane.org/gmane.linux.scsi/30461/

As I said before, I don't think that inventing a new "iovec" is a good
idea. sgv3 uses the common "iovec". In addition, sg_io_v4 can be used
by other OSes like sg_io_v3.


> 2.  Do as Deepak has done, using the existing sg_iovec, but then
>     also work around the compat issue.  Old v3 sg_iovec is:
> 
>     typedef struct sg_iovec /* same structure as used by readv() Linux system */
>     {                       /* call. It defines one scatter-gather element. */
> 	void __user *iov_base;      /* Starting address  */
> 	size_t iov_len;             /* Length in bytes  */
>     } sg_iovec_t;
> 
>     Old patch here:  http://article.gmane.org/gmane.linux.scsi/30460/
> 
> I took another look at the compat approach, to see if it is feasible
> to keep the compat handling somewhere else, without the use of #ifdef
> CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
> The use of iovec is within a write operation on a char device.  It's
> not amenable to a compat_sys_ or a .compat_ioctl approach.
> 
> I'm partial to #1 because the use of architecture-independent fields
> matches the rest of struct sg_io_v4.  But if you don't want to have
> another iovec type in the kernel, could we do #2 but just return
> -EINVAL if the need for compat is detected?  I.e. change
> dout_iovec_count to dout_iovec_length and do the math?

If you are ok with removing the write/read interface and just have
ioctl, we can handle compat stuff like others do. But I think
that you (OSD people) really want to keep the write/read
interface. Sorry, I think that there is no workaround to support iovec
in bsg.


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-09  0:11     ` FUJITA Tomonori
@ 2008-01-10 20:43       ` Pete Wyckoff
  2008-01-10 20:55         ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Pete Wyckoff @ 2008-01-10 20:43 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: tomof, deepakrc, linux-scsi, linux-kernel

fujita.tomonori@lab.ntt.co.jp wrote on Wed, 09 Jan 2008 09:11 +0900:
> On Tue, 8 Jan 2008 17:09:18 -0500
> Pete Wyckoff <pw@osc.edu> wrote:
> > I took another look at the compat approach, to see if it is feasible
> > to keep the compat handling somewhere else, without the use of #ifdef
> > CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
> > The use of iovec is within a write operation on a char device.  It's
> > not amenable to a compat_sys_ or a .compat_ioctl approach.
> > 
> > I'm partial to #1 because the use of architecture-independent fields
> > matches the rest of struct sg_io_v4.  But if you don't want to have
> > another iovec type in the kernel, could we do #2 but just return
> > -EINVAL if the need for compat is detected?  I.e. change
> > dout_iovec_count to dout_iovec_length and do the math?
> 
> If you are ok with removing the write/read interface and just have
> ioctl, we can handle compat stuff like others do. But I think
> that you (OSD people) really want to keep the write/read
> interface. Sorry, I think that there is no workaround to support iovec
> in bsg.

I don't care about read/write in particular.  But we do need some
way to launch asynchronous SCSI commands, and currently read/write
are the only way to do that in bsg.  The reason is to keep multiple
spindles busy at the same time.

How about these new ioctls instead of read/write:

    SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
    SG_IO_TEST   - complete and return a previous req
    SG_IO_WAIT   - wait for a req to finish, interruptibly

Then old write users will instead do ioctl SUBMIT.  Read users will
do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
be implemented as SUBMIT + WAIT.

Then we can do compat_ioctl and convert up iovecs out-of-line before
calling the normal functions.
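
The intended semantics can be sketched as a toy userspace model. Everything here (`toy_queue`, `toy_submit`, `toy_test`, `toy_wait`, `toy_sg_io`, `toy_device_step`) is hypothetical; it does not touch the block layer and only illustrates how SUBMIT, TEST and WAIT would relate to each other, with "blocking" modelled by stepping a pretend device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_QUEUE_DEPTH 8

enum toy_state { TOY_FREE, TOY_PENDING, TOY_DONE };

struct toy_queue {
	enum toy_state slot[TOY_QUEUE_DEPTH];
};

/* SUBMIT: start a request without waiting; returns a tag or -EBUSY. */
int toy_submit(struct toy_queue *q)
{
	for (int tag = 0; tag < TOY_QUEUE_DEPTH; tag++) {
		if (q->slot[tag] == TOY_FREE) {
			q->slot[tag] = TOY_PENDING;
			return tag;
		}
	}
	return -EBUSY;
}

/* Pretend device: finishes the lowest-tagged pending request, if any. */
bool toy_device_step(struct toy_queue *q)
{
	for (int tag = 0; tag < TOY_QUEUE_DEPTH; tag++) {
		if (q->slot[tag] == TOY_PENDING) {
			q->slot[tag] = TOY_DONE;
			return true;
		}
	}
	return false;
}

/* TEST: reap one completed request; returns its tag or -EAGAIN. */
int toy_test(struct toy_queue *q)
{
	for (int tag = 0; tag < TOY_QUEUE_DEPTH; tag++) {
		if (q->slot[tag] == TOY_DONE) {
			q->slot[tag] = TOY_FREE;
			return tag;
		}
	}
	return -EAGAIN;
}

/* WAIT: like TEST, but keeps going until something completes. */
int toy_wait(struct toy_queue *q)
{
	int tag;

	while ((tag = toy_test(q)) == -EAGAIN) {
		if (!toy_device_step(q))
			return -ENOENT;	/* nothing outstanding at all */
	}
	return tag;
}

/* SG_IO as SUBMIT + WAIT; with other requests in flight, WAIT may reap one of those first. */
int toy_sg_io(struct toy_queue *q)
{
	int tag = toy_submit(q);

	return tag < 0 ? tag : toy_wait(q);
}
```

A caller would submit several requests, poll with `toy_test()` when it has other work to do, and fall back to `toy_wait()` when it does not.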

Let me know if you want a patch for this.

		-- Pete


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-10 20:43       ` Pete Wyckoff
@ 2008-01-10 20:55         ` James Bottomley
  2008-01-10 21:46           ` Pete Wyckoff
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2008-01-10 20:55 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: FUJITA Tomonori, tomof, deepakrc, linux-scsi, linux-kernel


On Thu, 2008-01-10 at 15:43 -0500, Pete Wyckoff wrote:
> fujita.tomonori@lab.ntt.co.jp wrote on Wed, 09 Jan 2008 09:11 +0900:
> > On Tue, 8 Jan 2008 17:09:18 -0500
> > Pete Wyckoff <pw@osc.edu> wrote:
> > > I took another look at the compat approach, to see if it is feasible
> > > to keep the compat handling somewhere else, without the use of #ifdef
> > > CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
> > > The use of iovec is within a write operation on a char device.  It's
> > > not amenable to a compat_sys_ or a .compat_ioctl approach.
> > > 
> > > I'm partial to #1 because the use of architecture-independent fields
> > > matches the rest of struct sg_io_v4.  But if you don't want to have
> > > another iovec type in the kernel, could we do #2 but just return
> > > -EINVAL if the need for compat is detected?  I.e. change
> > > dout_iovec_count to dout_iovec_length and do the math?
> > 
> > If you are ok with removing the write/read interface and just have
> > ioctl, we can handle compat stuff like others do. But I think
> > that you (OSD people) really want to keep the write/read
> > interface. Sorry, I think that there is no workaround to support iovec
> > in bsg.
> 
> I don't care about read/write in particular.  But we do need some
> way to launch asynchronous SCSI commands, and currently read/write
> are the only way to do that in bsg.  The reason is to keep multiple
> spindles busy at the same time.

Won't multi-threading the ioctl calls achieve the same effect?  Or do
you trip over BKL there?

> How about these new ioctls instead of read/write:
> 
>     SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
>     SG_IO_TEST   - complete and return a previous req
>     SG_IO_WAIT   - wait for a req to finish, interruptibly
> 
> Then old write users will instead do ioctl SUBMIT.  Read users will
> do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
> be implemented as SUBMIT + WAIT.
> 
> Then we can do compat_ioctl and convert up iovecs out-of-line before
> calling the normal functions.
> 
> Let me know if you want a patch for this.

Really, the thought of re-inventing yet another async I/O interface
isn't very appealing.

James




* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-10 20:55         ` James Bottomley
@ 2008-01-10 21:46           ` Pete Wyckoff
  2008-01-10 21:54             ` James Bottomley
                               ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Pete Wyckoff @ 2008-01-10 21:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: FUJITA Tomonori, tomof, deepakrc, linux-scsi, linux-kernel

James.Bottomley@HansenPartnership.com wrote on Thu, 10 Jan 2008 14:55 -0600:
> On Thu, 2008-01-10 at 15:43 -0500, Pete Wyckoff wrote:
> > fujita.tomonori@lab.ntt.co.jp wrote on Wed, 09 Jan 2008 09:11 +0900:
> > > On Tue, 8 Jan 2008 17:09:18 -0500
> > > Pete Wyckoff <pw@osc.edu> wrote:
> > > > I took another look at the compat approach, to see if it is feasible
> > > > to keep the compat handling somewhere else, without the use of #ifdef
> > > > CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
> > > > The use of iovec is within a write operation on a char device.  It's
> > > > not amenable to a compat_sys_ or a .compat_ioctl approach.
> > > > 
> > > > I'm partial to #1 because the use of architecture-independent fields
> > > > matches the rest of struct sg_io_v4.  But if you don't want to have
> > > > another iovec type in the kernel, could we do #2 but just return
> > > > -EINVAL if the need for compat is detected?  I.e. change
> > > > dout_iovec_count to dout_iovec_length and do the math?
> > > 
> > > If you are ok with removing the write/read interface and just have
> > > ioctl, we can handle compat stuff like others do. But I think
> > > that you (OSD people) really want to keep the write/read
> > > interface. Sorry, I think that there is no workaround to support iovec
> > > in bsg.
> > 
> > I don't care about read/write in particular.  But we do need some
> > way to launch asynchronous SCSI commands, and currently read/write
> > are the only way to do that in bsg.  The reason is to keep multiple
> > spindles busy at the same time.
> 
> Won't multi-threading the ioctl calls achieve the same effect?  Or do
> you trip over BKL there?

There's no BKL on (new) ioctls anymore, at least.  A thread per
device would be feasible perhaps.  But if you want any sort of
pipelining out of the device, esp. in the remote iSCSI case, you
need to have a good number of commands outstanding to each device.
So a thread per command per device.  Typical iSCSI queue depth of
128 times 16 devices for a small setup is a lot of threads.

The pthread/pipe latency overhead is not insignificant for fast
storage networks too.

> > How about these new ioctls instead of read/write:
> > 
> >     SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
> >     SG_IO_TEST   - complete and return a previous req
> >     SG_IO_WAIT   - wait for a req to finish, interruptibly
> > 
> > Then old write users will instead do ioctl SUBMIT.  Read users will
> > do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
> > be implemented as SUBMIT + WAIT.
> > 
> > Then we can do compat_ioctl and convert up iovecs out-of-line before
> > calling the normal functions.
> > 
> > Let me know if you want a patch for this.
> 
> Really, the thought of re-inventing yet another async I/O interface
> isn't very appealing.

I'm fine with read/write, except Tomo is against handling iovecs
because of the compat complexity with struct iovec being different
on 32- vs 64-bit.  There is a standard way to do "compat" ioctl that
hides this handling in a different file (not bsg.c), which is the
only reason I'm even considering these ioctls.  I don't care about
compat setups per se.

Is there another async I/O mechanism?  Userspace builds the CDBs,
just needs some way to drop them in SCSI ML.  BSG is almost perfect
for this, but doesn't do iovec, leading to lots of memcpy.
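
For illustration, the userspace copy that a single-buffer interface forces on the outgoing side looks roughly like this. `gather_segments` is a hypothetical helper, not taken from any OSD library, and the incoming side would need the reverse scatter copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy every segment into one contiguous buffer, the extra step needed
 * when the interface accepts only a single data pointer.
 * Returns the total bytes copied, or (size_t)-1 if `out` is too small.
 */
size_t gather_segments(const struct iovec *iov, int iovcnt,
		       void *out, size_t out_len)
{
	size_t off = 0;

	for (int i = 0; i < iovcnt; i++) {
		if (iov[i].iov_len > out_len - off)
			return (size_t)-1;
		memcpy((char *)out + off, iov[i].iov_base, iov[i].iov_len);
		off += iov[i].iov_len;
	}
	return off;
}
```

With iovec support in the submission path, the same array of segments could be handed over as is and this copy would go away.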

		-- Pete


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-10 21:46           ` Pete Wyckoff
@ 2008-01-10 21:54             ` James Bottomley
  2008-01-12  0:16               ` Douglas Gilbert
  2008-01-10 22:33             ` Mark Rustad
  2008-01-11  5:42             ` FUJITA Tomonori
  2 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2008-01-10 21:54 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: FUJITA Tomonori, tomof, deepakrc, linux-scsi, linux-kernel


On Thu, 2008-01-10 at 16:46 -0500, Pete Wyckoff wrote:
> James.Bottomley@HansenPartnership.com wrote on Thu, 10 Jan 2008 14:55 -0600:
> > On Thu, 2008-01-10 at 15:43 -0500, Pete Wyckoff wrote:
> > > fujita.tomonori@lab.ntt.co.jp wrote on Wed, 09 Jan 2008 09:11 +0900:
> > > > On Tue, 8 Jan 2008 17:09:18 -0500
> > > > Pete Wyckoff <pw@osc.edu> wrote:
> > > > > I took another look at the compat approach, to see if it is feasible
> > > > > to keep the compat handling somewhere else, without the use of #ifdef
> > > > > CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
> > > > > The use of iovec is within a write operation on a char device.  It's
> > > > > not amenable to a compat_sys_ or a .compat_ioctl approach.
> > > > > 
> > > > > I'm partial to #1 because the use of architecture-independent fields
> > > > > matches the rest of struct sg_io_v4.  But if you don't want to have
> > > > > another iovec type in the kernel, could we do #2 but just return
> > > > > -EINVAL if the need for compat is detected?  I.e. change
> > > > > dout_iovec_count to dout_iovec_length and do the math?
> > > > 
> > > > If you are ok with removing the write/read interface and just have
> > > > ioctl, we can handle compat stuff like others do. But I think
> > > > that you (OSD people) really want to keep the write/read
> > > > interface. Sorry, I think that there is no workaround to support iovec
> > > > in bsg.
> > > 
> > > I don't care about read/write in particular.  But we do need some
> > > way to launch asynchronous SCSI commands, and currently read/write
> > > are the only way to do that in bsg.  The reason is to keep multiple
> > > spindles busy at the same time.
> > 
> > Won't multi-threading the ioctl calls achieve the same effect?  Or do
> > you trip over BKL there?
> 
> There's no BKL on (new) ioctls anymore, at least.  A thread per
> device would be feasible perhaps.  But if you want any sort of
> pipelining out of the device, esp. in the remote iSCSI case, you
> need to have a good number of commands outstanding to each device.
> So a thread per command per device.  Typical iSCSI queue depth of
> 128 times 16 devices for a small setup is a lot of threads.

I was actually thinking of a thread per outstanding command.

> The pthread/pipe latency overhead is not insignificant for fast
> storage networks too.
> 
> > > How about these new ioctls instead of read/write:
> > > 
> > >     SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
> > >     SG_IO_TEST   - complete and return a previous req
> > >     SG_IO_WAIT   - wait for a req to finish, interruptibly
> > > 
> > > Then old write users will instead do ioctl SUBMIT.  Read users will
> > > do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
> > > be implemented as SUBMIT + WAIT.
> > > 
> > > Then we can do compat_ioctl and convert up iovecs out-of-line before
> > > calling the normal functions.
> > > 
> > > Let me know if you want a patch for this.
> > 
> > Really, the thought of re-inventing yet another async I/O interface
> > isn't very appealing.
> 
> I'm fine with read/write, except Tomo is against handling iovecs
> because of the compat complexity with struct iovec being different
> on 32- vs 64-bit.  There is a standard way to do "compat" ioctl that
> hides this handling in a different file (not bsg.c), which is the
> only reason I'm even considering these ioctls.  I don't care about
> compat setups per se.
> 
> Is there another async I/O mechanism?  Userspace builds the CDBs,
> just needs some way to drop them in SCSI ML.  BSG is almost perfect
> for this, but doesn't do iovec, leading to lots of memcpy.

No, it's just that async interfaces in Linux have a long and fairly
unhappy history.

James




* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-10 21:46           ` Pete Wyckoff
  2008-01-10 21:54             ` James Bottomley
@ 2008-01-10 22:33             ` Mark Rustad
  2008-01-11  5:42             ` FUJITA Tomonori
  2 siblings, 0 replies; 12+ messages in thread
From: Mark Rustad @ 2008-01-10 22:33 UTC (permalink / raw)
  To: Pete Wyckoff
  Cc: James Bottomley, FUJITA Tomonori, tomof, deepakrc, linux-scsi,
	linux-kernel

On Jan 10, 2008, at 3:46 PM, Pete Wyckoff wrote:

> I'm fine with read/write, except Tomo is against handling iovecs
> because of the compat complexity with struct iovec being different
> on 32- vs 64-bit.  There is a standard way to do "compat" ioctl that
> hides this handling in a different file (not bsg.c), which is the
> only reason I'm even considering these ioctls.  I don't care about
> compat setups per se.

That is what I was thinking. Is it really necessary to support
something like bsg for 32 on 64 bit? Yes, it is a userspace interface,
but it isn't the kind of thing that normal user programs would use. It
is a new interface for newly-written programs and I would expect those
to be native. At least it doesn't strike me as overly-restrictive for
that to be the case for this kind of interface.

> Is there another async I/O mechanism?  Userspace builds the CDBs,
> just needs some way to drop them in SCSI ML.  BSG is almost perfect
> for this, but doesn't do iovec, leading to lots of memcpy.


Precisely.

-- 
Mark Rustad, MRustad@gmail.com




* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-10 21:46           ` Pete Wyckoff
  2008-01-10 21:54             ` James Bottomley
  2008-01-10 22:33             ` Mark Rustad
@ 2008-01-11  5:42             ` FUJITA Tomonori
  2 siblings, 0 replies; 12+ messages in thread
From: FUJITA Tomonori @ 2008-01-11  5:42 UTC (permalink / raw)
  To: pw
  Cc: James.Bottomley, fujita.tomonori, tomof, deepakrc, linux-scsi,
	linux-kernel

On Thu, 10 Jan 2008 16:46:05 -0500
Pete Wyckoff <pw@osc.edu> wrote:

> Is there another async I/O mechanism?  Userspace builds the CDBs,
> just needs some way to drop them in SCSI ML.  BSG is almost perfect
> for this, but doesn't do iovec, leading to lots of memcpy.

syslets?


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-10 21:54             ` James Bottomley
@ 2008-01-12  0:16               ` Douglas Gilbert
  2008-01-14 16:18                 ` Pete Wyckoff
  0 siblings, 1 reply; 12+ messages in thread
From: Douglas Gilbert @ 2008-01-12  0:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pete Wyckoff, FUJITA Tomonori, tomof, deepakrc, linux-scsi, linux-kernel

James Bottomley wrote:
> On Thu, 2008-01-10 at 16:46 -0500, Pete Wyckoff wrote:
>> James.Bottomley@HansenPartnership.com wrote on Thu, 10 Jan 2008 14:55 -0600:
>>> On Thu, 2008-01-10 at 15:43 -0500, Pete Wyckoff wrote:
>>>> fujita.tomonori@lab.ntt.co.jp wrote on Wed, 09 Jan 2008 09:11 +0900:
>>>>> On Tue, 8 Jan 2008 17:09:18 -0500
>>>>> Pete Wyckoff <pw@osc.edu> wrote:
>>>>>> I took another look at the compat approach, to see if it is feasible
>>>>>> to keep the compat handling somewhere else, without the use of #ifdef
>>>>>> CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
>>>>>> The use of iovec is within a write operation on a char device.  It's
>>>>>> not amenable to a compat_sys_ or a .compat_ioctl approach.
>>>>>>
>>>>>> I'm partial to #1 because the use of architecture-independent fields
>>>>>> matches the rest of struct sg_io_v4.  But if you don't want to have
>>>>>> another iovec type in the kernel, could we do #2 but just return
>>>>>> -EINVAL if the need for compat is detected?  I.e. change
>>>>>> dout_iovec_count to dout_iovec_length and do the math?
>>>>> If you are ok with removing the write/read interface and just have
>>>>> ioctl, we can handle compat stuff like others do. But I think
>>>>> that you (OSD people) really want to keep the write/read
>>>>> interface. Sorry, I think that there is no workaround to support iovec
>>>>> in bsg.
>>>> I don't care about read/write in particular.  But we do need some
>>>> way to launch asynchronous SCSI commands, and currently read/write
>>>> are the only way to do that in bsg.  The reason is to keep multiple
>>>> spindles busy at the same time.
>>> Won't multi-threading the ioctl calls achieve the same effect?  Or do
>>> you trip over BKL there?
>> There's no BKL on (new) ioctls anymore, at least.  A thread per
>> device would be feasible perhaps.  But if you want any sort of
>> pipelining out of the device, esp. in the remote iSCSI case, you
>> need to have a good number of commands outstanding to each device.
>> So a thread per command per device.  Typical iSCSI queue depth of
>> 128 times 16 devices for a small setup is a lot of threads.
> 
> I was actually thinking of a thread per outstanding command.
> 
>> The pthread/pipe latency overhead is not insignificant for fast
>> storage networks too.
>>
>>>> How about these new ioctls instead of read/write:
>>>>
>>>>     SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
>>>>     SG_IO_TEST   - complete and return a previous req
>>>>     SG_IO_WAIT   - wait for a req to finish, interruptibly
>>>>
>>>> Then old write users will instead do ioctl SUBMIT.  Read users will
>>>> do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
>>>> be implemented as SUBMIT + WAIT.
>>>>
>>>> Then we can do compat_ioctl and convert up iovecs out-of-line before
>>>> calling the normal functions.
>>>>
>>>> Let me know if you want a patch for this.
>>> Really, the thought of re-inventing yet another async I/O interface
>>> isn't very appealing.
>> I'm fine with read/write, except Tomo is against handling iovecs
>> because of the compat complexity with struct iovec being different
>> on 32- vs 64-bit.  There is a standard way to do "compat" ioctl that
>> hides this handling in a different file (not bsg.c), which is the
>> only reason I'm even considering these ioctls.  I don't care about
>> compat setups per se.
>>
>> Is there another async I/O mechanism?  Userspace builds the CDBs,
>> just needs some way to drop them in SCSI ML.  BSG is almost perfect
>> for this, but doesn't do iovec, leading to lots of memcpy.
> 
> No, it's just that async interfaces in Linux have a long and fairly
> unhappy history.

The sg driver's async interface has been pretty stable for
a long time. The sync SG_IO ioctl is built on top of the
async interface. That makes the async interface extremely
well tested.

The write()/read() async interface in sg does have one
problem: when a command is dispatched via a write()
it would be very useful to get back a tag but that
violates write()'s second argument: 'const void * buf'.
That tag could be useful both for identification of the
response and by task management functions.

I was hoping that the 'flags' field in sgv4 could be used
to implement the variants:
     SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
     SG_IO_TEST   - complete and return a previous req
     SG_IO_WAIT   - wait for a req to finish, interruptibly

that way the existing SG_IO ioctl is sufficient.

And if Tomo doesn't want to do it in the bsg driver,
then it could be done in the sg driver.

Doug Gilbert


* Re: [PATCH] bsg : Add support for io vectors in bsg
  2008-01-12  0:16               ` Douglas Gilbert
@ 2008-01-14 16:18                 ` Pete Wyckoff
  0 siblings, 0 replies; 12+ messages in thread
From: Pete Wyckoff @ 2008-01-14 16:18 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: James Bottomley, FUJITA Tomonori, tomof, deepakrc, linux-scsi,
	linux-kernel

dougg@torque.net wrote on Fri, 11 Jan 2008 19:16 -0500:
> James Bottomley wrote:
>> On Thu, 2008-01-10 at 16:46 -0500, Pete Wyckoff wrote:
>>> James.Bottomley@HansenPartnership.com wrote on Thu, 10 Jan 2008 14:55 -0600:
>>>> On Thu, 2008-01-10 at 15:43 -0500, Pete Wyckoff wrote:
>>>>> fujita.tomonori@lab.ntt.co.jp wrote on Wed, 09 Jan 2008 09:11 +0900:
>>>>>> On Tue, 8 Jan 2008 17:09:18 -0500
>>>>>> Pete Wyckoff <pw@osc.edu> wrote:
>>>>>>> I took another look at the compat approach, to see if it is feasible
>>>>>>> to keep the compat handling somewhere else, without the use of #ifdef
>>>>>>> CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
>>>>>>> The use of iovec is within a write operation on a char device.  It's
>>>>>>> not amenable to a compat_sys_ or a .compat_ioctl approach.
>>>>>>>
>>>>>>> I'm partial to #1 because the use of architecture-independent fields
>>>>>>> matches the rest of struct sg_io_v4.  But if you don't want to have
>>>>>>> another iovec type in the kernel, could we do #2 but just return
>>>>>>> -EINVAL if the need for compat is detected?  I.e. change
>>>>>>> dout_iovec_count to dout_iovec_length and do the math?
>>>>>> If you are ok with removing the write/read interface and just have
>>>>>> ioctl, we can handle compat stuff like others do. But I think
>>>>>> that you (OSD people) really want to keep the write/read
>>>>>> interface. Sorry, I think that there is no workaround to support iovec
>>>>>> in bsg.
>>>>> I don't care about read/write in particular.  But we do need some
>>>>> way to launch asynchronous SCSI commands, and currently read/write
>>>>> are the only way to do that in bsg.  The reason is to keep multiple
>>>>> spindles busy at the same time.
>>>> Won't multi-threading the ioctl calls achieve the same effect?  Or do
>>>> you trip over BKL there?
>>> There's no BKL on (new) ioctls anymore, at least.  A thread per
>>> device would be feasible perhaps.  But if you want any sort of
>>> pipelining out of the device, esp. in the remote iSCSI case, you
>>> need to have a good number of commands outstanding to each device.
>>> So a thread per command per device.  Typical iSCSI queue depth of
>>> 128 times 16 devices for a small setup is a lot of threads.
>>
>> I was actually thinking of a thread per outstanding command.
>>
>>> The pthread/pipe latency overhead is not insignificant for fast
>>> storage networks too.
>>>
>>>>> How about these new ioctls instead of read/write:
>>>>>
>>>>>     SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
>>>>>     SG_IO_TEST   - complete and return a previous req
>>>>>     SG_IO_WAIT   - wait for a req to finish, interruptibly
>>>>>
>>>>> Then old write users will instead do ioctl SUBMIT.  Read users will
>>>>> do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
>>>>> be implemented as SUBMIT + WAIT.
>>>>>
>>>>> Then we can do compat_ioctl and convert up iovecs out-of-line before
>>>>> calling the normal functions.
>>>>>
>>>>> Let me know if you want a patch for this.
>>>> Really, the thought of re-inventing yet another async I/O interface
>>>> isn't very appealing.
>>> I'm fine with read/write, except Tomo is against handling iovecs
>>> because of the compat complexity with struct iovec being different
>>> on 32- vs 64-bit.  There is a standard way to do "compat" ioctl that
>>> hides this handling in a different file (not bsg.c), which is the
>>> only reason I'm even considering these ioctls.  I don't care about
>>> compat setups per se.
>>>
>>> Is there another async I/O mechanism?  Userspace builds the CDBs,
>>> just needs some way to drop them in SCSI ML.  BSG is almost perfect
>>> for this, but doesn't do iovec, leading to lots of memcpy.
>>
>> No, it's just that async interfaces in Linux have a long and fairly
>> unhappy history.
>
> The sg driver's async interface has been pretty stable for
> a long time. The sync SG_IO ioctl is built on top of the
> async interface. That makes the async interface extremely
> well tested.
>
> The write()/read() async interface in sg does have one
> problem: when a command is dispatched via a write()
> it would be very useful to get back a tag but that
> violates write()'s second argument: 'const void * buf'.
> That tag could be useful both for identification of the
> response and by task management functions.
>
> I was hoping that the 'flags' field in sgv4 could be used
> to implement the variants:
>     SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
>     SG_IO_TEST   - complete and return a previous req
>     SG_IO_WAIT   - wait for a req to finish, interruptibly
>
> that way the existing SG_IO ioctl is sufficient.
>
> And if Tomo doesn't want to do it in the bsg driver,
> then it could be done in the sg driver.

The sg driver already has async via read/write, and it works fine.
Perhaps someone wants the ioctl versions too, but that's not my main
goal here.

I think that sg doesn't bother with compat iovec handling.  It just
uses SZ_SG_IOVEC (defined as sizeof(sg_iovec_t)) and doesn't check
if the userspace iovec happens to be smaller.  Sg does have a compat
ioctl function; it just doesn't support SG_IO.  So, on a 64-bit
kernel, read/write from 32-bit userspace with iovec will get
undefined results due to the misinterpretation of the iovec fields,
while ioctl from 32-bit will fail with ENOIOCTLCMD (EINVAL to
userspace).  Doesn't bother me at all, just an observation.  I'd
love to be able to take a similar approach with bsg: only support
iovec in 32-32 or 64-64 environments where kernel iovec == user
iovec.
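
To make the size mismatch concrete, here is a small userspace sketch. The struct names iovec32 and iovec64 and both helper functions are made up for illustration; they only mimic how a pointer-plus-length pair is laid out for a 32-bit process versus a 64-bit kernel, and are not the kernel's sg_iovec or compat code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: iovec as a 32-bit process lays it out (8 bytes). */
struct iovec32 {
	uint32_t iov_base;
	uint32_t iov_len;
};

/* Hypothetical stand-in: iovec as a 64-bit kernel expects it (16 bytes). */
struct iovec64 {
	uint64_t iov_base;
	uint64_t iov_len;
};

/*
 * Read the start of a 32-bit iovec array as if it were one 64-bit
 * entry.  This mimics copying sizeof(kernel iovec) * count bytes from
 * userspace with no compat conversion: two 32-bit entries get fused
 * into a single bogus 64-bit one.
 */
static struct iovec64 misread_first_entry(const struct iovec32 *user,
					  size_t count)
{
	struct iovec64 k = { 0, 0 };
	size_t avail = count * sizeof(*user);

	assert(user != NULL);
	memcpy(&k, user, avail < sizeof(k) ? avail : sizeof(k));
	return k;
}

/* What a compat conversion would do instead: widen field by field. */
static struct iovec64 widen_entry(const struct iovec32 *user)
{
	struct iovec64 k = { user->iov_base, user->iov_len };

	return k;
}
```

Given two 32-bit entries, misread_first_entry() returns a length that mixes fields from both, while widen_entry() preserves the first entry's base and length.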

I'm not going to patch up sg to add the SG_IO async ioctls.  My
greedy need requires bidirectional transfers and big CDBs, both of
which could be hacked into sg_io_hdr_t and sg, but it sure wouldn't
be pretty.  These were some of the reasons you proposed sgv4 as I
recall.

		-- Pete


end of thread, other threads:[~2008-01-14 16:18 UTC | newest]

Thread overview: 12+ messages
2008-01-04 16:17 [PATCH] bsg : Add support for io vectors in bsg Deepak Colluru
2008-01-05  5:01 ` FUJITA Tomonori
2008-01-08 22:09   ` Pete Wyckoff
2008-01-09  0:11     ` FUJITA Tomonori
2008-01-10 20:43       ` Pete Wyckoff
2008-01-10 20:55         ` James Bottomley
2008-01-10 21:46           ` Pete Wyckoff
2008-01-10 21:54             ` James Bottomley
2008-01-12  0:16               ` Douglas Gilbert
2008-01-14 16:18                 ` Pete Wyckoff
2008-01-10 22:33             ` Mark Rustad
2008-01-11  5:42             ` FUJITA Tomonori
