LKML Archive on
help / color / mirror / Atom feed
* [RFC 0/6] bidi support: block, SCSI, and iSCSI layers
@ 2007-01-21 23:04 Boaz Harrosh
  0 siblings, 0 replies; only message in thread
From: Boaz Harrosh @ 2007-01-21 23:04 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Mike Christie
  Cc: linux-scsi, linux-kernel, open-iscsi, Daniel.E.Messinger,
	Liran Schour, Benny Halevy

Following are 6 (large) patches that introduce support
for bidirectional requests in the kernel.
Since all this is going to interfere with everyone's
work, let us all comment on the implementation, naming,
and future directions. (or forever hold your peace).

The submitted work is against linux-2.6-block tree of
2007/01/15. It compiles with make allmodconfig in both
i386 and x86_64, and it boots and runs on my x86_64
test machines. The target kernel is able to compile a
Linux-kernel, burn and read cd, and pass iSCSI and OSD
tests. OSD tests are the ones who exercise BIDI I/O.

Patches summary:
1. data direction
	- add support for enum dma_data_direction in struct request
	  in preparation to full bidi support.
	- removed rq_data_dir() and added other APIs for querying request's direction.
	- fix usage of rq_data_dir() and peeking at req->cmd_flags & REQ_RW to using
	  new api.
	- clean-up bad usage of DMA_BIDIRECTIONAL and bzero of "black" requests,
	  to use the new blk_rq_init_unqueued_req()

2. request_io_part
	- extract io related fields in struct request into struct request_io_part
	  in preparation to full bidi support.
	- use new API for accessing the new sub-structure

3. bidi request
	- add one more request_io_part member for bidi support in struct request.
	- add block layer API functions for mapping and accessing bidi data buffers
	  and for ending a block request as a whole (end_that_request_block())

4. bidi-scsi
	- add bidi support at the scsi layer
	- new scsi api functions to generate bidi capable requests
	- new scsi api for accessing scsi_cmnd_buff (similar to request_io_part
	  but not yet implemented as a sub-structure)

5. varlen cdb
	- add support for variable length (or just longer than 16 bytes) SCSI CDBs
	- add scsi device type for OSD devices (will be separated in actual patchset)

6. iscsi bidi varlen
	- add support for bidi and variable length commands at the iscsi layer
	  for the tcp transport. (will also be separated and sent through the open-iscsi project)

Developer comments: (long, can be skipped)
1. This part borrows from struct scsi_cmnd use of an enum dma_data_direction member, to keep
everything the same.

1.A. It could be done in a more backward compatible way but it was a good chance for some clean-up
  (see the mess with BIO flags that used to be the same but no longer are). I also believe
  that two ways to describe the same thing will always come to haunt you later.

1.B. Speaking of which, the PCI_DMA_XXX constants are a pain being the same constants but #defined and
  untyped. It confused me in my bidi-bug hunting and it seems I'm not the only one.
  I would suggest boldly inverting enum dma_data_direction from its active-low DMA logic and
  change its name to just enum data_direction. Then, define the PCI_DMA_XXX constants as a new enum,
  and accept that new enum at all the dma_XXX mapping APIs, instead of the current u32. Note
  that this is cardinal now because bidirectional means __different_things__ in struct request
  and in DMA (or memory) mapping. the first means bidirectional access to __same__ buffer, the later
  means __two_sets__ of buffers at one request. (each buffer mapped for uni-directional access).

2. Separation of IO members into two sub structures: There are 2 possible approaches here and each approach
can have a few implementations.

2.A. First approach is to keep everything backward compatible and have a bidi sub-structure on the side.
  This approach can be seen in the attached patch for bidi support in SCSI layer. It keeps the patch to
  a minimum but is hell on readability & maintenance. It also puts a performance penalty on users like iscsi
  who wants to use a generic bidi-api.

2.B. Second approach is what is seen here. Separate all I/O members to a sub-structure, craft an
  API to access each part and a UNI api that wants to access buffers knowing they are uni-

3.C. The second approach can be implemented as I did it: one member for uni/bidi-write and second
  for bidi-read or alternatively, using one sub-structure for "read" and another for "write". I hope
  I have built it in  such a way that this choice is merely an "implementation detail" only concerning
  the block layer and hidden behind proper access functions.

3.D. I have currently decided on the first approach for performance reasons, since 99.99% of kernel
  is uni-directional (only exception is proposed iscsi). What will be in the future? I'm not sure,
  If most bus protocols are like iSCSI (and SCSI) where there is a clear separation, and concurrency,
  between write and read states. then second approach will be the logical choice. Or maybe it is all
  exact same programming safe the toggle of a direction bit in a register, then the first approach
  wins. In any way the second approach needs a code change.
Boaz, I don't know what to do with this paragraph... I'm not sure I understand what you mean here...

3.E. One more thing I want to comment on this patch is that most changes are indications of old/badly
  coded drivers. There are plenty of good BIO and block/request APIs for proper IO access and
  advancement. There is no real need in the kernel today to directly access these members (And
  most new code does not). All these drivers need to be cleaned up.

4. SCSI layer: This is mostly the bidi implementation I sent 2 months ago but now simplified
  since the actual bidi support is done in struct request so what's left is more of an API addition. I have
  kept it like this so people can see the "backward compatible" approach for implementing bidi support.
  I must warn here, going the above "request" way, will produce a much bigger patch than all above 4 put together.
  Which is not a bad thing, only that I would like to do it one at a time, sort of sneak it in more slowly.
  (and also let people comment on what they like better).

4.A. The bidi API is just there to serve IBM's OSD driver and is not a must. One can bypass scsi-lib
  and just queue a bidi request in a generic way and if that device happens to be SCSI than it will all
  just work. But on the other hand there is nothing bad in an API that makes everything nice &

4.B. TODO: Implementation of bidi residual count.

4.C. TODO: API change (and clean-up) of the done function on the async route. In my opinion the API should
  receive a request pointer and user will extract any info he needs.

5. OSD && Varlen.

5.A. This patch implements support for variable length cdbs and large (>16) vendor specific commands. The
  actual support is at the request level, and is used by the SCSI API. Unlike regular commands
  a user of this fixture needs to keep the command's memory buffer valid until the end of the request execution.
  just like it does with its read/write and sense buffers.

5.B. Includes also a 2 liner to support OSD type SCSI devices.

6. iSCSI bidi support.
  It is given here as an example of a bidi/varlen-cdb supporting SCSI transport and driver.
  The actual patch will go through the open-iscsi project, after the appropriate SCSI and block
  support for bidi will exist in the kernel. At which time it will be broken into 3 separate patches.

6.A. Side note: The iSCSI tests pass only when I comment out line 1643 in libiscsi.c at
  iscsi_conn_start() function. (second if). This is an old issue with iscsi tests and the iscsitarget
  project. I will send a separate e-mail about this to the open-iscsi mailing list.


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2007-01-22  1:00 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-21 23:04 [RFC 0/6] bidi support: block, SCSI, and iSCSI layers Boaz Harrosh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).