LKML Archive on lore.kernel.org
* New 2.6.24.2 SG_IO SCSI problems
@ 2008-02-21 15:15 Mark Hounschell
  2008-02-21 15:41 ` James Bottomley
  2008-02-22 16:50 ` Mike Christie
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Hounschell @ 2008-02-21 15:15 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel

I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. 
I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine.

During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually, basic
reads are working. It's with large BC reads using an io_vec list that the problem shows up.

Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256
Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256
Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256
Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256
.
.
.
.
.
.

The status fields of the sg_io_hdr_t structure used in the application return:
 status = 0x0 msg_status = 0x0 host_status = 0x7 driver_status = 0x0
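
For reference, a host_status of 0x7 matches DID_ERROR ("internal error") in the Linux SCSI mid-layer's host byte codes. Below is a minimal sketch of a decoder for that field, assuming the standard DID_* values 0x00 through 0x09. The function name ex_host_status_name is made up for illustration and is not part of any library.

```c
#include <stddef.h>

/* Map the host_status field of sg_io_hdr_t to the name of the
 * corresponding DID_* host byte code used by the Linux SCSI mid-layer.
 * Only the codes 0x00..0x09 are listed; anything else is reported as
 * "unknown". ex_host_status_name is a hypothetical helper. */
static const char *ex_host_status_name(unsigned int host_status)
{
    static const char *const names[] = {
        "DID_OK",          /* 0x00: no error */
        "DID_NO_CONNECT",  /* 0x01: could not connect before timeout */
        "DID_BUS_BUSY",    /* 0x02: bus stayed busy through timeout */
        "DID_TIME_OUT",    /* 0x03: timed out for another reason */
        "DID_BAD_TARGET",  /* 0x04: bad target */
        "DID_ABORT",       /* 0x05: told to abort */
        "DID_PARITY",      /* 0x06: parity error */
        "DID_ERROR",       /* 0x07: internal error */
        "DID_RESET",       /* 0x08: reset by somebody */
        "DID_BAD_INTR"     /* 0x09: unexpected interrupt */
    };

    if (host_status < sizeof(names) / sizeof(names[0]))
        return names[host_status];
    return "unknown";
}
```

Calling ex_host_status_name(hdr.host_status) after the ioctl returns would give "DID_ERROR" for the value reported above.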

The hardware in use on this particular machine is a simple Adaptec AHA-2930CU talking to an old
IMPRIMIS 94601-15 1.2GB disk drive.

Again, all this works fine with the 2.6.23.11 kernel.

Any help would be appreciated.

Regards
Mark Hounschell



* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-21 15:15 New 2.6.24.2 SG_IO SCSI problems Mark Hounschell
@ 2008-02-21 15:41 ` James Bottomley
  2008-02-21 16:21   ` Mark Hounschell
  2008-02-22 16:50 ` Mike Christie
  1 sibling, 1 reply; 15+ messages in thread
From: James Bottomley @ 2008-02-21 15:41 UTC (permalink / raw)
  To: markh; +Cc: linux-scsi, linux-kernel

On Thu, 2008-02-21 at 10:15 -0500, Mark Hounschell wrote:
> I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. 
> I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine.
> 
> During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic
> reads  are working. Its with large BC reads using an io_vec list that the problem shows up.
> 
> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
> Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256

Help me a little here.  What was the io_vec and command you sent in to
produce this?  The aic debugging information implies a single element sg
list for a 256 byte read.

James




* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-21 15:41 ` James Bottomley
@ 2008-02-21 16:21   ` Mark Hounschell
  2008-02-22 10:03     ` Mark Hounschell
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Hounschell @ 2008-02-21 16:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel

James Bottomley wrote:
> On Thu, 2008-02-21 at 10:15 -0500, Mark Hounschell wrote:
>> I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. 
>> I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine.
>>
>> During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic
>> reads  are working. Its with large BC reads using an io_vec list that the problem shows up.
>>
>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
>> Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256
> 
> Help me a little here.  What was the io_vec and command you sent in to
> produce this?  The aic debugging information implies a single element sg
> list for a 256 byte read.
> 
> James
> 
> 
> 

Well, I did no 256 byte xfer at all.  My failing io_vec list has 6 elements.
The first 5 are for byte counts of 0xfffc and the last 0x6114. See below.

This is some debug info from within my app for the commands leading up to the failure.
If you need actual values of the io_vec lists I will need to add some additional debug
info to the app. I will do so if needed.

The disk BTW is formatted with a 768 byte sector size.
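
For context, this is roughly how an SG_IO read that scatters into an io_vec list is described to the kernel. It is only a sketch under assumptions, not the application's actual code: the helper name ex_prep_scatter_read and its parameters are hypothetical, while sg_io_hdr_t, sg_iovec_t, SG_DXFER_FROM_DEV and the field names come from <scsi/sg.h>. The helper only fills in the header and does not issue the ioctl.

```c
#include <string.h>
#include <scsi/sg.h>

/* Fill an sg_io_hdr_t for a device-to-host transfer that is scattered
 * over an array of sg_iovec_t elements. dxfer_len is set to the sum of
 * the element lengths. ex_prep_scatter_read is a hypothetical helper. */
static void ex_prep_scatter_read(sg_io_hdr_t *hdr,
                                 unsigned char *cdb, unsigned char cdb_len,
                                 sg_iovec_t *iov, unsigned short iov_count,
                                 unsigned char *sense, unsigned char sense_len,
                                 unsigned int timeout_ms)
{
    unsigned int total = 0;
    unsigned short i;

    for (i = 0; i < iov_count; i++)
        total += (unsigned int)iov[i].iov_len;

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';                  /* required by the sg v3 interface */
    hdr->dxfer_direction = SG_DXFER_FROM_DEV; /* read: device to host */
    hdr->cmdp = cdb;
    hdr->cmd_len = cdb_len;
    hdr->iovec_count = iov_count;             /* non-zero: dxferp is an sg_iovec_t array */
    hdr->dxferp = iov;
    hdr->dxfer_len = total;                   /* total bytes across all elements */
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = timeout_ms;                /* milliseconds */
}
```

The prepared header would then be passed to ioctl(fd, SG_IO, &hdr), after which status, host_status and driver_status carry the result.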
 
The first read has a 2 element io_vec list and reports no error:
ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x01 0x00
ScsiDev_thread_7e00: Read1(1) bc = 0x0078 addr = 0x000000 Skip 0
ScsiDev_thread_7e00: Read(2) bc = 0x0288 addr = 0xb6cea368 Skip 1
ScsiDev_thread_7e00: SRead  DC ops = 2 short_bc = 288

The second read has a 4 element io_vec list and reports no error:
ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x05 0x00
ScsiDev_thread_7e00: Read1(1) bc = 0x0780 addr = 0x000000 Skip 0
ScsiDev_thread_7e00: Read1(2) bc = 0x0670 addr = 0x000000 Skip 1
ScsiDev_thread_7e00: Read1(3) bc = 0x003c addr = 0x000200 Skip 0
ScsiDev_thread_7e00: SRead(4) bc = 0x022c addr = 0xb6cea368 Skip 1
ScsiDev_thread_7e00: Read  DC ops = 4 short_bc = 22c

There is a seek here that reports no error:
ScsiDev_thread_7e00: Seek address 2752 BPS 768


This read has a 6 element io_vec list and reports the error to the app:
ScsiDev_thread_7e00: ReadX CBD = 0x28 0x00 0x00 0x00 0x27 0x52 0x00 0x01 0xcb 0x00
ScsiDev_thread_7e00: Read1(1) bc = 0xfffc addr = 0x000784 Skip 0
ScsiDev_thread_7e00: Read1(2) bc = 0xfffc addr = 0x010780 Skip 0
ScsiDev_thread_7e00: Read1(3) bc = 0xfffc addr = 0x02077c Skip 0
ScsiDev_thread_7e00: Read1(4) bc = 0xfffc addr = 0x030778 Skip 0
ScsiDev_thread_7e00: Read1(5) bc = 0xfffc addr = 0x040774 Skip 0
ScsiDev_thread_7e00: Read1(6) bc = 0x6114 addr = 0x050770 Skip 0
ScsiDev_thread_7e00: Read  DC ops = 6 short_bc = 0

ScsiDev_thread_7e00: scsi = 0x0 msg 0x0 host = 0x00000007 driver = 0x00000000
ScsiDev_thread_7e00: Read error: sns = 0x00 residual = 0x0000
ScsiDev_thread_7e00: Posting IPL status 0x00000090 0x000e0000 for Suba 0000 to loc 0x000000
ScsiDev_thread_7e00: Sleeping!!
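
As a cross-check of the failing request, the transfer length in the READ(10) CDB can be compared with the total of the io_vec byte counts. The sketch below assumes the standard READ(10) layout (LBA in bytes 2-5, transfer length in bytes 7-8, both big-endian). The helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Starting logical block address from a READ(10) CDB (bytes 2..5, big-endian). */
static uint32_t ex_read10_lba(const unsigned char cdb[10])
{
    return ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
           ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
}

/* Transfer length in blocks from a READ(10) CDB (bytes 7..8, big-endian). */
static uint32_t ex_read10_blocks(const unsigned char cdb[10])
{
    return ((uint32_t)cdb[7] << 8) | (uint32_t)cdb[8];
}

/* Sum of the byte counts of all elements in an iovec list. */
static size_t ex_iovec_total(const struct iovec *iov, int count)
{
    size_t total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += iov[i].iov_len;
    return total;
}
```

For the request above, the CDB asks for 0x1cb = 459 blocks starting at block 0x2752, which is 459 * 768 = 352512 bytes at this sector size. The six elements add up to 5 * 0xfffc + 0x6114 = 352512 bytes, so the list and the CDB agree, and nothing in the request corresponds to a 256 byte transfer.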


Below is the complete dump of kernel messages for the above. I assume they are
a result of the last failing read, but there is an awful lot there just for that 6 element
io_vec list. Sorry to put all this in here, but I wanted you to get the idea.


Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
.
. Snip (identical messages repeated)
.

Again, sorry for all that.

regards
Mark
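
Note: the figures in the failing read above can be cross-checked by hand.
The sketch below decodes the READ(10) CDB and sums the six io_vec byte
counts. It assumes the standard READ(10) layout (big-endian LBA in bytes
2-5, big-endian block count in bytes 7-8) and the 768-byte sector size
mentioned in the message. The helper names are illustrative and do not
come from the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the big-endian LBA held in bytes 2..5 of a READ(10) CDB. */
static uint32_t read10_lba(const uint8_t cdb[10])
{
    assert(cdb[0] == 0x28);               /* READ(10) opcode */
    return ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
           ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
}

/* Decode the big-endian transfer length (in blocks) held in bytes 7..8. */
static uint32_t read10_blocks(const uint8_t cdb[10])
{
    assert(cdb[0] == 0x28);
    return ((uint32_t)cdb[7] << 8) | (uint32_t)cdb[8];
}

/* Sum the byte counts of an io_vec list. */
static size_t iov_total(const size_t *bc, size_t n)
{
    size_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += bc[i];
    return sum;
}

/* Values copied from the application's debug output for the failing read. */
static const uint8_t failing_cdb[10] =
    { 0x28, 0x00, 0x00, 0x00, 0x27, 0x52, 0x00, 0x01, 0xcb, 0x00 };
static const size_t failing_bc[6] =
    { 0xfffc, 0xfffc, 0xfffc, 0xfffc, 0xfffc, 0x6114 };
```

With these inputs the CDB asks for 0x01cb = 459 blocks at LBA 0x2752, and
459 * 768 = 352512 bytes, which equals 5 * 0xfffc + 0x6114. So the io_vec
list and the CDB agree, and nothing in the request is 256 bytes long. The
total is not a multiple of 512 (352512 mod 512 = 256); whether that
remainder is related to the "Length = 256" in the kernel messages is not
established in this message.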

* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-21 16:21   ` Mark Hounschell
@ 2008-02-22 10:03     ` Mark Hounschell
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Hounschell @ 2008-02-22 10:03 UTC (permalink / raw)
  To: markh; +Cc: James Bottomley, linux-scsi, linux-kernel

Mark Hounschell wrote:
> James Bottomley wrote:
>> On Thu, 2008-02-21 at 10:15 -0500, Mark Hounschell wrote:
>>> I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. 
>>> I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine.
>>>
>>> During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic
>>> reads are working. It's with large BC reads using an io_vec list that the problem shows up.
>>>
>>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
>>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
>>> Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256
>> Help me a little here.  What was the io_vec and command you sent in to
>> produce this?  The aic debugging information implies a single element sg
>> list for a 256 byte read.
>>
>> James
>>
>>
>>
> 
> Well, I did no 256 byte xfer at all.  My failing io_vec list has 6 elements.
> The first 5 are for byte counts of 0xfffc and the last 0x6114. See below.
> 
> This is some debug info from within my app for the commands leading up to the failure.
> If you need the actual values of the io_vec lists, I will need to add some additional debug
> info to the app. I will do so if needed.
> 
> The disk, BTW, is formatted with a 768-byte sector size.
>  
> The first read has a 2 element io_vec list and reports no error:
> ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x01 0x00
> ScsiDev_thread_7e00: Read1(1) bc = 0x0078 addr = 0x000000 Skip 0
> ScsiDev_thread_7e00: Read(2) bc = 0x0288 addr = 0xb6cea368 Skip 1
> ScsiDev_thread_7e00: SRead  DC ops = 2 short_bc = 288
> 
> The second read has a 4 element io_vec list and reports no error:
> ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x05 0x00
> ScsiDev_thread_7e00: Read1(1) bc = 0x0780 addr = 0x000000 Skip 0
> ScsiDev_thread_7e00: Read1(2) bc = 0x0670 addr = 0x000000 Skip 1
> ScsiDev_thread_7e00: Read1(3) bc = 0x003c addr = 0x000200 Skip 0
> ScsiDev_thread_7e00: SRead(4) bc = 0x022c addr = 0xb6cea368 Skip 1
> ScsiDev_thread_7e00: Read  DC ops = 4 short_bc = 22c
> 
> There is a seek here that reports no error:
> ScsiDev_thread_7e00: Seek address 2752 BPS 768
> 
> 
> This read has a 6 element io_vec list and reports  the error to the app:
> ScsiDev_thread_7e00: ReadX CBD = 0x28 0x00 0x00 0x00 0x27 0x52 0x00 0x01 0xcb 0x00
> ScsiDev_thread_7e00: Read1(1) bc = 0xfffc addr = 0x000784 Skip 0
> ScsiDev_thread_7e00: Read1(2) bc = 0xfffc addr = 0x010780 Skip 0
> ScsiDev_thread_7e00: Read1(3) bc = 0xfffc addr = 0x02077c Skip 0
> ScsiDev_thread_7e00: Read1(4) bc = 0xfffc addr = 0x030778 Skip 0
> ScsiDev_thread_7e00: Read1(5) bc = 0xfffc addr = 0x040774 Skip 0
> ScsiDev_thread_7e00: Read1(6) bc = 0x6114 addr = 0x050770 Skip 0
> ScsiDev_thread_7e00: Read  DC ops = 6 short_bc = 0
> 
> ScsiDev_thread_7e00: scsi = 0x0 msg 0x0 host = 0x00000007 driver = 0x00000000
> ScsiDev_thread_7e00: Read error: sns = 0x00 residual = 0x0000
> ScsiDev_thread_7e00: Posting IPL status 0x00000090 0x000e0000 for Suba 0000 to loc 0x000000
> ScsiDev_thread_7e00: Sleeping!!
> 
> 
> Below is the complete dump of kernel messages for the above. I assume they are
> a result of the last failing read, but there is an awful lot there just for that 6 element
> io_vec list. Sorry to put all this in there, but I wanted you to get the idea.
> 
> 
> Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
> Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
> Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
.
. Snip
.
> Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in Data-in phase.  Tag == 0x1.
> Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 256.  NumSGs = 1.
> Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
> 
> Again, sorry for all that.
> 

FWIW, on a different machine doing the same thing I get only a single
kernel message and NOT the tons shown above.

Regards
Mark
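
Note: a request like the six-element read quoted above is described to the
sg driver by an sg_io_hdr_t whose dxferp points at an array of sg_iovec_t
and whose iovec_count is non-zero. The sketch below only fills in the
header and checks the returned status fields. It is not the application's
code: prep_read_iov(), sg_io_failed() and HOST_DID_ERROR are made-up names,
and the 20 second timeout is arbitrary.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

/* Numeric value the kernel headers name DID_ERROR (host byte). */
#define HOST_DID_ERROR 0x07

/* Fill an sg_io_hdr_t for a data-in command whose buffer is an iovec list. */
static void prep_read_iov(sg_io_hdr_t *hdr,
                          unsigned char *cdb, unsigned char cdb_len,
                          sg_iovec_t *iov, unsigned short iov_count,
                          unsigned char *sense, unsigned char sense_len)
{
    unsigned int total = 0;

    assert(iov_count > 0);
    for (unsigned short i = 0; i < iov_count; i++)
        total += (unsigned int)iov[i].iov_len;

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id    = 'S';               /* required by SG_IO */
    hdr->dxfer_direction = SG_DXFER_FROM_DEV; /* a read */
    hdr->cmdp            = cdb;
    hdr->cmd_len         = cdb_len;
    hdr->iovec_count     = iov_count;         /* dxferp is an iovec array */
    hdr->dxferp          = iov;
    hdr->dxfer_len       = total;             /* sum of all iov_len values */
    hdr->sbp             = sense;
    hdr->mx_sb_len       = sense_len;
    hdr->timeout         = 20000;             /* milliseconds, arbitrary */
}

/* Non-zero if any of the three status fields reports a problem. */
static int sg_io_failed(const sg_io_hdr_t *hdr)
{
    return hdr->status != 0 || hdr->host_status != 0 ||
           hdr->driver_status != 0;
}
```

The header would then be passed to ioctl(fd, SG_IO, &hdr) on an open
/dev/sg* node. In the failing case reported here, the SCSI status and
driver status came back 0 and the host status came back 0x7, so a check
like sg_io_failed() is what catches the error.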


* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-21 15:15 New 2.6.24.2 SG_IO SCSI problems Mark Hounschell
  2008-02-21 15:41 ` James Bottomley
@ 2008-02-22 16:50 ` Mike Christie
  2008-02-22 16:59   ` Mike Christie
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Christie @ 2008-02-22 16:50 UTC (permalink / raw)
  To: markh; +Cc: linux-scsi, linux-kernel

Mark Hounschell wrote:
> I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. 
> I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine.
> 
> During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic
> reads are working. It's with large BC reads using an io_vec list that the problem shows up.
> 

Are you doing SG_IO to the sg device (/dev/sg*) or to the block device 
(/dev/sdX)?

* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-22 16:50 ` Mike Christie
@ 2008-02-22 16:59   ` Mike Christie
  2008-02-22 17:56     ` Mark Hounschell
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Christie @ 2008-02-22 16:59 UTC (permalink / raw)
  To: markh; +Cc: linux-scsi, linux-kernel

Mike Christie wrote:
> Mark Hounschell wrote:
>> I seem to have run into some sort of regression in the SG_IO interface 
>> of 2.6.24.2. I have an application that up until 2.6.24 worked fine. 
>> The 2.6.23.16 kernel works fine.
>>
>> During reads I get these kernel messages. Writes and other functions 
>> _seem_ OK. Actually basic
>> reads are working. It's with large BC reads using an io_vec list that
>> the problem shows up.
>>
> 
> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device 
> (/dev/sdX)?

If you are doing SG_IO to the sg device, then I know of one regression 
(well not regression exactly, but I fixed a bug but the patch got 
partially overwritten by another patch and that caused a new bug). Both 
bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing 
SG_IO to the sg device.

* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-22 16:59   ` Mike Christie
@ 2008-02-22 17:56     ` Mark Hounschell
  2008-02-22 21:38       ` Mark Hounschell
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Hounschell @ 2008-02-22 17:56 UTC (permalink / raw)
  To: Mike Christie; +Cc: linux-scsi, linux-kernel

Mike Christie wrote:
> Mike Christie wrote:
>> Mark Hounschell wrote:
>>> I seem to have run into some sort of regression in the SG_IO
>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>> worked fine. The 2.6.23.16 kernel works fine.
>>>
>>> During reads I get these kernel messages. Writes and other functions
>>> _seem_ OK. Actually basic
>>> reads are working. It's with large BC reads using an io_vec list that
>>> the problem shows up.
>>>
>>
>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>> (/dev/sdX)?
> 
> If you are doing SG_IO to the sg device, then I know of one regression
> (well not regression exactly, but I fixed a bug but the patch got
> partially overwritten by another patch and that caused a new bug). Both
> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
> SG_IO to the sg device.
> 

Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.

Thanks
Mark

* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-22 17:56     ` Mark Hounschell
@ 2008-02-22 21:38       ` Mark Hounschell
  2008-02-22 22:25         ` Mike Christie
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Hounschell @ 2008-02-22 21:38 UTC (permalink / raw)
  To: markh; +Cc: Mike Christie, linux-scsi, linux-kernel

Mark Hounschell wrote:
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Mark Hounschell wrote:
>>>> I seem to have run into some sort of regression in the SG_IO
>>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>>> worked fine. The 2.6.23.16 kernel works fine.
>>>>
>>>> During reads I get these kernel messages. Writes and other functions
>>>> _seem_ OK. Actually basic
>>>> reads are working. It's with large BC reads using an io_vec list that
>>>> the problem shows up.
>>>>
>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>>> (/dev/sdX)?
>> If you are doing SG_IO to the sg device, then I know of one regression
>> (well not regression exactly, but I fixed a bug but the patch got
>> partially overwritten by another patch and that caused a new bug). Both
>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
>> SG_IO to the sg device.
>>
> 
> Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.
> 
> Thanks
> Mark
> -
 
2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a patch
lying around for 2.6.24.2??

Thanks
Mark


* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-22 21:38       ` Mark Hounschell
@ 2008-02-22 22:25         ` Mike Christie
  2008-02-22 22:48           ` Tony Battersby
                             ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mike Christie @ 2008-02-22 22:25 UTC (permalink / raw)
  To: markh; +Cc: linux-scsi, linux-kernel, Tony Battersby

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

Mark Hounschell wrote:
> Mark Hounschell wrote:
>> Mike Christie wrote:
>>> Mike Christie wrote:
>>>> Mark Hounschell wrote:
>>>>> I seem to have run into some sort of regression in the SG_IO
>>>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>>>> worked fine. The 2.6.23.16 kernel works fine.
>>>>>
>>>>> During reads I get these kernel messages. Writes and other functions
>>>>> _seem_ OK. Actually basic
>>>>> reads are working. It's with large BC reads using an io_vec list that
>>>>> the problem shows up.
>>>>>
>>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>>>> (/dev/sdX)?
>>> If you are doing SG_IO to the sg device, then I know of one regression
>>> (well not regression exactly, but I fixed a bug but the patch got
>>> partially overwritten by another patch and that caused a new bug). Both
>>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
>>> SG_IO to the sg device.
>>>
>> Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.
>>
>> Thanks
>> Mark
>> -
>  
> 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a patch
> lying around for 2.6.24.2??
> 

I attached a backport of the patch from Tony (added as cc) that is in 
2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it 
was this patch, then we can send it to stable.

[-- Attachment #2: fix-passthrough-bufflen.patch --]
[-- Type: text/x-patch, Size: 1443 bytes --]

Backport
76d78300a6eb8b7f08e47703b7e68a659ffc2053
to 2.6.24

From Tony Battersby:

When sending a SCSI command to a tape drive via the SCSI Generic (sg)
driver, if the command has a data transfer length more than
scatter_elem_sz (32 KB default) and not a multiple of 512, then I either
hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else
the command never completes (depending on the LLDD).

When constructing scatterlists, the sg driver rounds up the scatterlist
element sizes to be a multiple of 512.  This can result in
sum(scatterlist lengths) > bufflen.  In this case, scsi_req_map_sg()
incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to
bufflen.  When the command completes, req_bio_endio() detects that
bio->bi_size != 0, and so it doesn't call bio_endio().  This causes the
command to be resubmitted, resulting in BUG_ON or the command never
completing.

This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather
than to sum(scatterlist lengths), which fixes the problem.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>

--- linux-2.6.24.2/drivers/scsi/scsi_lib.c	2008-02-10 23:51:11.000000000 -0600
+++ linux-2.6.24.2.work/drivers/scsi/scsi_lib.c	2008-02-22 16:20:09.000000000 -0600
@@ -298,7 +298,6 @@ static int scsi_req_map_sg(struct reques
 		page = sg_page(sg);
 		off = sg->offset;
 		len = sg->length;
- 		data_len += len;
 
 		while (len > 0 && data_len > 0) {
 			/*
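
Note: the mechanism Tony describes can be reproduced with plain arithmetic.
The sketch below is a simplified userspace model, not kernel code. It
assumes the buffer is cut into scatter_elem_sz (32 KB) pieces and each
piece is rounded up to a multiple of 512, then compares the resulting sum
with bufflen. All function names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR 512u

/* Round a length up to the next multiple of 512. */
static size_t round_up_512(size_t len)
{
    return (len + SECTOR - 1) / SECTOR * SECTOR;
}

/* Sum of element lengths when bufflen is split into elem_sz pieces and
 * every piece is rounded up to a multiple of 512 (model assumption). */
static size_t sg_length_sum(size_t bufflen, size_t elem_sz)
{
    size_t sum = 0;

    assert(elem_sz > 0);
    while (bufflen > 0) {
        size_t chunk = bufflen < elem_sz ? bufflen : elem_sz;
        sum += round_up_512(chunk);
        bufflen -= chunk;
    }
    return sum;
}

/* Bytes the bio still considers outstanding once the device has
 * transferred bufflen bytes; non-zero means it is not ended. */
static size_t bio_leftover(size_t bi_size, size_t bufflen)
{
    assert(bi_size >= bufflen);
    return bi_size - bufflen;
}
```

For the read reported earlier in this thread (5 * 0xfffc + 0x6114 = 352512
bytes), the model gives a scatterlist sum of 352768, which is 256 bytes
more than bufflen. Setting bi_size to that sum leaves 256 bytes outstanding
after the device has transferred bufflen bytes, which is the condition
under which, per the description above, bio_endio() is not called. Setting
bi_size to bufflen leaves 0.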

* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-22 22:25         ` Mike Christie
@ 2008-02-22 22:48           ` Tony Battersby
  2008-02-23 11:16           ` Mark Hounschell
  2008-03-05 11:58           ` Mark Hounschell
  2 siblings, 0 replies; 15+ messages in thread
From: Tony Battersby @ 2008-02-22 22:48 UTC (permalink / raw)
  To: Mike Christie; +Cc: markh, linux-scsi, linux-kernel


> I attached a backport of the patch from Tony (added as cc) that is in 
> 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it 
> was this patch, then we can send it to stable.
>   

Yes, I had wanted to send this patch to -stable, but got distracted with
other bugs.  So please do so, by all means.

Tony


* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-22 22:25         ` Mike Christie
  2008-02-22 22:48           ` Tony Battersby
@ 2008-02-23 11:16           ` Mark Hounschell
  2008-03-05 11:58           ` Mark Hounschell
  2 siblings, 0 replies; 15+ messages in thread
From: Mark Hounschell @ 2008-02-23 11:16 UTC (permalink / raw)
  To: Mike Christie; +Cc: markh, linux-scsi, linux-kernel, Tony Battersby

Mike Christie wrote:
> Mark Hounschell wrote:
>> Mark Hounschell wrote:
>>> Mike Christie wrote:
>>>> Mike Christie wrote:
>>>>> Mark Hounschell wrote:
>>>>>> I seem to have run into some sort of regression in the SG_IO
>>>>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>>>>> worked fine. The 2.6.23.16 kernel works fine.
>>>>>>
>>>>>> During reads I get these kernel messages. Writes and other functions
>>>>>> _seem_ OK. Actually basic
>>>>>> reads are working. It's with large BC reads using an io_vec list that
>>>>>> the problem shows up.
>>>>>>
>>>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>>>>> (/dev/sdX)?
>>>> If you are doing SG_IO to the sg device, then I know of one regression
>>>> (well not regression exactly, but I fixed a bug but the patch got
>>>> partially overwritten by another patch and that caused a new bug). Both
>>>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
>>>> SG_IO to the sg device.
>>>>
>>> Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.
>>>
>>> Thanks
>>> Mark
>>> -
>>  
>> 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a
>> patch
>> lying around for 2.6.24.2??
>>
> 
> I attached a backport of the patch from Tony (added as cc) that is in
> 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it
> was this patch, then we can send it to stable.
> 

Sorry it took so long. This does fix my problem. I hope it's not too late
for 2.6.24.3.

Regards
Mark

* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-02-22 22:25         ` Mike Christie
  2008-02-22 22:48           ` Tony Battersby
  2008-02-23 11:16           ` Mark Hounschell
@ 2008-03-05 11:58           ` Mark Hounschell
  2008-03-05 15:44             ` James Bottomley
  2 siblings, 1 reply; 15+ messages in thread
From: Mark Hounschell @ 2008-03-05 11:58 UTC (permalink / raw)
  To: Mike Christie; +Cc: markh, linux-scsi, linux-kernel, Tony Battersby

Mike Christie wrote:
> Mark Hounschell wrote:
>> Mark Hounschell wrote:
>>> Mike Christie wrote:
>>>> Mike Christie wrote:
>>>>> Mark Hounschell wrote:
>>>>>> I seem to have run into some sort of regression in the SG_IO
>>>>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>>>>> worked fine. The 2.6.23.16 kernel works fine.
>>>>>>
>>>>>> During reads I get these kernel messages. Writes and other functions
>>>>>> _seem_ OK. Actually basic
>>>>>> reads are working. It's with large BC reads using an io_vec list that
>>>>>> the problem shows up.
>>>>>>
>>>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>>>>> (/dev/sdX)?
>>>> If you are doing SG_IO to the sg device, then I know of one regression
>>>> (well not regression exactly, but I fixed a bug but the patch got
>>>> partially overwritten by another patch and that caused a new bug). Both
>>>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
>>>> SG_IO to the sg device.
>>>>
>>> Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.
>>>
>>> Thanks
>>> Mark
>>> -
>>  
>> 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a
>> patch
>> lying around for 2.6.24.2??
>>
> 
> I attached a backport of the patch from Tony (added as cc) that is in
> 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it
> was this patch, then we can send it to stable.
> 

>Mark Hounschell wrote:
>
>Sorry it took so long. This does fix my problem. I hope it's not too
>late for 2.6.24.3.
>

Backport
76d78300a6eb8b7f08e47703b7e68a659ffc2053
to 2.6.24

From Tony Battersby:

When sending a SCSI command to a tape drive via the SCSI Generic (sg)
driver, if the command has a data transfer length more than
scatter_elem_sz (32 KB default) and not a multiple of 512, then I either
hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else
the command never completes (depending on the LLDD).

When constructing scatterlists, the sg driver rounds up the scatterlist
element sizes to be a multiple of 512.  This can result in
sum(scatterlist lengths) > bufflen.  In this case, scsi_req_map_sg()
incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to
bufflen.  When the command completes, req_bio_endio() detects that
bio->bi_size != 0, and so it doesn't call bio_endio().  This causes the
command to be resubmitted, resulting in BUG_ON or the command never
completing.

This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather
than to sum(scatterlist lengths), which fixes the problem.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>

--- linux-2.6.24.2/drivers/scsi/scsi_lib.c	2008-02-10 23:51:11.000000000 -0600
+++ linux-2.6.24.2.work/drivers/scsi/scsi_lib.c	2008-02-22 16:20:09.000000000 -0600
@@ -298,7 +298,6 @@ static int scsi_req_map_sg(struct reques
 		page = sg_page(sg);
 		off = sg->offset;
 		len = sg->length;
- 		data_len += len;

 		while (len > 0 && data_len > 0) {
 			/*


Did this ever get sent to the stable team?

Regards
Mark

* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-03-05 11:58           ` Mark Hounschell
@ 2008-03-05 15:44             ` James Bottomley
  2008-03-05 16:28               ` Mark Hounschell
  2008-03-05 17:13               ` Mike Christie
  0 siblings, 2 replies; 15+ messages in thread
From: James Bottomley @ 2008-03-05 15:44 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Mike Christie, markh, linux-scsi, linux-kernel, Tony Battersby

On Wed, 2008-03-05 at 06:58 -0500, Mark Hounschell wrote:
> Mike Christie wrote:
> > Mark Hounschell wrote:
> >> Mark Hounschell wrote:
> >>> Mike Christie wrote:
> >>>> Mike Christie wrote:
> >>>>> Mark Hounschell wrote:
> >>>>>> I seem to have run into some sort of regression in the SG_IO
> >>>>>> interface of 2.6.24.2. I have an application that up until 2.6.24
> >>>>>> worked fine. The 2.6.23.16 kernel works fine.
> >>>>>>
> >>>>>> During reads I get these kernel messages. Writes and other functions
> >>>>>> _seem_ OK. Actually basic
> >>>>>> reads are working. It's with large BC reads using an io_vec list that
> >>>>>> the problem shows up.
> >>>>>>
> >>>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
> >>>>> (/dev/sdX)?
> >>>> If you are doing SG_IO to the sg device, then I know of one regression
> >>>> (well not regression exactly, but I fixed a bug but the patch got
> >>>> partially overwritten by another patch and that caused a new bug). Both
> >>>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
> >>>> SG_IO to the sg device.
> >>>>
> >>> Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.
> >>>
> >>> Thanks
> >>> Mark
> >>> -
> >>  
> >> 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a
> >> patch
> >> lying around for 2.6.24.2??
> >>
> > 
> > I attached a backport of the patch from Tony (added as cc) that is in
> > 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it
> > was this patch, then we can send it to stable.
> > 
> 
> >Mark Hounschell wrote:
> >
> >Sorry it took so long. This does fix my problem. I hope it's not too
> >late for 2.6.24.3.
> >
> 
> Backport
> 76d78300a6eb8b7f08e47703b7e68a659ffc2053
> to 2.6.24

Erm, I think you mean:

commit 4d2de3a50ce19af2008a90636436a1bf5b3b697b
Author: Tony Battersby <tonyb@cybernetics.com>
Date:   Tue Feb 5 10:36:10 2008 -0500

    [SCSI] fix BUG when sum(scatterlist) > bufflen

I can send it ... I thought the error was introduced post 2.6.24, but it
was actually in 2.6.24-rc1

James


> From Tony Battersby:
> 
> When sending a SCSI command to a tape drive via the SCSI Generic (sg)
> driver, if the command has a data transfer length more than
> scatter_elem_sz (32 KB default) and not a multiple of 512, then I either
> hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else
> the command never completes (depending on the LLDD).
> 
> When constructing scatterlists, the sg driver rounds up the scatterlist
> element sizes to be a multiple of 512.  This can result in
> sum(scatterlist lengths) > bufflen.  In this case, scsi_req_map_sg()
> incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to
> bufflen.  When the command completes, req_bio_endio() detects that
> bio->bi_size != 0, and so it doesn't call bio_endio().  This causes the
> command to be resubmitted, resulting in BUG_ON or the command never
> completing.
> 
> This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather
> than to sum(scatterlist lengths), which fixes the problem.
> 
> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
> 
> --- linux-2.6.24.2/drivers/scsi/scsi_lib.c	2008-02-10 23:51:11.000000000 -0600
> +++ linux-2.6.24.2.work/drivers/scsi/scsi_lib.c	2008-02-22 16:20:09.000000000 -0600
> @@ -298,7 +298,6 @@ static int scsi_req_map_sg(struct reques
>  		page = sg_page(sg);
>  		off = sg->offset;
>  		len = sg->length;
> - 		data_len += len;
> 
>  		while (len > 0 && data_len > 0) {
>  			/*
> 
> 
> Did this ever get sent to the stable team?
> 
> Regards
> Mark


* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-03-05 15:44             ` James Bottomley
@ 2008-03-05 16:28               ` Mark Hounschell
  2008-03-05 17:13               ` Mike Christie
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Hounschell @ 2008-03-05 16:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Hounschell, Mike Christie, linux-scsi, linux-kernel, Tony Battersby

James Bottomley wrote:
> On Wed, 2008-03-05 at 06:58 -0500, Mark Hounschell wrote:
>> Mike Christie wrote:
>>> Mark Hounschell wrote:
>>>> Mark Hounschell wrote:
>>>>> Mike Christie wrote:
>>>>>> Mike Christie wrote:
>>>>>>> Mark Hounschell wrote:
>>>>>>>> I seem to have run into some sort of regression in the SG_IO
>>>>>>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>>>>>>> worked fine. The 2.6.23.16 kernel works fine.
>>>>>>>>
>>>>>>>> During reads I get these kernel messages. Writes and other functions
>>>>>>>> _seem_ OK. Actually basic
>>>>>>>> reads are working. It's with large BC reads using an io_vec list that
>>>>>>>> the problem shows up.
>>>>>>>>
>>>>>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>>>>>>> (/dev/sdX)?
>>>>>> If you are doing SG_IO to the sg device, then I know of one regression
>>>>>> (well not regression exactly, but I fixed a bug but the patch got
>>>>>> partially overwritten by another patch and that caused a new bug). Both
>>>>>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
>>>>>> SG_IO to the sg device.
>>>>>>
>>>>> Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.
>>>>>
>>>>> Thanks
>>>>> Mark
>>>>> -
>>>>  
>>>> 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a
>>>> patch
>>>> lying around for 2.6.24.2??
>>>>
>>> I attached a backport of the patch from Tony (added as cc) that is in
>>> 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it
>>> was this patch, then we can send it to stable.
>>>
>>> Mark Hounschell wrote:
>>>
>>> Sorry it took so long. This does fix my problem. I hope it's not too
>>> late for 2.6.24.3.
>>>
>> Backport
>> 76d78300a6eb8b7f08e47703b7e68a659ffc2053
>> to 2.6.24
> 
> Erm, I think you mean:
> 
> commit 4d2de3a50ce19af2008a90636436a1bf5b3b697b
> Author: Tony Battersby <tonyb@cybernetics.com>
> Date:   Tue Feb 5 10:36:10 2008 -0500
> 
>     [SCSI] fix BUG when sum(scatterlist) > bufflen
> 
> I can send it ... I thought the error was introduced post 2.6.24, but it
> was actually in 2.6.24-rc1
> 
> James
> 

I just cut and pasted from Mike's previous email. It would be great if this 
could get into the 2.6.24-stable tree.

Thanks
Mark

> 
>> From Tony Battersby:
>>
>> When sending a SCSI command to a tape drive via the SCSI Generic (sg)
>> driver, if the command has a data transfer length more than
>> scatter_elem_sz (32 KB default) and not a multiple of 512, then I either
>> hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else
>> the command never completes (depending on the LLDD).
>>
>> When constructing scatterlists, the sg driver rounds up the scatterlist
>> element sizes to be a multiple of 512.  This can result in
>> sum(scatterlist lengths) > bufflen.  In this case, scsi_req_map_sg()
>> incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to
>> bufflen.  When the command completes, req_bio_endio() detects that
>> bio->bi_size != 0, and so it doesn't call bio_endio().  This causes the
>> command to be resubmitted, resulting in BUG_ON or the command never
>> completing.
>>
>> This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather
>> than to sum(scatterlist lengths), which fixes the problem.
>>
>> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
>>
>> --- linux-2.6.24.2/drivers/scsi/scsi_lib.c	2008-02-10 23:51:11.000000000 -0600
>> +++ linux-2.6.24.2.work/drivers/scsi/scsi_lib.c	2008-02-22 16:20:09.000000000 -0600
>> @@ -298,7 +298,6 @@ static int scsi_req_map_sg(struct reques
>>  		page = sg_page(sg);
>>  		off = sg->offset;
>>  		len = sg->length;
>> - 		data_len += len;
>>
>>  		while (len > 0 && data_len > 0) {
>>  			/*
>>
>>
>> Did this ever get sent to the stable team?
>>
>> Regards
>> Mark
> 
> 


* Re: New 2.6.24.2 SG_IO SCSI problems
  2008-03-05 15:44             ` James Bottomley
  2008-03-05 16:28               ` Mark Hounschell
@ 2008-03-05 17:13               ` Mike Christie
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Christie @ 2008-03-05 17:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Hounschell, markh, linux-scsi, linux-kernel, Tony Battersby

James Bottomley wrote:
> On Wed, 2008-03-05 at 06:58 -0500, Mark Hounschell wrote:
>> Mike Christie wrote:
>>> Mark Hounschell wrote:
>>>> Mark Hounschell wrote:
>>>>> Mike Christie wrote:
>>>>>> Mike Christie wrote:
>>>>>>> Mark Hounschell wrote:
>>>>>>>> I seem to have run into some sort of regression in the SG_IO
>>>>>>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>>>>>>> worked fine. The 2.6.23.16 kernel works fine.
>>>>>>>>
>>>>>>>> During reads I get these kernel messages. Writes and other functions
>>>>>>>> _seem_ OK. Actually basic
>>>>>>>> reads are working. It's with large BC reads using an io_vec list that
>>>>>>>> the problem shows up.
>>>>>>>>
>>>>>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>>>>>>> (/dev/sdX)?
>>>>>> If you are doing SG_IO to the sg device, then I know of one regression
>>>>>> (well not regression exactly, but I fixed a bug but the patch got
>>>>>> partially overwritten by another patch and that caused a new bug). Both
>>>>>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
>>>>>> SG_IO to the sg device.
>>>>>>
>>>>> Yes, I'm using /dev/sg*. And yes again, I'll check out 2.6.25-rc2 ASAP.
>>>>>
>>>>> Thanks
>>>>> Mark
>>>>> -
>>>>  
>>>> 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a
>>>> patch
>>>> lying around for 2.6.24.2??
>>>>
>>> I attached a backport of the patch from Tony (added as cc) that is in
>>> 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it
>>> was this patch, then we can send it to stable.
>>>
>>> Mark Hounschell wrote:
>>>
>>> Sorry it took so long. This does fix my problem. I hope it's not too
>>> late for 2.6.24.3.
>>>
>> Backport
>> 76d78300a6eb8b7f08e47703b7e68a659ffc2053
>> to 2.6.24
> 
> Erm, I think you mean:

You are right.

> 
> commit 4d2de3a50ce19af2008a90636436a1bf5b3b697b
> Author: Tony Battersby <tonyb@cybernetics.com>
> Date:   Tue Feb 5 10:36:10 2008 -0500
> 
>     [SCSI] fix BUG when sum(scatterlist) > bufflen
> 
> I can send it ... I thought the error was introduced post 2.6.24, but it
> was actually in 2.6.24-rc1
> 

Ok thanks.

Thread overview: 15+ messages
2008-02-21 15:15 New 2.6.24.2 SG_IO SCSI problems Mark Hounschell
2008-02-21 15:41 ` James Bottomley
2008-02-21 16:21   ` Mark Hounschell
2008-02-22 10:03     ` Mark Hounschell
2008-02-22 16:50 ` Mike Christie
2008-02-22 16:59   ` Mike Christie
2008-02-22 17:56     ` Mark Hounschell
2008-02-22 21:38       ` Mark Hounschell
2008-02-22 22:25         ` Mike Christie
2008-02-22 22:48           ` Tony Battersby
2008-02-23 11:16           ` Mark Hounschell
2008-03-05 11:58           ` Mark Hounschell
2008-03-05 15:44             ` James Bottomley
2008-03-05 16:28               ` Mark Hounschell
2008-03-05 17:13               ` Mike Christie
