LKML Archive on lore.kernel.org
* sata_nv - ADMA issues with 2.6.20
@ 2007-02-09 21:11 David R
  0 siblings, 0 replies; 6+ messages in thread
From: David R @ 2007-02-09 21:11 UTC
  To: linux-kernel

I've just upgraded my home server to 2.6.20. It's got an Athlon64 on an ASUS
nForce-4 motherboard running a 32 bit kernel. I've had to fall back to using
sata_nv.adma=0 on the kernel command line. One of the NCQ capable drives
repeatedly produced the following errors. There wasn't much disk IO going on
at the time. It's perfectly happy now with ADMA disabled. The strange thing is
that the other identical drive, ata8, showed no problems (they're both part of
a software raid1).

Some clues follow.

Cheers
David

>> Feb  9 18:40:27 server kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
>> Feb  9 18:40:27 server kernel: ata7: CPB 0: ctl_flags 0x1f, resp_flags 0x0
>> Feb  9 18:40:27 server kernel: ata7: CPB 1: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:40:27 server kernel: ata7: CPB 2: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:40:27 server kernel: ata7: CPB 3: ctl_flags 0x1f, resp_flags 0x1
etc etc..
>> Feb  9 18:40:29 server kernel: ata7: CPB 27: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:40:29 server kernel: ata7: CPB 28: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:40:29 server kernel: ata7: CPB 29: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:40:29 server kernel: ata7: CPB 30: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:40:29 server kernel: ata7: Resetting port
>> Feb  9 18:40:29 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>> Feb  9 18:40:29 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
>> Feb  9 18:40:29 server kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Feb  9 18:40:29 server kernel: ata7: soft resetting port
>> Feb  9 18:40:29 server kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> Feb  9 18:40:29 server kernel: ata7.00: configured for UDMA/133
>> Feb  9 18:40:29 server kernel: ata7: EH complete
>> Feb  9 18:40:29 server kernel: SCSI device sdg: 156301488 512-byte hdwr sectors (80026 MB)
>> Feb  9 18:40:29 server kernel: sdg: Write Protect is off
>> Feb  9 18:40:29 server kernel: sdg: Mode Sense: 00 3a 00 00
>> Feb  9 18:40:29 server kernel: SCSI device sdg: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

eventually bringing the speed down

>> Feb  9 18:47:36 server kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
>> Feb  9 18:47:36 server kernel: ata7: CPB 0: ctl_flags 0x1f, resp_flags 0x0
>> Feb  9 18:47:36 server kernel: ata7: CPB 1: ctl_flags 0x1f, resp_flags 0x1
> ....
>> Feb  9 18:47:36 server kernel: ata7: CPB 29: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:47:36 server kernel: ata7: CPB 30: ctl_flags 0x1f, resp_flags 0x1
>> Feb  9 18:47:36 server kernel: ata7: Resetting port
>> Feb  9 18:47:36 server kernel: ata7.00: limiting speed to UDMA/100
>> Feb  9 18:47:36 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>> Feb  9 18:47:36 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
>> Feb  9 18:47:36 server kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Feb  9 18:47:36 server kernel: ata7: soft resetting port
>> Feb  9 18:47:36 server kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> Feb  9 18:47:36 server kernel: ata7.00: configured for UDMA/100
>> Feb  9 18:47:36 server kernel: ata7: EH complete
>> Feb  9 18:47:36 server kernel: SCSI device sdg: 156301488 512-byte hdwr sectors (80026 MB)
>> Feb  9 18:47:36 server kernel: sdg: Write Protect is off
>> Feb  9 18:47:36 server kernel: sdg: Mode Sense: 00 3a 00 00
>> Feb  9 18:47:36 server kernel: SCSI device sdg: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

All the way to UDMA/33

>> Feb  9 19:47:38 server kernel: ata7: Resetting port
>> Feb  9 19:47:38 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>> Feb  9 19:47:38 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
>> Feb  9 19:47:38 server kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Feb  9 19:47:38 server kernel: ata7: soft resetting port
>> Feb  9 19:47:39 server kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> Feb  9 19:47:39 server kernel: ata7.00: configured for UDMA/33
>> Feb  9 19:47:39 server kernel: ata7: EH complete
>> Feb  9 19:47:39 server kernel: SCSI device sdg: 156301488 512-byte hdwr sectors (80026 MB)
>> Feb  9 19:47:39 server kernel: sdg: Write Protect is off
>> Feb  9 19:47:39 server kernel: sdg: Mode Sense: 00 3a 00 00
>> Feb  9 19:47:39 server kernel: SCSI device sdg: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

relevant dmesg follows

>> <6>ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
>> <6>NFORCE-CK804: IDE controller at PCI slot 0000:00:06.0
>> <6>NFORCE-CK804: chipset revision 242
>> <6>NFORCE-CK804: not 100%% native mode: will probe irqs later
>> <6>NFORCE-CK804: 0000:00:06.0 (rev f2) UDMA133 controller
>> <6>    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
>> <6>    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
>> <7>Probing IDE interface ide0...
>> <4>hda: PIONEER DVD-RW DVR-107D, ATAPI CD/DVD-ROM drive
>> <4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
>> <7>Probing IDE interface ide1...
>> <7>Probing IDE interface ide1...
>> <6>hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2000kB Cache, UDMA(33)
>> <6>Uniform CD-ROM driver Revision: 3.20
>> <6>megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
>> <6>megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
>> <7>sata_sil 0000:05:0a.0: version 2.0
>> <6>ACPI: PCI Interrupt 0000:05:0a.0[A] -> Link [APC4] -> GSI 19 (level, low) -> IRQ 20
>> <6>sata_sil 0000:05:0a.0: Applying R_ERR on DMA activate FIS errata fix
>> <6>ata1: SATA max UDMA/100 cmd 0xF882E080 ctl 0xF882E08A bmdma 0xF882E000 irq 20
>> <6>ata2: SATA max UDMA/100 cmd 0xF882E0C0 ctl 0xF882E0CA bmdma 0xF882E008 irq 20
>> <6>ata3: SATA max UDMA/100 cmd 0xF882E280 ctl 0xF882E28A bmdma 0xF882E200 irq 20
>> <6>ata4: SATA max UDMA/100 cmd 0xF882E2C0 ctl 0xF882E2CA bmdma 0xF882E208 irq 20
>> <6>scsi0 : sata_sil
>> <6>ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> <6>ata1.00: ATA-7, max UDMA/133, 586114704 sectors: LBA48 NCQ (depth 0/32)
>> <6>ata1.00: ata1: dev 0 multi count 16
>> <6>ata1.00: configured for UDMA/100
>> <6>scsi1 : sata_sil
>> <6>ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> <6>ata2.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth 0/32)
>> <6>ata2.00: ata2: dev 0 multi count 16
>> <6>ata2.00: configured for UDMA/100
>> <6>scsi2 : sata_sil
>> <6>ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> <6>ata3.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth 0/32)
>> <6>ata3.00: ata3: dev 0 multi count 16
>> <6>ata3.00: configured for UDMA/100
>> <6>scsi3 : sata_sil
>> <6>ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> <6>ata4.00: ATA-7, max UDMA/133, 586114704 sectors: LBA48 NCQ (depth 0/32)
>> <6>ata4.00: ata4: dev 0 multi count 16
>> <6>ata4.00: configured for UDMA/100
>> <5>scsi 0:0:0:0: Direct-Access     ATA      Maxtor 6L300S0   BACE PQ: 0 ANSI: 5
>> <5>SCSI device sda: 586114704 512-byte hdwr sectors (300091 MB)
>> <5>sda: Write Protect is off
>> <7>sda: Mode Sense: 00 3a 00 00
>> <5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sda: 586114704 512-byte hdwr sectors (300091 MB)
>> <5>sda: Write Protect is off
>> <7>sda: Mode Sense: 00 3a 00 00
>> <5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sda: sda1
>> <5>sd 0:0:0:0: Attached scsi disk sda
>> <5>sd 0:0:0:0: Attached scsi generic sg0 type 0
>> <5>scsi 1:0:0:0: Direct-Access     ATA      ST3300831AS      3.03 PQ: 0 ANSI: 5
>> <5>SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
>> <5>sdb: Write Protect is off
>> <7>sdb: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
>> <5>sdb: Write Protect is off
>> <7>sdb: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sdb: sdb1
>> <5>sd 1:0:0:0: Attached scsi disk sdb
>> <5>sd 1:0:0:0: Attached scsi generic sg1 type 0
>> <5>scsi 2:0:0:0: Direct-Access     ATA      ST3300831AS      3.03 PQ: 0 ANSI: 5
>> <5>SCSI device sdc: 586072368 512-byte hdwr sectors (300069 MB)
>> <5>sdc: Write Protect is off
>> <7>sdc: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sdc: 586072368 512-byte hdwr sectors (300069 MB)
>> <5>sdc: Write Protect is off
>> <7>sdc: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sdc: sdc1
>> <5>sd 2:0:0:0: Attached scsi disk sdc
>> <5>sd 2:0:0:0: Attached scsi generic sg2 type 0
>> <5>scsi 3:0:0:0: Direct-Access     ATA      Maxtor 6L300S0   BACE PQ: 0 ANSI: 5
>> <5>SCSI device sdd: 586114704 512-byte hdwr sectors (300091 MB)
>> <5>sdd: Write Protect is off
>> <7>sdd: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sdd: 586114704 512-byte hdwr sectors (300091 MB)
>> <5>sdd: Write Protect is off
>> <7>sdd: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sdd: sdd1
>> <5>sd 3:0:0:0: Attached scsi disk sdd
>> <5>sd 3:0:0:0: Attached scsi generic sg3 type 0
>> <7>sata_nv 0000:00:07.0: version 3.2
>> <4>ACPI: PCI Interrupt Link [APSI] enabled at IRQ 22
>> <6>ACPI: PCI Interrupt 0000:00:07.0[A] -> Link [APSI] -> GSI 22 (level, low) -> IRQ 21
>> <5>sata_nv 0000:00:07.0: Using ADMA mode
>> <7>PCI: Setting latency timer of device 0000:00:07.0 to 64
>> <6>ata5: SATA max UDMA/133 cmd 0xF8836480 ctl 0xF88364A0 bmdma 0xD800 irq 21
>> <6>ata6: SATA max UDMA/133 cmd 0xF8836580 ctl 0xF88365A0 bmdma 0xD808 irq 21
>> <6>scsi4 : sata_nv
>> <6>ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> <6>ata5.00: ATA-7, max UDMA/133, 586114704 sectors: LBA48 NCQ (depth 31/32)
>> <6>ata5.00: ata5: dev 0 multi count 1
>> <6>ata5.00: configured for UDMA/133
>> <6>scsi5 : sata_nv
>> <6>ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> <6>ata6.00: ATA-6, max UDMA/100, 488397168 sectors: LBA48
>> <6>ata6.00: ata6: dev 0 multi count 1
>> <6>ata6.00: configured for UDMA/100
>> <5>scsi 4:0:0:0: Direct-Access     ATA      Maxtor 6L300S0   BACE PQ: 0 ANSI: 5
>> <6>ata5: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>> <5>SCSI device sde: 586114704 512-byte hdwr sectors (300091 MB)
>> <5>sde: Write Protect is off
>> <7>sde: Mode Sense: 00 3a 00 00
>> <5>SCSI device sde: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sde: 586114704 512-byte hdwr sectors (300091 MB)
>> <5>sde: Write Protect is off
>> <7>sde: Mode Sense: 00 3a 00 00
>> <5>SCSI device sde: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sde: sde1
>> <5>sd 4:0:0:0: Attached scsi disk sde
>> <5>sd 4:0:0:0: Attached scsi generic sg4 type 0
>> <5>scsi 5:0:0:0: Direct-Access     ATA      HDS722525VLSA80  V36O PQ: 0 ANSI: 5
>> <6>ata6: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>> <5>SCSI device sdf: 488397168 512-byte hdwr sectors (250059 MB)
>> <5>sdf: Write Protect is off
>> <7>sdf: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdf: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sdf: 488397168 512-byte hdwr sectors (250059 MB)
>> <5>sdf: Write Protect is off
>> <7>sdf: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdf: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sdf: sdf1
>> <5>sd 5:0:0:0: Attached scsi disk sdf
>> <5>sd 5:0:0:0: Attached scsi generic sg5 type 0
>> <4>ACPI: PCI Interrupt Link [APSJ] enabled at IRQ 21
>> <6>ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [APSJ] -> GSI 21 (level, low) -> IRQ 22
>> <5>sata_nv 0000:00:08.0: Using ADMA mode
>> <7>PCI: Setting latency timer of device 0000:00:08.0 to 64
>> <6>ata7: SATA max UDMA/133 cmd 0xF883E480 ctl 0xF883E4A0 bmdma 0xC400 irq 22
>> <6>ata8: SATA max UDMA/133 cmd 0xF883E580 ctl 0xF883E5A0 bmdma 0xC408 irq 22
>> <6>scsi6 : sata_nv
>> <6>ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> <6>ata7.00: ATA-6, max UDMA/133, 156301488 sectors: LBA48 NCQ (depth 31/32)
>> <6>ata7.00: ata7: dev 0 multi count 1
>> <6>ata7.00: configured for UDMA/133
>> <6>scsi7 : sata_nv
>> <6>ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> <6>ata8.00: ATA-6, max UDMA/133, 156301488 sectors: LBA48 NCQ (depth 31/32)
>> <6>ata8.00: ata8: dev 0 multi count 1
>> <6>ata8.00: configured for UDMA/133
>> <5>scsi 6:0:0:0: Direct-Access     ATA      ST380817AS       3.42 PQ: 0 ANSI: 5
>> <6>ata7: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>> <5>SCSI device sdg: 156301488 512-byte hdwr sectors (80026 MB)
>> <5>sdg: Write Protect is off
>> <7>sdg: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdg: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sdg: 156301488 512-byte hdwr sectors (80026 MB)
>> <5>sdg: Write Protect is off
>> <7>sdg: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdg: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sdg: sdg1 sdg2 sdg3
>> <5>sd 6:0:0:0: Attached scsi disk sdg
>> <5>sd 6:0:0:0: Attached scsi generic sg6 type 0
>> <5>scsi 7:0:0:0: Direct-Access     ATA      ST380817AS       3.42 PQ: 0 ANSI: 5
>> <6>ata8: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>> <5>SCSI device sdh: 156301488 512-byte hdwr sectors (80026 MB)
>> <5>sdh: Write Protect is off
>> <7>sdh: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdh: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <5>SCSI device sdh: 156301488 512-byte hdwr sectors (80026 MB)
>> <5>sdh: Write Protect is off
>> <7>sdh: Mode Sense: 00 3a 00 00
>> <5>SCSI device sdh: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> <6> sdh: sdh1 sdh2 sdh3
>> <5>sd 7:0:0:0: Attached scsi disk sdh
>> <5>sd 7:0:0:0: Attached scsi generic sg7 type 0




* Re: sata_nv - ADMA issues with 2.6.20
       [not found]     ` <fa.skzjVxsBWVzk9Yzhsb2UIA1VDGY@ifi.uio.no>
@ 2007-02-11 23:04       ` Robert Hancock
  0 siblings, 0 replies; 6+ messages in thread
From: Robert Hancock @ 2007-02-11 23:04 UTC
  To: linux-kernel; +Cc: Björn Steinbrink, Neil Schemenauer, david

Björn Steinbrink wrote:
> If they look like this, you might want to try a few patches that are in
> -mm.
> 
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: soft resetting port
> 
> We thought that Robert had fixed these with some changes that went into
> -rc6. But they reappeared a few days later; the -mm patches seem to have
> finally cured them.
> 
> The original patches are here:
> 
> http://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/sata_nv-cleanup-adma-error-handling-v2.patch
> http://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/sata_nv-cleanup-adma-error-handling-v2-cleanup.patch
> http://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/sata_nv-use-adma-for-nodata-commands.patch
> 
> As they had a few rejects against 2.6.20-rc7, I'm attaching my fixed all-in-one version of these patches. I guess it should apply to 2.6.20 just fine.

This isn't quite the same as the problem we were seeing: here it is an
actual NCQ read/write that is timing out, not a cache flush command.
Nevertheless, it wouldn't hurt for people having this problem to test out
the latest and greatest sata_nv patches. In particular, there was one
that I resurrected from the debugging of that problem, "sata_nv: wait
for response on entering/leaving ADMA mode", which, though it didn't end
up fixing it, seemed like a good thing to be doing anyway and might
potentially have some effect on this problem.
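
For anyone comparing their own logs against the two failure signatures quoted in this thread, here is a minimal Python sketch (not part of libata) that splits the `cmd` line into its taskfile fields. The field order (command/feature:nsect:lbal:lbam:lbah/hob fields/device) is my reading of the libata EH report format, and the opcode names are taken from the ATA command set; `decode_cmd` and the dictionary keys are made-up names for illustration only.

```python
import re

# Opcodes from the ATA command set; only the ones relevant to this thread.
ATA_OPCODES = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xE7: "FLUSH CACHE",
}
NCQ_OPCODES = {0x60, 0x61}

# Twelve hex bytes separated by '/' or ':' after the literal "cmd ".
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd(line):
    """Decode the taskfile bytes from a libata 'cmd ...' log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]

    is_ncq = command in NCQ_OPCODES
    # 48-bit LBA assembled from the low and high-order (hob) bytes.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24 |
           lbah << 16 | lbam << 8 | lbal)
    # NCQ commands carry the sector count in the feature fields;
    # this sketch does not interpret the count for other commands.
    sectors = (hob_feature << 8 | feature) if is_ncq else None
    return {
        "opcode": command,
        "name": ATA_OPCODES.get(command, "unknown"),
        "ncq": is_ncq,
        "lba": lba,
        "sectors": sectors,
    }
```

Fed the line from David's log, this should report a WRITE FPDMA QUEUED of 8 sectors (matching the "data 4096 out" in the same line), whereas the line Björn quoted decodes as FLUSH CACHE.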

All the patches I have for sata_nv are in Linus' git tree, which is
probably a simpler way to test them than -mm right now: applying the
2.6.20-git6 patch on top of 2.6.20 should do it.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: sata_nv - ADMA issues with 2.6.20
  2007-02-11 19:55   ` Neil Schemenauer
@ 2007-02-11 22:23     ` Björn Steinbrink
  0 siblings, 0 replies; 6+ messages in thread
From: Björn Steinbrink @ 2007-02-11 22:23 UTC
  To: Neil Schemenauer; +Cc: linux-kernel

On 2007.02.11 19:55:37 +0000, Neil Schemenauer wrote:
> > David R wrote:
> >>>> Feb  9 18:40:29 server kernel: ata7: Resetting port
> >>>> Feb  9 18:40:29 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
> >>>> Feb  9 18:40:29 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
> >>>> Feb  9 18:40:29 server kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> 
> I'm getting similar errors on my machine (AMD64 CPU, MSI board with
> NForce chipset).  Linux 2.6.19.2 seems to be okay.

If they look like this, you might want to try a few patches that are in
-mm.

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port

We thought that Robert had fixed these with some changes that went into
-rc6. But they reappeared a few days later; the -mm patches seem to have
finally cured them.

The original patches are here:

http://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/sata_nv-cleanup-adma-error-handling-v2.patch
http://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/sata_nv-cleanup-adma-error-handling-v2-cleanup.patch
http://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/sata_nv-use-adma-for-nodata-commands.patch

As they had a few rejects against 2.6.20-rc7, I'm attaching my fixed all-in-one version of these patches. I guess it should apply to 2.6.20 just fine.

HTH
Björn


diff -NurpP --minimal a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
--- a/drivers/ata/sata_nv.c	2007-01-26 19:25:31.000000000 +0100
+++ b/drivers/ata/sata_nv.c	2007-02-04 01:49:54.000000000 +0100
@@ -648,53 +648,62 @@ static unsigned int nv_adma_tf_to_cpb(st
 	return idx;
 }
 
-static void nv_adma_check_cpb(struct ata_port *ap, int cpb_num, int force_err)
+static int nv_adma_check_cpb(struct ata_port *ap, int cpb_num, int force_err)
 {
 	struct nv_adma_port_priv *pp = ap->private_data;
-	int complete = 0, have_err = 0;
 	u8 flags = pp->cpb[cpb_num].resp_flags;
 
 	VPRINTK("CPB %d, flags=0x%x\n", cpb_num, flags);
 
-	if (flags & NV_CPB_RESP_DONE) {
-		VPRINTK("CPB flags done, flags=0x%x\n", flags);
-		complete = 1;
-	}
-	if (flags & NV_CPB_RESP_ATA_ERR) {
-		ata_port_printk(ap, KERN_ERR, "CPB flags ATA err, flags=0x%x\n", flags);
-		have_err = 1;
-		complete = 1;
-	}
-	if (flags & NV_CPB_RESP_CMD_ERR) {
-		ata_port_printk(ap, KERN_ERR, "CPB flags CMD err, flags=0x%x\n", flags);
-		have_err = 1;
-		complete = 1;
-	}
-	if (flags & NV_CPB_RESP_CPB_ERR) {
-		ata_port_printk(ap, KERN_ERR, "CPB flags CPB err, flags=0x%x\n", flags);
-		have_err = 1;
-		complete = 1;
+	if (unlikely((force_err ||
+		     flags & (NV_CPB_RESP_ATA_ERR |
+			      NV_CPB_RESP_CMD_ERR |
+			      NV_CPB_RESP_CPB_ERR)))) {
+		struct ata_eh_info *ehi = &ap->eh_info;
+		int freeze = 0;
+
+		ata_ehi_clear_desc(ehi);
+		ata_ehi_push_desc(ehi, "CPB resp_flags 0x%x", flags );
+		if (flags & NV_CPB_RESP_ATA_ERR) {
+			ata_ehi_push_desc(ehi, ": ATA error");
+			ehi->err_mask |= AC_ERR_DEV;
+		} else if (flags & NV_CPB_RESP_CMD_ERR) {
+			ata_ehi_push_desc(ehi, ": CMD error");
+			ehi->err_mask |= AC_ERR_DEV;
+		} else if (flags & NV_CPB_RESP_CPB_ERR) {
+			ata_ehi_push_desc(ehi, ": CPB error");
+			ehi->err_mask |= AC_ERR_SYSTEM;
+			freeze = 1;
+		} else {
+			/* notifier error, but no error in CPB flags? */
+			ehi->err_mask |= AC_ERR_OTHER;
+			freeze = 1;
+		}
+		/* Kill all commands. EH will determine what actually failed. */
+		if (freeze)
+			ata_port_freeze(ap);
+		else
+			ata_port_abort(ap);
+		return 1;
 	}
-	if(complete || force_err)
-	{
+
+	if (flags & NV_CPB_RESP_DONE) {
 		struct ata_queued_cmd *qc = ata_qc_from_tag(ap, cpb_num);
-		if(likely(qc)) {
-			u8 ata_status = 0;
-			/* Only use the ATA port status for non-NCQ commands.
+		VPRINTK("CPB flags done, flags=0x%x\n", flags);
+		if (likely(qc)) {
+			/* Grab the ATA port status for non-NCQ commands.
 			   For NCQ commands the current status may have nothing to do with
 			   the command just completed. */
-			if(qc->tf.protocol != ATA_PROT_NCQ)
-				ata_status = readb(nv_adma_ctl_block(ap) + (ATA_REG_STATUS * 4));
-
-			if(have_err || force_err)
-				ata_status |= ATA_ERR;
-
-			qc->err_mask |= ac_err_mask(ata_status);
+			if (qc->tf.protocol != ATA_PROT_NCQ) {
+				u8 ata_status = readb(nv_adma_ctl_block(ap) + (ATA_REG_STATUS * 4));
+				qc->err_mask |= ac_err_mask(ata_status);
+			}
 			DPRINTK("Completing qc from tag %d with err_mask %u\n",cpb_num,
 				qc->err_mask);
 			ata_qc_complete(qc);
 		}
 	}
+	return 0;
 }
 
 static int nv_host_intr(struct ata_port *ap, u8 irq_stat)
@@ -738,7 +747,6 @@ static irqreturn_t nv_adma_interrupt(int
 			void __iomem *mmio = nv_adma_ctl_block(ap);
 			u16 status;
 			u32 gen_ctl;
-			int have_global_err = 0;
 			u32 notifier, notifier_error;
 
 			/* if in ATA register mode, use standard ata interrupt handler */
@@ -774,42 +782,50 @@ static irqreturn_t nv_adma_interrupt(int
 			readw(mmio + NV_ADMA_STAT); /* flush posted write */
 			rmb();
 
-			/* freeze if hotplugged */
-			if (unlikely(status & (NV_ADMA_STAT_HOTPLUG | NV_ADMA_STAT_HOTUNPLUG))) {
-				ata_port_printk(ap, KERN_NOTICE, "Hotplug event, freezing\n");
+			handled++; /* irq handled if we got here */
+
+			/* freeze if hotplugged or controller error */
+			if (unlikely(status & (NV_ADMA_STAT_HOTPLUG |
+					       NV_ADMA_STAT_HOTUNPLUG |
+					       NV_ADMA_STAT_TIMEOUT))) {
+				struct ata_eh_info *ehi = &ap->eh_info;
+
+				ata_ehi_clear_desc(ehi);
+				ata_ehi_push_desc(ehi, "ADMA status 0x%08x", status );
+				if (status & NV_ADMA_STAT_TIMEOUT) {
+					ehi->err_mask |= AC_ERR_SYSTEM;
+					ata_ehi_push_desc(ehi, ": timeout");
+				} else if (status & NV_ADMA_STAT_HOTPLUG) {
+					ata_ehi_hotplugged(ehi);
+					ata_ehi_push_desc(ehi, ": hotplug");
+				} else if (status & NV_ADMA_STAT_HOTUNPLUG) {
+					ata_ehi_hotplugged(ehi);
+					ata_ehi_push_desc(ehi, ": hot unplug");
+				}
 				ata_port_freeze(ap);
-				handled++;
 				continue;
 			}
 
-			if (status & NV_ADMA_STAT_TIMEOUT) {
-				ata_port_printk(ap, KERN_ERR, "timeout, stat=0x%x\n", status);
-				have_global_err = 1;
-			}
-			if (status & NV_ADMA_STAT_CPBERR) {
-				ata_port_printk(ap, KERN_ERR, "CPB error, stat=0x%x\n", status);
-				have_global_err = 1;
-			}
-			if ((status & NV_ADMA_STAT_DONE) || have_global_err) {
+			if (status & (NV_ADMA_STAT_DONE |
+				      NV_ADMA_STAT_CPBERR)) {
 				/** Check CPBs for completed commands */
 
-				if(ata_tag_valid(ap->active_tag))
+				if (ata_tag_valid(ap->active_tag)) {
 					/* Non-NCQ command */
-					nv_adma_check_cpb(ap, ap->active_tag, have_global_err ||
-						(notifier_error & (1 << ap->active_tag)));
-				else {
-					int pos;
+					nv_adma_check_cpb(ap, ap->active_tag,
+						notifier_error & (1 << ap->active_tag));
+				} else {
+					int pos, error = 0;
 					u32 active = ap->sactive;
-					while( (pos = ffs(active)) ) {
+
+					while ((pos = ffs(active)) && !error) {
 						pos--;
-						nv_adma_check_cpb(ap, pos, have_global_err ||
-							(notifier_error & (1 << pos)) );
+						error = nv_adma_check_cpb(ap, pos,
+							notifier_error & (1 << pos) );
 						active &= ~(1 << pos );
 					}
 				}
 			}
-
-			handled++; /* irq handled if we got here */
 		}
 	}
 	
@@ -1110,18 +1126,33 @@ static void nv_adma_fill_sg(struct ata_q
 		cpb->next_aprd = cpu_to_le64(((u64)(pp->aprd_dma + NV_ADMA_SGTBL_SZ * qc->tag)));
 }
 
+static int nv_adma_use_reg_mode(struct ata_queued_cmd *qc)
+{
+	struct nv_adma_port_priv *pp = qc->ap->private_data;
+
+	/* ADMA engine can only be used for non-ATAPI DMA commands,
+	   or interrupt-driven no-data commands. */
+	if((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
+	   (qc->tf.flags & ATA_TFLAG_POLLING))
+		return 1;
+
+	if((qc->flags & ATA_QCFLAG_DMAMAP) ||
+	   (qc->tf.protocol == ATA_PROT_NODATA))
+		return 0;
+
+	return 1;
+}
+
 static void nv_adma_qc_prep(struct ata_queued_cmd *qc)
 {
 	struct nv_adma_port_priv *pp = qc->ap->private_data;
 	struct nv_adma_cpb *cpb = &pp->cpb[qc->tag];
 	u8 ctl_flags = NV_CPB_CTL_CPB_VALID |
-		       NV_CPB_CTL_APRD_VALID |
 		       NV_CPB_CTL_IEN;
 
 	VPRINTK("qc->flags = 0x%lx\n", qc->flags);
 
-	if (!(qc->flags & ATA_QCFLAG_DMAMAP) ||
-	     (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+	if (nv_adma_use_reg_mode(qc)) {
 		nv_adma_register_mode(qc->ap);
 		ata_qc_prep(qc);
 		return;
@@ -1139,7 +1170,11 @@ static void nv_adma_qc_prep(struct ata_q
 
 	nv_adma_tf_to_cpb(&qc->tf, cpb->tf);
 
-	nv_adma_fill_sg(qc, cpb);
+	if(qc->flags & ATA_QCFLAG_DMAMAP) {
+		nv_adma_fill_sg(qc, cpb);
+		ctl_flags |= NV_CPB_CTL_APRD_VALID;
+	} else
+		memset(&cpb->aprd[0], 0, sizeof(struct nv_adma_prd) * 5);
 
 	/* Be paranoid and don't let the device see NV_CPB_CTL_CPB_VALID until we are
 	   finished filling in all of the contents */
@@ -1154,10 +1189,9 @@ static unsigned int nv_adma_qc_issue(str
 
 	VPRINTK("ENTER\n");
 
-	if (!(qc->flags & ATA_QCFLAG_DMAMAP) ||
-	     (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+	if (nv_adma_use_reg_mode(qc)) {
 		/* use ATA register mode */
-		VPRINTK("no dmamap or ATAPI, using ATA register mode: 0x%lx\n", qc->flags);
+		VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
 		nv_adma_register_mode(qc->ap);
 		return ata_qc_issue_prot(qc);
 	} else
@@ -1339,28 +1373,9 @@ static void nv_adma_error_handler(struct
 		int i;
 		u16 tmp;
 
-		u32 notifier = readl(mmio + NV_ADMA_NOTIFIER);
-		u32 notifier_error = readl(mmio + NV_ADMA_NOTIFIER_ERROR);
-		u32 gen_ctl = readl(nv_adma_gen_block(ap) + NV_ADMA_GEN_CTL);
-		u32 status = readw(mmio + NV_ADMA_STAT);
-
-		ata_port_printk(ap, KERN_ERR, "EH in ADMA mode, notifier 0x%X "
-			"notifier_error 0x%X gen_ctl 0x%X status 0x%X\n",
-			notifier, notifier_error, gen_ctl, status);
-
-		for( i=0;i<NV_ADMA_MAX_CPBS;i++) {
-			struct nv_adma_cpb *cpb = &pp->cpb[i];
-			if( cpb->ctl_flags || cpb->resp_flags )
-				ata_port_printk(ap, KERN_ERR,
-					"CPB %d: ctl_flags 0x%x, resp_flags 0x%x\n",
-					i, cpb->ctl_flags, cpb->resp_flags);
-		}
-
 		/* Push us back into port register mode for error handling. */
 		nv_adma_register_mode(ap);
 
-		ata_port_printk(ap, KERN_ERR, "Resetting port\n");
-
 		/* Mark all of the CPBs as invalid to prevent them from being executed */
 		for( i=0;i<NV_ADMA_MAX_CPBS;i++)
 			pp->cpb[i].ctl_flags &= ~NV_CPB_CTL_CPB_VALID;


* Re: sata_nv - ADMA issues with 2.6.20
  2007-02-10  1:33 ` Robert Hancock
  2007-02-10  9:34   ` David R
@ 2007-02-11 19:55   ` Neil Schemenauer
  2007-02-11 22:23     ` Björn Steinbrink
  1 sibling, 1 reply; 6+ messages in thread
From: Neil Schemenauer @ 2007-02-11 19:55 UTC
  To: linux-kernel

> David R wrote:
>>>> Feb  9 18:40:29 server kernel: ata7: Resetting port
>>>> Feb  9 18:40:29 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>> Feb  9 18:40:29 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
>>>> Feb  9 18:40:29 server kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

I'm getting similar errors on my machine (AMD64 CPU, MSI board with
NForce chipset).  Linux 2.6.19.2 seems to be okay.

  Neil



* Re: sata_nv - ADMA issues with 2.6.20
  2007-02-10  1:33 ` Robert Hancock
@ 2007-02-10  9:34   ` David R
  2007-02-11 19:55   ` Neil Schemenauer
  1 sibling, 0 replies; 6+ messages in thread
From: David R @ 2007-02-10  9:34 UTC
  To: Robert Hancock; +Cc: linux-kernel

Robert Hancock wrote:

> So it was tag 0 that timed out, and according to the CPBs the
> controller indeed believes the command is still outstanding, i.e. we
> didn't lose an interrupt. I'm suspicious of the fact that only one of
> two identical drives produced this error. Some kind of hardware-related
> problem, perhaps? 30 seconds is an awfully long time for a drive to take

That was my first thought. Maybe the drive has some fault that only shows up
with NCQ in use. I'll keep an eye on it.

> You can also try disabling NCQ without disabling ADMA and see what that
> does:
> 
> echo 1 > /sys/block/sdX/device/queue_depth

I'll give this a go when I next reboot.

Thanks
David




* Re: sata_nv - ADMA issues with 2.6.20
       [not found] <fa.uoZWqSFxPjjB7zobQOwk/2zqG5A@ifi.uio.no>
@ 2007-02-10  1:33 ` Robert Hancock
  2007-02-10  9:34   ` David R
  2007-02-11 19:55   ` Neil Schemenauer
  0 siblings, 2 replies; 6+ messages in thread
From: Robert Hancock @ 2007-02-10  1:33 UTC
  To: David R; +Cc: linux-kernel

David R wrote:
> I've just upgraded my home server to 2.6.20. It's got an Athlon64 on an ASUS
> nForce-4 motherboard running a 32 bit kernel. I've had to fall back to using
> sata_nv.adma=0 on the kernel command line. One of the NCQ capable drives
> repeatedly produced the following errors. There wasn't much disk IO going on
> at the time. It's perfectly happy now with ADMA disabled. The strange thing is
> that the other identical drive, ata8, showed no problems (they're both part of
> a software raid1).
> 
> Some clues follow.
> 
> Cheers
> David
> 
>>> Feb  9 18:40:27 server kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
>>> Feb  9 18:40:27 server kernel: ata7: CPB 0: ctl_flags 0x1f, resp_flags 0x0
>>> Feb  9 18:40:27 server kernel: ata7: CPB 1: ctl_flags 0x1f, resp_flags 0x1
>>> Feb  9 18:40:27 server kernel: ata7: CPB 2: ctl_flags 0x1f, resp_flags 0x1
>>> Feb  9 18:40:27 server kernel: ata7: CPB 3: ctl_flags 0x1f, resp_flags 0x1
> etc etc..
>>> Feb  9 18:40:29 server kernel: ata7: CPB 27: ctl_flags 0x1f, resp_flags 0x1
>>> Feb  9 18:40:29 server kernel: ata7: CPB 28: ctl_flags 0x1f, resp_flags 0x1
>>> Feb  9 18:40:29 server kernel: ata7: CPB 29: ctl_flags 0x1f, resp_flags 0x1
>>> Feb  9 18:40:29 server kernel: ata7: CPB 30: ctl_flags 0x1f, resp_flags 0x1
>>> Feb  9 18:40:29 server kernel: ata7: Resetting port
>>> Feb  9 18:40:29 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>> Feb  9 18:40:29 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
>>> Feb  9 18:40:29 server kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

So it was tag 0 that timed out, and according to the CPBs the
controller indeed believes the command is still outstanding, i.e. we
didn't lose an interrupt. I'm suspicious of the fact that only one of
two identical drives produced this error. Some kind of hardware-related
problem, perhaps? 30 seconds is an awfully long time for a drive to take
to finish a command.
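
To make the CPB reading above reproducible, here is a small Python sketch that scans the "CPB n: ctl_flags ..., resp_flags ..." lines and lists the entries the controller has not marked as finished. The bit positions are what I recall from drivers/ata/sata_nv.c and should be treated as an assumption to check against your own tree; `outstanding_cpbs` is a hypothetical helper name.

```python
import re

# Response-flag bits as recalled from drivers/ata/sata_nv.c.
# ASSUMPTION: verify these values against the kernel source you run.
NV_CPB_RESP_DONE = 1 << 0
NV_CPB_RESP_ATA_ERR = 1 << 3
NV_CPB_RESP_CMD_ERR = 1 << 4
NV_CPB_RESP_CPB_ERR = 1 << 7

FINISHED_MASK = (NV_CPB_RESP_DONE | NV_CPB_RESP_ATA_ERR |
                 NV_CPB_RESP_CMD_ERR | NV_CPB_RESP_CPB_ERR)

CPB_RE = re.compile(
    r"CPB (\d+): ctl_flags (0x[0-9a-fA-F]+), resp_flags (0x[0-9a-fA-F]+)")


def outstanding_cpbs(log_lines):
    """Return indices of CPBs whose resp_flags show neither done nor error."""
    pending = []
    for line in log_lines:
        m = CPB_RE.search(line)
        if m is None:
            continue  # not a CPB dump line
        resp_flags = int(m.group(3), 16)
        if not resp_flags & FINISHED_MASK:
            pending.append(int(m.group(1)))
    return pending
```

On the dump above, only CPB 0 (resp_flags 0x0) comes back as pending, which lines up with tag 0 being the command that timed out; the other entries carry the done bit.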

You can also try disabling NCQ without disabling ADMA and see what that 
does:

echo 1 > /sys/block/sdX/device/queue_depth
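
If you would rather script that than type it after each boot, a rough Python equivalent follows. It only writes the same sysfs file as the echo above; the function name `set_queue_depth` and the `sysfs_root` parameter (there so the helper can be tried against a scratch directory) are my own inventions, and writing to the real file needs root.

```python
from pathlib import Path


def set_queue_depth(disk, depth, sysfs_root="/sys/block"):
    """Write depth to <sysfs_root>/<disk>/device/queue_depth.

    Returns (old, new) as read back from the file, so the caller can
    confirm that the value was actually accepted.
    """
    path = Path(sysfs_root) / disk / "device" / "queue_depth"
    old = int(path.read_text().strip())
    path.write_text(f"{depth}\n")
    new = int(path.read_text().strip())
    return old, new
```

For the drive in this report that would be something like `set_queue_depth("sdg", 1)`, with sdg being the disk behind ata7 in the posted dmesg.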

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



