LKML Archive on lore.kernel.org
* CK804 SATA Errors (still got them)
@ 2007-03-01 13:39 Alistair John Strachan
2007-03-01 14:45 ` Robert Hancock
0 siblings, 1 reply; 12+ messages in thread
From: Alistair John Strachan @ 2007-03-01 13:39 UTC (permalink / raw)
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel
Hi Robert,
Despite all the work that went into making these less frequent with ADMA,
they're still possible to trigger.
alistair@damocles:~$ cat /proc/version
Linux version 2.6.21-rc2-damocles (root@damocles) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Wed Feb 28 21:58:41 GMT 2007
alistair@damocles:~$ dmesg | tail -n 13
ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:38:ae:08:c2/00:00:00:00:00/e0 tag 0 cdb 0x0 data 28672 out
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
These cause the same ~30 second stalls. Machine was not under load.
No 3rd party modules were loaded.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: CK804 SATA Errors (still got them)
2007-03-01 13:39 CK804 SATA Errors (still got them) Alistair John Strachan
@ 2007-03-01 14:45 ` Robert Hancock
2007-03-01 15:13 ` Alistair John Strachan
0 siblings, 1 reply; 12+ messages in thread
From: Robert Hancock @ 2007-03-01 14:45 UTC (permalink / raw)
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel
Alistair John Strachan wrote:
> Hi Robert,
>
> Despite all the work that went into making these less frequent with ADMA,
> they're still possible to trigger.
>
> alistair@damocles:~$ cat /proc/version
> Linux version 2.6.21-rc2-damocles (root@damocles) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Wed Feb 28 21:58:41 GMT 2007
>
> alistair@damocles:~$ dmesg | tail -n 13
> ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:38:ae:08:c2/00:00:00:00:00/e0 tag 0 cdb 0x0 data 28672 out
> res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> These cause the same ~30 second stalls. Machine was not under load.
>
> No 3rd party modules were loaded.
This one seems a bit different. This time it's not related to NCQ vs.
non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
presumably not related to switching between ADMA and register mode,
unless perhaps a flush cache or something executed just before), and
from the CPB data it appears the command completed but the controller's
registers aren't indicating that it has. Not sure if I've seen one like
that before..
How easily can you reproduce this?
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
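The "cmd" line in the log quoted above can be decoded by hand, which is how one can tell it is a plain (non-NCQ) DMA write. The following is a minimal sketch, assuming the field order that libata's error handler prints (command/feature:nsect:lbal:lbam:lbah/five HOB bytes/device). The struct and helper names (tf_fields, parse_tf, tf_lba28, tf_bytes, tf_cmd_name) are hypothetical and are not part of the kernel.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of the libata "cmd" line, in the order they are printed:
 * command/feature:nsect:lbal:lbam:lbah/hob bytes (5)/device
 * This layout is an assumption based on the log format shown in the thread. */
struct tf_fields {
	uint8_t command, feature, nsect, lbal, lbam, lbah;
	uint8_t hob[5];
	uint8_t device;
};

/* Parse e.g. "ca/00:38:ae:08:c2/00:00:00:00:00/e0". Returns 0 on success. */
static int parse_tf(const char *s, struct tf_fields *tf)
{
	unsigned int v[12];
	int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		       &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
		       &v[6], &v[7], &v[8], &v[9], &v[10], &v[11]);
	if (n != 12)
		return -1;
	tf->command = (uint8_t)v[0];
	tf->feature = (uint8_t)v[1];
	tf->nsect = (uint8_t)v[2];
	tf->lbal = (uint8_t)v[3];
	tf->lbam = (uint8_t)v[4];
	tf->lbah = (uint8_t)v[5];
	for (int i = 0; i < 5; i++)
		tf->hob[i] = (uint8_t)v[6 + i];
	tf->device = (uint8_t)v[11];
	return 0;
}

/* 28-bit LBA: low nibble of the device byte supplies bits 27..24. */
static uint32_t tf_lba28(const struct tf_fields *tf)
{
	return ((uint32_t)(tf->device & 0x0f) << 24) |
	       ((uint32_t)tf->lbah << 16) |
	       ((uint32_t)tf->lbam << 8) |
	       (uint32_t)tf->lbal;
}

/* Transfer size in bytes; a sector count of 0 means 256 for 28-bit commands. */
static uint32_t tf_bytes(const struct tf_fields *tf)
{
	uint32_t sectors = tf->nsect ? tf->nsect : 256u;
	return sectors * 512u;
}

/* Only the two opcodes that appear in this thread are named. */
static const char *tf_cmd_name(uint8_t command)
{
	switch (command) {
	case 0xc8: return "READ DMA";
	case 0xca: return "WRITE DMA";
	default:   return "other";
	}
}
```

For the first report, the sector count 0x38 is 56 sectors, i.e. 56 * 512 = 28672 bytes, which agrees with the "data 28672 out" printed on the same line.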
* Re: CK804 SATA Errors (still got them)
2007-03-01 14:45 ` Robert Hancock
@ 2007-03-01 15:13 ` Alistair John Strachan
2007-03-02 1:20 ` Alistair John Strachan
0 siblings, 1 reply; 12+ messages in thread
From: Alistair John Strachan @ 2007-03-01 15:13 UTC (permalink / raw)
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel
On Thursday 01 March 2007 14:45, Robert Hancock wrote:
> This one seems a bit different. This time it's not related to NCQ vs.
> non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
> presumably not related to switching between ADMA and register mode,
> unless perhaps a flush cache or something executed just before), and
> from the CPB data it appears the command completed but the controller's
> registers aren't indicating that it has. Not sure if I've seen one like
> that before..
>
> How easily can you reproduce this?
It's the first one since -rc2, so apparently not easily. I'm more than willing
to find loads that expose it, though, so I might try that this afternoon.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: CK804 SATA Errors (still got them)
2007-03-01 15:13 ` Alistair John Strachan
@ 2007-03-02 1:20 ` Alistair John Strachan
2007-03-02 2:40 ` Robert Hancock
0 siblings, 1 reply; 12+ messages in thread
From: Alistair John Strachan @ 2007-03-02 1:20 UTC (permalink / raw)
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel
On Thursday 01 March 2007 15:13, Alistair John Strachan wrote:
> On Thursday 01 March 2007 14:45, Robert Hancock wrote:
> > This one seems a bit different. This time it's not related to NCQ vs.
> > non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
> > presumably not related to switching between ADMA and register mode,
> > unless perhaps a flush cache or something executed just before), and
> > from the CPB data it appears the command completed but the controller's
> > registers aren't indicating that it has. Not sure if I've seen one like
> > that before..
> >
> > How easily can you reproduce this?
>
> It's the first one since -rc2, so apparently not easily. I'm more than
> willing to find loads that expose it, though, so I might try that this
> afternoon.
Got another:
ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:80:85:c4:ed/00:00:00:00:00/e3 tag 0 cdb 0x0 data 65536 in
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Different HD, similar problem.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: CK804 SATA Errors (still got them)
2007-03-02 1:20 ` Alistair John Strachan
@ 2007-03-02 2:40 ` Robert Hancock
2007-03-02 15:47 ` Alistair John Strachan
0 siblings, 1 reply; 12+ messages in thread
From: Robert Hancock @ 2007-03-02 2:40 UTC (permalink / raw)
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel
Alistair John Strachan wrote:
> On Thursday 01 March 2007 15:13, Alistair John Strachan wrote:
>> On Thursday 01 March 2007 14:45, Robert Hancock wrote:
>>> This one seems a bit different. This time it's not related to NCQ vs.
>>> non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
>>> presumably not related to switching between ADMA and register mode,
>>> unless perhaps a flush cache or something executed just before), and
>>> from the CPB data it appears the command completed but the controller's
>>> registers aren't indicating that it has. Not sure if I've seen one like
>>> that before..
>>>
>>> How easily can you reproduce this?
>> It's the first one since -rc2, so apparently not easily. I'm more than
>> willing to find loads that expose it, though, so I might try that this
>> afternoon.
>
> Got another:
>
> ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
> ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:80:85:c4:ed/00:00:00:00:00/e3 tag 0 cdb 0x0 data 65536 in
> res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata2.00: configured for UDMA/133
> ata2: EH complete
> SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
> sdb: Write Protect is off
> sdb: Mode Sense: 00 3a 00 00
> SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> Different HD, similar problem.
Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
(link below) and see what effect that has?
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: CK804 SATA Errors (still got them)
2007-03-02 2:40 ` Robert Hancock
@ 2007-03-02 15:47 ` Alistair John Strachan
2007-03-04 23:25 ` Robert Hancock
0 siblings, 1 reply; 12+ messages in thread
From: Alistair John Strachan @ 2007-03-02 15:47 UTC (permalink / raw)
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel
On Friday 02 March 2007 02:40, Robert Hancock wrote:
> Alistair John Strachan wrote:
> > On Thursday 01 March 2007 15:13, Alistair John Strachan wrote:
> >> On Thursday 01 March 2007 14:45, Robert Hancock wrote:
> >>> This one seems a bit different. This time it's not related to NCQ vs.
> >>> non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
> >>> presumably not related to switching between ADMA and register mode,
> >>> unless perhaps a flush cache or something executed just before), and
> >>> from the CPB data it appears the command completed but the controller's
> >>> registers aren't indicating that it has. Not sure if I've seen one like
> >>> that before..
> >>>
> >>> How easily can you reproduce this?
> >>
> >> It's the first one since -rc2, so apparently not easily. I'm more than
> >> willing to find loads that expose it, though, so I might try that this
> >> afternoon.
> >
> > Got another:
> >
> > ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
> > ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
> > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > ata2.00: cmd c8/00:80:85:c4:ed/00:00:00:00:00/e3 tag 0 cdb 0x0 data 65536 in
> > res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> > ata2: soft resetting port
> > ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > ata2.00: configured for UDMA/133
> > ata2: EH complete
> > SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
> > sdb: Write Protect is off
> > sdb: Mode Sense: 00 3a 00 00
> > SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >
> > Different HD, similar problem.
>
> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> (link below) and see what effect that has?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a
Obviously, I'll let you know if it happens again, but I've reverted this
commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on an
NVIDIA sata controller, and this error hasn't appeared.
So I'm inclined to (very unscientifically) say that this brings it back to
2.6.20's level of stability.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: CK804 SATA Errors (still got them)
2007-03-02 15:47 ` Alistair John Strachan
@ 2007-03-04 23:25 ` Robert Hancock
2007-03-04 23:41 ` Alistair John Strachan
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Robert Hancock @ 2007-03-04 23:25 UTC (permalink / raw)
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel
Alistair John Strachan wrote:
>> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>> (link below) and see what effect that has?
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>
> Obviously, I'll let you know if it happens again, but I've reverted this
> commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on an
> NVIDIA sata controller, and this error hasn't appeared.
>
> So I'm inclined to (very unscientifically) say that this brings it back to
> 2.6.20's level of stability.
Interesting. Can you try un-reverting that patch, and applying this one?
The status register read was part of the original NVidia code, and I'm not
really sure why it is there. Given that reading the status register clears
the drive's interrupt status, it might be causing some weird interaction
with the ADMA controller. Also, I added in
a printk for cases where notifiers are triggered but the command doesn't
indicate completion - if you still get problems, let me know if you see
that message.
--- linux-2.6.21-rc2-git3/drivers/ata/sata_nv.c 2007-03-04 14:44:05.000000000 -0600
+++ linux-2.6.21-rc2-git3edit/drivers/ata/sata_nv.c 2007-03-04 17:09:06.000000000 -0600
@@ -745,10 +745,10 @@
/* Grab the ATA port status for non-NCQ commands.
For NCQ commands the current status may have nothing to do with
the command just completed. */
- if (qc->tf.protocol != ATA_PROT_NCQ) {
+/* if (qc->tf.protocol != ATA_PROT_NCQ) {
u8 ata_status = readb(pp->ctl_block + (ATA_REG_STATUS * 4));
qc->err_mask |= ac_err_mask(ata_status);
- }
+ }*/
DPRINTK("Completing qc from tag %d with err_mask %u\n",cpb_num,
qc->err_mask);
ata_qc_complete(qc);
@@ -764,6 +764,9 @@
ata_port_freeze(ap);
return 1;
}
+ } else {
+ ata_port_printk(ap, KERN_WARNING, "notifier for tag %d but not complete?\n",
+ cpb_num);
}
return 0;
}
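For readers following the patch above: the block it comments out reads the ATA status byte and folds it into the command's error mask. The following is a minimal userspace sketch of that kind of mapping, modelled on my understanding of libata's ac_err_mask() helper. The bit values are the standard ATA status bits and a subset of libata's AC_ERR_* flags as I recall them; treat them, and the function name model_ac_err_mask, as assumptions to be checked against include/linux/libata.h and include/linux/ata.h.

```c
#include <stdint.h>

/* ATA status register bits (standard ATA definitions). */
enum {
	ATA_ERR  = 1 << 0,	/* error */
	ATA_DRQ  = 1 << 3,	/* data request */
	ATA_DF   = 1 << 5,	/* device fault */
	ATA_DRDY = 1 << 6,	/* device ready */
	ATA_BUSY = 1 << 7,	/* busy */
};

/* Subset of libata's error mask flags (assumed values). */
enum {
	AC_ERR_DEV     = 1 << 0,	/* device reported an error */
	AC_ERR_HSM     = 1 << 1,	/* host state machine violation */
	AC_ERR_TIMEOUT = 1 << 2,	/* command timed out */
};

/* Map a status byte to an error mask: a device still busy or requesting
 * data at completion time is a state machine problem; ERR or DF set means
 * the device itself reported a failure; anything else is treated as clean. */
static unsigned int model_ac_err_mask(uint8_t status)
{
	if (status & (ATA_BUSY | ATA_DRQ))
		return AC_ERR_HSM;
	if (status & (ATA_ERR | ATA_DF))
		return AC_ERR_DEV;
	return 0;
}
```

Under these assumptions, the status byte 0x40 shown in the "res" lines of the reports (DRDY only) maps to no error bits, and the "Emask 0x4" printed there corresponds to the timeout flag rather than to anything derived from the status byte.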
* Re: CK804 SATA Errors (still got them)
2007-03-04 23:25 ` Robert Hancock
@ 2007-03-04 23:41 ` Alistair John Strachan
2007-03-04 23:49 ` Robert Hancock
2007-03-04 23:50 ` Jeff Garzik
2007-03-04 23:46 ` Jeff Garzik
2007-03-05 3:52 ` Alistair John Strachan
2 siblings, 2 replies; 12+ messages in thread
From: Alistair John Strachan @ 2007-03-04 23:41 UTC (permalink / raw)
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel
On Sunday 04 March 2007 23:25, Robert Hancock wrote:
> Alistair John Strachan wrote:
> >> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >> (link below) and see what effect that has?
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >
> > Obviously, I'll let you know if it happens again, but I've reverted this
> > commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on
> > an NVIDIA sata controller, and this error hasn't appeared.
> >
> > So I'm inclined to (very unscientifically) say that this brings it back
> > to 2.6.20's level of stability.
>
> Interesting. Can you try un-reverting that patch, and applying this one?
Sorry for the newbie question, but is it adequate to do a:
git reset --hard v2.6.21-rc2
To ensure a patch is "unreverted" (I reverted it with "git revert"), before
applying your patch?
I've done so now, assuming this _will_ work. The reason I ask is that your
diff was offset by 12 lines versus -rc2.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: CK804 SATA Errors (still got them)
2007-03-04 23:41 ` Alistair John Strachan
@ 2007-03-04 23:49 ` Robert Hancock
2007-03-04 23:50 ` Jeff Garzik
1 sibling, 0 replies; 12+ messages in thread
From: Robert Hancock @ 2007-03-04 23:49 UTC (permalink / raw)
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel
Alistair John Strachan wrote:
>> Interesting. Can you try un-reverting that patch, and applying this one?
>
> Sorry for the newbie question, but is it adequate to do a:
>
> git reset --hard v2.6.21-rc2
>
> To ensure a patch is "unreverted" (I reverted it with "git revert"), before
> applying your patch?
>
> I've done so now, assuming this _will_ work. The reason I ask is that your
> diff was offset by 12 lines versus -rc2.
I assume it's OK, though I'm not a git expert. I diffed against rc2-git3,
which has some CONFIG_PM ifdef changes; those shouldn't be important though.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: CK804 SATA Errors (still got them)
2007-03-04 23:41 ` Alistair John Strachan
2007-03-04 23:49 ` Robert Hancock
@ 2007-03-04 23:50 ` Jeff Garzik
1 sibling, 0 replies; 12+ messages in thread
From: Jeff Garzik @ 2007-03-04 23:50 UTC (permalink / raw)
To: Alistair John Strachan; +Cc: Robert Hancock, linux-kernel
Alistair John Strachan wrote:
> On Sunday 04 March 2007 23:25, Robert Hancock wrote:
>> Alistair John Strachan wrote:
>>>> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>>> (link below) and see what effect that has?
>>>>
>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>> Obviously, I'll let you know if it happens again, but I've reverted this
>>> commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on
>>> an NVIDIA sata controller, and this error hasn't appeared.
>>>
>>> So I'm inclined to (very unscientifically) say that this brings it back
>>> to 2.6.20's level of stability.
>> Interesting. Can you try un-reverting that patch, and applying this one?
>
> Sorry for the newbie question, but is it adequate to do a:
>
> git reset --hard v2.6.21-rc2
>
> To ensure a patch is "unreverted" (I reverted it with "git revert"), before
> applying your patch?
>
> I've done so now, assuming this _will_ work. The reason I ask is that your
> diff was offset by 12 lines versus -rc2.
If you committed the revert to the repository, it's probably easiest to blow it
away and re-clone. Generally, with git, you want to keep a pristine,
never-touched-except-for-pulling kernel repository around, and then when
doing compiles and experiments and such, run
git-clone --reference my-vanilla-2.6-repo $URL
The --reference argument will ensure that you don't haul around multiple
copies of the repository objects, with each clone.
Otherwise, if you have committed nothing to the repository, this will
undo all your not-committed changes:
git checkout -f
Regards,
Jeff
* Re: CK804 SATA Errors (still got them)
2007-03-04 23:25 ` Robert Hancock
2007-03-04 23:41 ` Alistair John Strachan
@ 2007-03-04 23:46 ` Jeff Garzik
2007-03-05 3:52 ` Alistair John Strachan
2 siblings, 0 replies; 12+ messages in thread
From: Jeff Garzik @ 2007-03-04 23:46 UTC (permalink / raw)
To: Robert Hancock
Cc: Alistair John Strachan, linux-kernel, IDE/ATA development list
Robert Hancock wrote:
> Alistair John Strachan wrote:
>>> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>> (link below) and see what effect that has?
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>
>> Obviously, I'll let you know if it happens again, but I've reverted
>> this commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4
>> HDs on an NVIDIA sata controller, and this error hasn't appeared.
>>
>> So I'm inclined to (very unscientifically) say that this brings it
>> back to 2.6.20's level of stability.
>
> Interesting. Can you try un-reverting that patch, and applying this one?
>
> The reading of the status register is something that was part of the
> original NVidia code, which I'm not really sure why is there. Given that
> reading the status register clears the drive's interrupt status, that might
> be causing some weird interaction with the ADMA controller. Also, I added in
> a printk for cases where notifiers are triggered but the command doesn't
> indicate completion - if you still get problems, let me know if you see
> that message.
AFAICS, when in ADMA mode, you absolutely should not touch the ATA
shadow registers at all.
This is normal for all controllers with both a "legacy mode" and an
"enhanced DMA mode" of some sort: the internal silicon state machines
"own" the ATA shadow registers while in enhanced DMA mode. Reading or
writing the ATA shadow registers while in enhanced DMA mode can lead to
undefined results, running the gamut from no-op to data corruption and
hardware lock-ups.
You may only access the ATA shadow registers when NV_ADMA_CTL_GO is
cleared, and then NV_ADMA_STAT_LEGACY is set, indicating the NVIDIA chip
is in register mode (aka legacy mode).
Jeff
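The rule stated above (shadow registers may only be touched once GO is cleared and the LEGACY status bit is set) can be sketched as a simple predicate. This is a standalone illustration, not driver code: the helper name shadow_regs_accessible is hypothetical, and the bit positions used for NV_ADMA_CTL_GO and NV_ADMA_STAT_LEGACY are assumptions that should be checked against drivers/ata/sata_nv.c before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only; the authoritative
 * definitions live in drivers/ata/sata_nv.c. */
#define NV_ADMA_CTL_GO      (1u << 7)
#define NV_ADMA_STAT_LEGACY (1u << 9)

/* Returns true only when the controller has been told to stop ADMA
 * (GO cleared in the control register) AND it reports that it is back
 * in register/legacy mode (LEGACY set in the status register). Only
 * then would reading or writing the ATA shadow registers be safe. */
static bool shadow_regs_accessible(uint16_t adma_ctl, uint16_t adma_stat)
{
	return !(adma_ctl & NV_ADMA_CTL_GO) &&
	       (adma_stat & NV_ADMA_STAT_LEGACY) != 0;
}
```

Both conditions are checked because clearing GO is a request, while the LEGACY bit is the controller's acknowledgement; a caller that tests only one of them could still race with the controller's internal state machine.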
* Re: CK804 SATA Errors (still got them)
2007-03-04 23:25 ` Robert Hancock
2007-03-04 23:41 ` Alistair John Strachan
2007-03-04 23:46 ` Jeff Garzik
@ 2007-03-05 3:52 ` Alistair John Strachan
2 siblings, 0 replies; 12+ messages in thread
From: Alistair John Strachan @ 2007-03-05 3:52 UTC (permalink / raw)
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel
On Sunday 04 March 2007 23:25, Robert Hancock wrote:
> Alistair John Strachan wrote:
> >> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >> (link below) and see what effect that has?
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >
> > Obviously, I'll let you know if it happens again, but I've reverted this
> > commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on
> > an NVIDIA sata controller, and this error hasn't appeared.
> >
> > So I'm inclined to (very unscientifically) say that this brings it back
> > to 2.6.20's level of stability.
>
> Interesting. Can you try un-reverting that patch, and applying this one?
>
> The reading of the status register is something that was part of the
> original NVidia code, which I'm not really sure why is there. Given that
> reading the status register clears the drive's interrupt status, that might
> be causing some weird interaction with the ADMA controller. Also, I added
> in a printk for cases where notifiers are triggered but the command doesn't
> indicate completion - if you still get problems, let me know if you see
> that message.
Didn't take long to observe the problem again, so I'm guessing that this isn't
it. I was definitely using a kernel compiled with your patch:
alistair@damocles:~$ uname -v
#1 SMP Sun Mar 4 23:39:56 GMT 2007
I got the following in dmesg:
ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd c8/00:08:37:77:61/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Your debugging message did not appear in dmesg, however.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
end of thread, other threads:[~2007-03-05 3:53 UTC | newest]
Thread overview: 12+ messages
2007-03-01 13:39 CK804 SATA Errors (still got them) Alistair John Strachan
2007-03-01 14:45 ` Robert Hancock
2007-03-01 15:13 ` Alistair John Strachan
2007-03-02 1:20 ` Alistair John Strachan
2007-03-02 2:40 ` Robert Hancock
2007-03-02 15:47 ` Alistair John Strachan
2007-03-04 23:25 ` Robert Hancock
2007-03-04 23:41 ` Alistair John Strachan
2007-03-04 23:49 ` Robert Hancock
2007-03-04 23:50 ` Jeff Garzik
2007-03-04 23:46 ` Jeff Garzik
2007-03-05 3:52 ` Alistair John Strachan