LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] sata_nv: fix for completion handling
@ 2008-01-30  1:53 Robert Hancock
  2008-01-30  2:28 ` Tejun Heo
  2008-02-01 16:50 ` Jeff Garzik
  0 siblings, 2 replies; 4+ messages in thread
From: Robert Hancock @ 2008-01-30  1:53 UTC (permalink / raw)
  To: linux-kernel, ide, Jeff Garzik, Tejun Heo, Kuan Luo, Allen Martin

This patch is based on an original patch from Kuan Luo of NVIDIA,
posted under subject "fixed a bug of adma in rhel4u5 with HDS7250SASUN500G".
His description follows. I've reworked it a bit to avoid some unnecessary
repeated checks but it should be functionally identical.

"The patch is to solve the error message "ata1: CPB flags CMD err,
flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
I tested this hd in 2.6.24-rc7 which needed to remove the mask in
blacklist to run the ncq and the same error also showed up. 

I traced the  bug and found that the interrupt finished a command (for
example, tag=0) when the driver got that adma status is
NV_ADMA_STAT_DONE  and  cpb->resp_flags is NV_CPB_RESP_DONE.
However, For this hd, the drive maybe didn't clear bit 0 at this moment.
It meaned the hardware  had not completely finished the command.
If at the same time  the driver freed the command(tag 0) and sended
another command (tag 0), the error happened.

The notifier register is 32-bit register containing notifier value.
Value is bit vector containing one bit per tag number (0-31) in
corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
then ADMA indicates that command with corresponding tag number completed
execution.

So i added the check notifier code. Sometimes i saw that the notifier
reg set some bits  , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
code."

Signed-off-by: Robert Hancock <hancockr@shaw.ca>

--- linux-2.6.24-rc7-git3/drivers/ata/sata_nv.c	2008-01-11 17:32:40.000000000 -0600
+++ linux-2.6.24-rc7-git3edit/drivers/ata/sata_nv.c	2008-01-11 17:43:40.000000000 -0600
@@ -1011,14 +1011,20 @@
 			}
 
 			if (status & (NV_ADMA_STAT_DONE |
-				      NV_ADMA_STAT_CPBERR)) {
-				u32 check_commands;
+				      NV_ADMA_STAT_CPBERR |
+				      NV_ADMA_STAT_CMD_COMPLETE)) {
+				u32 check_commands = notifier_clears[i];
 				int pos, error = 0;
 
-				if (ata_tag_valid(ap->link.active_tag))
-					check_commands = 1 << ap->link.active_tag;
-				else
-					check_commands = ap->link.sactive;
+				if (status & NV_ADMA_STAT_CPBERR) {
+					/* Check all active commands */
+					if (ata_tag_valid(ap->link.active_tag))
+						check_commands = 1 <<
+							ap->link.active_tag;
+					else
+						check_commands = ap->
+							link.sactive;
+				}
 
 				/** Check CPBs for completed commands */
 				while ((pos = ffs(check_commands)) && !error) {


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sata_nv: fix for completion handling
  2008-01-30  1:53 [PATCH] sata_nv: fix for completion handling Robert Hancock
@ 2008-01-30  2:28 ` Tejun Heo
  2008-01-30  2:54   ` Robert Hancock
  2008-02-01 16:50 ` Jeff Garzik
  1 sibling, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2008-01-30  2:28 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, ide, Jeff Garzik, Kuan Luo, Allen Martin

Robert Hancock wrote:
> This patch is based on an original patch from Kuan Luo of NVIDIA,
> posted under subject "fixed a bug of adma in rhel4u5 with HDS7250SASUN500G".
> His description follows. I've reworked it a bit to avoid some unnecessary
> repeated checks but it should be functionally identical.
> 
> "The patch is to solve the error message "ata1: CPB flags CMD err,
> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
> I tested this hd in 2.6.24-rc7 which needed to remove the mask in
> blacklist to run the ncq and the same error also showed up. 
> 
> I traced the  bug and found that the interrupt finished a command (for
> example, tag=0) when the driver got that adma status is
> NV_ADMA_STAT_DONE  and  cpb->resp_flags is NV_CPB_RESP_DONE.
> However, For this hd, the drive maybe didn't clear bit 0 at this moment.
> It meaned the hardware  had not completely finished the command.
> If at the same time  the driver freed the command(tag 0) and sended
> another command (tag 0), the error happened.
> 
> The notifier register is 32-bit register containing notifier value.
> Value is bit vector containing one bit per tag number (0-31) in
> corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
> then ADMA indicates that command with corresponding tag number completed
> execution.
> 
> So i added the check notifier code. Sometimes i saw that the notifier
> reg set some bits  , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
> ,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
> code."
> 
> Signed-off-by: Robert Hancock <hancockr@shaw.ca>

Any chance this fixes the FLUSH problem?

-- 
tejun

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sata_nv: fix for completion handling
  2008-01-30  2:28 ` Tejun Heo
@ 2008-01-30  2:54   ` Robert Hancock
  0 siblings, 0 replies; 4+ messages in thread
From: Robert Hancock @ 2008-01-30  2:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, ide, Jeff Garzik, Kuan Luo, Allen Martin

Tejun Heo wrote:
> Robert Hancock wrote:
>> This patch is based on an original patch from Kuan Luo of NVIDIA,
>> posted under subject "fixed a bug of adma in rhel4u5 with HDS7250SASUN500G".
>> His description follows. I've reworked it a bit to avoid some unnecessary
>> repeated checks but it should be functionally identical.
>>
>> "The patch is to solve the error message "ata1: CPB flags CMD err,
>> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
>> I tested this hd in 2.6.24-rc7 which needed to remove the mask in
>> blacklist to run the ncq and the same error also showed up. 
>>
>> I traced the  bug and found that the interrupt finished a command (for
>> example, tag=0) when the driver got that adma status is
>> NV_ADMA_STAT_DONE  and  cpb->resp_flags is NV_CPB_RESP_DONE.
>> However, For this hd, the drive maybe didn't clear bit 0 at this moment.
>> It meaned the hardware  had not completely finished the command.
>> If at the same time  the driver freed the command(tag 0) and sended
>> another command (tag 0), the error happened.
>>
>> The notifier register is 32-bit register containing notifier value.
>> Value is bit vector containing one bit per tag number (0-31) in
>> corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
>> then ADMA indicates that command with corresponding tag number completed
>> execution.
>>
>> So i added the check notifier code. Sometimes i saw that the notifier
>> reg set some bits  , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
>> ,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
>> code."
>>
>> Signed-off-by: Robert Hancock <hancockr@shaw.ca>
> 
> Any chance this fixes the FLUSH problem?

I could still reproduce that issue when I took the udelay(20) out of the 
driver. Others have seen that without taking it out, so I suspect some 
systems/drives are more sensitive to that for some reason. However, who 
knows, it may help some people with that problem.

The symptoms of the problem dealt with here are different, not a command 
timeout it appears, but the controller reporting an error.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sata_nv: fix for completion handling
  2008-01-30  1:53 [PATCH] sata_nv: fix for completion handling Robert Hancock
  2008-01-30  2:28 ` Tejun Heo
@ 2008-02-01 16:50 ` Jeff Garzik
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Garzik @ 2008-02-01 16:50 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, ide, Tejun Heo, Kuan Luo, Allen Martin

Robert Hancock wrote:
> This patch is based on an original patch from Kuan Luo of NVIDIA,
> posted under subject "fixed a bug of adma in rhel4u5 with HDS7250SASUN500G".
> His description follows. I've reworked it a bit to avoid some unnecessary
> repeated checks but it should be functionally identical.
> 
> "The patch is to solve the error message "ata1: CPB flags CMD err,
> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
> I tested this hd in 2.6.24-rc7 which needed to remove the mask in
> blacklist to run the ncq and the same error also showed up. 
> 
> I traced the  bug and found that the interrupt finished a command (for
> example, tag=0) when the driver got that adma status is
> NV_ADMA_STAT_DONE  and  cpb->resp_flags is NV_CPB_RESP_DONE.
> However, For this hd, the drive maybe didn't clear bit 0 at this moment.
> It meaned the hardware  had not completely finished the command.
> If at the same time  the driver freed the command(tag 0) and sended
> another command (tag 0), the error happened.
> 
> The notifier register is 32-bit register containing notifier value.
> Value is bit vector containing one bit per tag number (0-31) in
> corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
> then ADMA indicates that command with corresponding tag number completed
> execution.
> 
> So i added the check notifier code. Sometimes i saw that the notifier
> reg set some bits  , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
> ,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
> code."
> 
> Signed-off-by: Robert Hancock <hancockr@shaw.ca>

applied, thanks all for investigating this stuff




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-02-01 16:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-30  1:53 [PATCH] sata_nv: fix for completion handling Robert Hancock
2008-01-30  2:28 ` Tejun Heo
2008-01-30  2:54   ` Robert Hancock
2008-02-01 16:50 ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).