LKML Archive on lore.kernel.org
* 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
@ 2007-02-07 17:17 Emmeran Seehuber
  2007-02-09  9:19 ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Emmeran Seehuber @ 2007-02-07 17:17 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2350 bytes --]

Hi there,

We've got a database server machine running a vanilla 2.6.18.2 kernel on
Debian Etch. The database is MySQL 5. Everything works fine, but sometimes
the server "lags", i.e. it doesn't respond for 30 seconds. We've now
investigated the problem and found these messages in syslog (and dmesg):

15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: soft resetting port
15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
15:55:44 omega11 kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
15:55:44 omega11 last message repeated 5 times
15:55:44 omega11 kernel: ata1.00: qc timeout (cmd 0xec)
15:55:44 omega11 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
15:55:44 omega11 kernel: ata1: failed to recover some devices, retrying in 5 secs
15:55:44 omega11 kernel: ata1: hard resetting port
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
15:55:44 omega11 kernel: ata1.00: configured for UDMA/133
15:55:44 omega11 kernel: ata1: EH complete
15:55:44 omega11 kernel: SCSI device sda: 293046768 512-byte hdwr sectors (150040 MB)
15:55:44 omega11 kernel: sda: Write Protect is off
15:55:44 omega11 kernel: SCSI device sda: drive cache: write back

We've been getting these messages up to 5 times a day, for as far back as our
syslogs reach.
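
To put a number on how often this happens, the recovery episodes can be counted per day straight from syslog. The following is only a minimal sketch: the excerpt above shows just the time of day, so the full line prefix (`Feb  7 15:55:44 omega11 kernel: ...`) is an assumption, and `count_recoveries` is a hypothetical helper name. It counts one episode per "EH complete" line.

```python
import re
from collections import Counter

# Assumed line shape: "Feb  7 15:55:44 omega11 kernel: ata1: EH complete".
# The date part is an assumption; the excerpt in the mail shows only the time.
EH_DONE = re.compile(
    r"^(?P<day>[A-Z][a-z]{2}\s+\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+): EH complete"
)


def count_recoveries(lines):
    """Return a Counter keyed by (day, port), one count per 'EH complete' line."""
    counts = Counter()
    for line in lines:
        match = EH_DONE.match(line)
        if match:
            # Normalise the padded day of month, e.g. "Feb  7" becomes "Feb 7".
            day = " ".join(match.group("day").split())
            counts[(day, match.group("port"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Feb  7 15:55:44 omega11 kernel: ata1: soft resetting port",
        "Feb  7 15:55:44 omega11 kernel: ata1: EH complete",
        "Feb  7 18:01:02 omega11 kernel: ata1: EH complete",
    ]
    for (day, port), n in sorted(count_recoveries(sample).items()):
        print(day, port, n)
```

In practice the list of lines would come from the syslog file itself (for example `open("/var/log/syslog")`, a path that depends on the syslog configuration).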

It seems no kind of queuing is used:
# cat /sys/block/sda/device/queue_type
none
# cat /sys/block/sda/device/queue_depth
1
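
The same check can be scripted, which is handy when comparing the two machines. This is a small sketch that reads the two sysfs attributes shown above; the function names and the `sysfs_root` parameter are made up for illustration, and it assumes the attributes live under `/sys/block/<disk>/device/` as they do here.

```python
from pathlib import Path


def read_queue_settings(disk="sda", sysfs_root="/sys"):
    """Read queue_type and queue_depth for a disk; missing attributes map to None."""
    device_dir = Path(sysfs_root) / "block" / disk / "device"
    settings = {}
    for name in ("queue_type", "queue_depth"):
        attr = device_dir / name
        settings[name] = attr.read_text().strip() if attr.is_file() else None
    return settings


def uses_queuing(settings):
    """True only if a queue type other than 'none' is set and the depth exceeds 1."""
    queue_type = settings.get("queue_type")
    depth = settings.get("queue_depth")
    if queue_type in (None, "none") or depth is None:
        return False
    return int(depth) > 1


if __name__ == "__main__":
    current = read_queue_settings("sda")
    print(current, "queuing in use:", uses_queuing(current))
```

With the values reported above (`none` and `1`), `uses_queuing` returns `False`.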

The server has been up for 91 days now and has low to medium load (depending
on the time of day). Since it's a production server located in a datacenter,
we can't just test some random kernel on it :(

Does anybody have a clue what's going on here? Could it be a hardware failure?
We have an identical machine using the same kernel. It's used as a webserver.
These messages show up there too, but not as often (10 times in 91 days of
uptime). If it is a hardware failure, then both machines would have to be
affected by the same hardware problem.

What can we do to fix this problem? Is it known? 

I've found many posts related to SATA problems, but none seemed to be about
this problem.

Do you need additional information?

Thanks

cu,
  Emmy

P.S.: Please CC me, since I'm not subscribed.

[-- Attachment #2: lspci.txt --]
[-- Type: text/plain, Size: 1565 bytes --]

00:01.0 PCI bridge: Broadcom HT1000 PCI/PCI-X bridge
00:02.0 Host bridge: Broadcom HT1000 Legacy South Bridge
00:02.1 IDE interface: Broadcom HT1000 Legacy IDE controller
00:02.2 ISA bridge: Broadcom HT1000 LPC Bridge
00:03.0 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:03.1 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:03.2 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:04.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)
00:05.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)
00:06.0 VGA compatible controller: XGI - Xabre Graphics Inc Volari Z7
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0d.0 PCI bridge: Broadcom HT1000 PCI/PCI-X bridge (rev c0)
01:0e.0 RAID bus controller: Broadcom BCM5785 (HT1000) SATA Native SATA Mode

[-- Attachment #3: config.gz --]
[-- Type: application/x-gzip, Size: 11411 bytes --]

[-- Attachment #4: cpuinfo.txt --]
[-- Type: text/plain, Size: 3052 bytes --]

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4390.69
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4390.11
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4393.11
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4393.51
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp



* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
  2007-02-07 17:17 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000)) Emmeran Seehuber
@ 2007-02-09  9:19 ` Tejun Heo
  2007-02-09 11:37   ` Emmeran Seehuber
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2007-02-09  9:19 UTC (permalink / raw)
  To: Emmeran Seehuber; +Cc: linux-kernel

Hi,

Emmeran Seehuber wrote:
> We've got a database server machine running a vanilla 2.6.18.2 kernel on
> Debian Etch. The database is MySQL 5. Everything works fine, but sometimes
> the server "lags", i.e. it doesn't respond for 30 seconds. We've now
> investigated the problem and found these messages in syslog (and dmesg):
>
> 15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
> 15:55:44 omega11 kernel: ata1: soft resetting port
> 15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
> 15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> 15:55:44 omega11 kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
> 15:55:44 omega11 last message repeated 5 times
> 15:55:44 omega11 kernel: ata1.00: qc timeout (cmd 0xec)
> 15:55:44 omega11 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> 15:55:44 omega11 kernel: ata1: failed to recover some devices, retrying in 5 secs
> 15:55:44 omega11 kernel: ata1: hard resetting port
> 15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> 15:55:44 omega11 kernel: ata1.00: configured for UDMA/133
> 15:55:44 omega11 kernel: ata1: EH complete
> 15:55:44 omega11 kernel: SCSI device sda: 293046768 512-byte hdwr sectors (150040 MB)
> 15:55:44 omega11 kernel: sda: Write Protect is off
> 15:55:44 omega11 kernel: SCSI device sda: drive cache: write back

This is just the recovery part; I need more of the log.  If possible, please
give 2.6.20 a shot.  It might fix your problem, or at least
allow better diagnosis.

> We've been getting these messages up to 5 times a day, for as far back as our
> syslogs reach.
>
> It seems no kind of queuing is used:
> # cat /sys/block/sda/device/queue_type
> none
> # cat /sys/block/sda/device/queue_depth
> 1
>
> The server has been up for 91 days now and has low to medium load (depending
> on the time of day). Since it's a production server located in a datacenter,
> we can't just test some random kernel on it :(

I see.

> Does anybody have a clue what's going on here? Could it be a hardware failure?

It might be.  Quite a few SATA bug reports turn out to be hardware
problems, most commonly PSU issues.

> We have an identical machine using the same kernel. It's used as a webserver.
> These messages show up there too, but not as often (10 times in 91 days of
> uptime). If it is a hardware failure, then both machines would have to be
> affected by the same hardware problem.

Hmmm...

> What can we do to fix this problem? Is it known?
>
> I've found many posts related to SATA problems, but none seemed to be about
> this problem.
>
> Do you need additional information?

Yeah, please post the contents of /var/log/boot.msg if available and the
result of dmesg and lspci -nn.

-- 
tejun


* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
  2007-02-09  9:19 ` Tejun Heo
@ 2007-02-09 11:37   ` Emmeran Seehuber
  2007-02-09 13:54     ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Emmeran Seehuber @ 2007-02-09 11:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1003 bytes --]

On Friday 09 February 2007, Tejun Heo wrote:
> Hi,

>
> This is just the recovery part; I need more of the log.  If possible, please
> give 2.6.20 a shot.  It might fix your problem, or at least
> allow better diagnosis.
>

I'll look into getting 2.6.20 onto the machine, but it might take some time
until we can do this.

> > Does anybody have a clue what's going on here? Could it be a hardware
> > failure?
>
> It might be.  Quite a few SATA bug reports turn out to be hardware
> problems, most commonly PSU issues.

The power supply unit (that's what you meant by PSU, isn't it?) is rated at
800 watts, so it should be powerful enough for one hard disk and no graphics
board.

> > Do you need additional information?
>
> Yeah, please post the contents of /var/log/boot.msg if available and the
> result of dmesg and lspci -nn.

We don't have a /var/log/boot.msg, but it seems the boot messages were saved
in /var/log/dmesg, so I attached it.

Thanks for your effort.

cu,
  Emmy

[-- Attachment #2: lspci-nn.txt --]
[-- Type: text/plain, Size: 1945 bytes --]

00:01.0 PCI bridge [0604]: Broadcom HT1000 PCI/PCI-X bridge [1166:0036]
00:02.0 Host bridge [0600]: Broadcom HT1000 Legacy South Bridge [1166:0205]
00:02.1 IDE interface [0101]: Broadcom HT1000 Legacy IDE controller [1166:0214]
00:02.2 ISA bridge [0601]: Broadcom HT1000 LPC Bridge [1166:0234]
00:03.0 USB Controller [0c03]: Broadcom HT1000 USB Controller [1166:0223] (rev 01)
00:03.1 USB Controller [0c03]: Broadcom HT1000 USB Controller [1166:0223] (rev 01)
00:03.2 USB Controller [0c03]: Broadcom HT1000 USB Controller [1166:0223] (rev 01)
00:04.0 Ethernet controller [0200]: Intel Corporation 82541GI/PI Gigabit Ethernet Controller [8086:1076] (rev 05)
00:05.0 Ethernet controller [0200]: Intel Corporation 82541GI/PI Gigabit Ethernet Controller [8086:1076] (rev 05)
00:06.0 VGA compatible controller [0300]: XGI - Xabre Graphics Inc Volari Z7 [18ca:0020]
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
00:19.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:19.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:19.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:19.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:0d.0 PCI bridge [0604]: Broadcom HT1000 PCI/PCI-X bridge [1166:0104] (rev c0)
01:0e.0 RAID bus controller [0104]: Broadcom BCM5785 (HT1000) SATA Native SATA Mode [1166:024a]

[-- Attachment #3: dmesg.txt.gz --]
[-- Type: application/x-gzip, Size: 1191 bytes --]

[-- Attachment #4: boot.dmesg.txt.gz --]
[-- Type: application/x-gzip, Size: 7357 bytes --]


* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
  2007-02-09 11:37   ` Emmeran Seehuber
@ 2007-02-09 13:54     ` Tejun Heo
  2007-02-09 17:09       ` Emmeran Seehuber
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2007-02-09 13:54 UTC (permalink / raw)
  To: Emmeran Seehuber; +Cc: linux-kernel

Emmeran Seehuber wrote:
>>> Does anybody have a clue what's going on here? Could it be a hardware
>>> failure?
>> It might be.  Quite a few SATA bug reports turn out to be hardware
>> problems, most commonly PSU issues.
>
> The power supply unit (that's what you meant by PSU, isn't it?) is rated at
> 800 watts, so it should be powerful enough for one hard disk and no graphics
> board.

I see.

>>> Do you need additional information?
>> Yeah, please post the contents of /var/log/boot.msg if available and the
>> result of dmesg and lspci -nn.
>
> We don't have a /var/log/boot.msg, but it seems the boot messages were saved
> in /var/log/dmesg, so I attached it.

Yep, that's exactly what I wanted.  So, the driver is sata_svw and the
errors are timeouts for both reads and writes with the BMDMA engine still
running.  It looks like transmission errors to me.  Can you post the
result of 'smartctl -a /dev/sdX'?
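
For readers following the numbers in the log excerpt, the `err_mask=0x4` and `SStatus 113` values can be decoded mechanically. The sketch below is illustrative only: the AC_ERR_* bit positions are listed from memory of the kernel's include/linux/libata.h for kernels of this era and should be checked against the actual tree, and the SStatus field layout (DET, SPD, IPM) follows the SATA specification. The kernel prints SStatus in hexadecimal, so `113` is read as 0x113.

```python
# AC_ERR_* bit positions as recalled from include/linux/libata.h
# (2.6.18-era kernels; an assumption to verify against your own tree).
AC_ERR_BITS = {
    0: "AC_ERR_DEV",       # device reported an error
    1: "AC_ERR_HSM",       # host state machine violation
    2: "AC_ERR_TIMEOUT",   # command timed out
    3: "AC_ERR_MEDIA",     # media error
    4: "AC_ERR_ATA_BUS",   # ATA bus error
    5: "AC_ERR_HOST_BUS",  # host bus error
    6: "AC_ERR_SYSTEM",    # host internal error
    7: "AC_ERR_INVALID",   # invalid argument
    8: "AC_ERR_OTHER",     # unknown
}


def decode_err_mask(mask):
    """Return the names of all AC_ERR_* bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & (1 << bit)]


def decode_sstatus(value):
    """Split a SATA SStatus register value into its DET, SPD and IPM fields."""
    return {
        "DET": value & 0xF,         # 3 = device present, PHY communication established
        "SPD": (value >> 4) & 0xF,  # 1 = Gen1 signalling (1.5 Gbps)
        "IPM": (value >> 8) & 0xF,  # 1 = interface in active state
    }


if __name__ == "__main__":
    print(decode_err_mask(0x4))
    print(decode_sstatus(0x113))
```

Under these assumptions, 0x4 decodes to AC_ERR_TIMEOUT only, which matches the "qc timeout" line, and 0x113 decodes to an established 1.5 Gbps link in the active state.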

-- 
tejun


* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
  2007-02-09 13:54     ` Tejun Heo
@ 2007-02-09 17:09       ` Emmeran Seehuber
  2007-02-10  6:49         ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Emmeran Seehuber @ 2007-02-09 17:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On Friday 09 February 2007, Tejun Heo wrote:
> Yep, that's exactly what I wanted.  So, the driver is sata_svw and the
> errors are timeouts for both reads and writes with the BMDMA engine still
> running.  It looks like transmission errors to me.  Can you post the
> result of 'smartctl -a /dev/sdX'?

Here it is:

-->

# smartctl -a /dev/sda
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce 
Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA      WDC WD1500ADFD-0 Version: 20.0
Serial number:      WD-WMAP41246348
Device type: disk
Local Time is: Fri Feb  9 18:06:23 2007 CET
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

<--

cu,
  Emmy


* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
  2007-02-09 17:09       ` Emmeran Seehuber
@ 2007-02-10  6:49         ` Tejun Heo
  2007-02-10  8:42           ` Emmeran Seehuber
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2007-02-10  6:49 UTC (permalink / raw)
  To: Emmeran Seehuber; +Cc: linux-kernel

Emmeran Seehuber wrote:
> # smartctl -a /dev/sda
> smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce 
> Allen
> Home page is http://smartmontools.sourceforge.net/
> 
> Device: ATA      WDC WD1500ADFD-0 Version: 20.0
> Serial number:      WD-WMAP41246348
> Device type: disk
> Local Time is: Fri Feb  9 18:06:23 2007 CET
> Device does not support SMART

Hmmm... Raptor not supporting SMART.  That's weird.  Please try 
'smartctl -d ata -a /dev/sda'.

Thanks.

-- 
tejun


* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
  2007-02-10  6:49         ` Tejun Heo
@ 2007-02-10  8:42           ` Emmeran Seehuber
  2007-02-11 22:19             ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Emmeran Seehuber @ 2007-02-10  8:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 192 bytes --]

On Saturday 10 February 2007, Tejun Heo wrote:
> Hmmm... Raptor not supporting SMART.  That's weird.  Please try
> 'smartctl -d ata -a /dev/sda'.

The output is attached.

cu,
  Emmy

[-- Attachment #2: smartctl.txt --]
[-- Type: text/plain, Size: 4145 bytes --]

smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1500ADFD-00NLR1
Serial Number:    WD-WMAP41246348
Firmware Version: 20.07P20
User Capacity:    150.039.945.216 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Not recognized. Minor revision code: 0x1d
Local Time is:    Sat Feb 10 09:31:27 2007 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (4783) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  72) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   170   170   021    Pre-fail  Always       -       4533
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2232
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0022   118   116   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       5
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
  2007-02-10  8:42           ` Emmeran Seehuber
@ 2007-02-11 22:19             ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2007-02-11 22:19 UTC (permalink / raw)
  To: Emmeran Seehuber; +Cc: linux-kernel

Hello, Emmeran.

There is no logged error on the drive's side, only timeouts on the host's
side with the BMDMA engine running.  I don't know the specifics of the
ServerWorks controller, but many controllers with a BMDMA interface time out
the command if a transmission failure occurs, so my primary suspect is still
a hardware transmission problem, which seems quite common in the SATA world.
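
As a side note on reading the attached smartctl output: the SMART error log is empty, but the attribute table does list a raw UDMA_CRC_Error_Count of 5. Whether that counter is related to the timeouts is not established in this thread. The sketch below only shows one way to pull the raw values out of the `smartctl -d ata -a` attribute table and list the non-zero ones; the function names and the `WATCHED` list are illustrative choices, not part of smartmontools.

```python
# Attributes to report when their raw value is non-zero (an illustrative choice).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME to its integer RAW_VALUE from smartctl -a output."""
    attrs = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True  # header row; attribute rows follow
            continue
        if not in_table:
            continue
        if len(fields) < 10 or not fields[0].isdigit():
            break  # first non-attribute line ends the table
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            pass  # raw value is not a plain integer; skip it
    return attrs


def nonzero_counters(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is not zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) != 0}


if __name__ == "__main__":
    sample = (
        "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
        "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0\n"
        "199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       5\n"
    )
    print(nonzero_counters(parse_smart_attributes(sample)))
```

Recording these values before and after swapping the cable would show whether the counter keeps increasing.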

Can you try the following?

1. Use a different cable and connect the hdd to a different port.

2. If possible, connect the hard disk to a different power supply.  (I know
you have a juicy PSU, but just in case.)

It's probably a good idea to apply this to only one machine and leave the
other alone as a control.

Thanks.

-- 
tejun


* Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
@ 2007-02-09 17:56 koan
  0 siblings, 0 replies; 9+ messages in thread
From: koan @ 2007-02-09 17:56 UTC (permalink / raw)
  To: linux-kernel

I believe you need to add the flag '-d ata' to the smartctl command in
order to see the SMART status of a SATA device.

-Jesse

-------------------------------------------------------------
here it is:

-->

# smartctl -a /dev/sda
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce
Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA      WDC WD1500ADFD-0 Version: 20.0
Serial number:      WD-WMAP41246348
Device type: disk
Local Time is: Fri Feb  9 18:06:23 2007 CET
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

<--

cu,
 Emmy

