LKML Archive on lore.kernel.org
* Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
@ 2008-02-11 16:44 Aron Stansvik
  2008-02-13  8:03 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Aron Stansvik @ 2008-02-11 16:44 UTC (permalink / raw)
  To: linux-kernel

Hello LKML.

Under semi-high disk I/O (e.g. installing a compiled KDE), I get the
following (accompanied by seconds of lock-ups on the machine):

[ 7727.345183] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 7730.348776] arcmsr0:                 scsi id = 0 lun = 0 ccb =
'0xdfb461c0' poll command abort successfully
[ 8053.795943] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 8056.799528] arcmsr0:                 scsi id = 0 lun = 0 ccb =
'0xdfb595e0' poll command abort successfully
[ 8884.592810] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 8887.596392] arcmsr0:                 scsi id = 0 lun = 0 ccb =
'0xdfb56d80' poll command abort successfully
[ 8917.760216] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 8920.763797] arcmsr0:                 scsi id = 0 lun = 0 ccb =
'0xdfb472c0' poll command abort successfully
[ 9074.106547] arcmsr0: abort device command of scsi id = 0 lun = 0

This is my setup:

1 x MSI K8N Master2-FAR
1 x Opteron 252
1 x Areca ARC1200 (sitting in a PCIe x4 socket)
2 x WD1500ADFD in RAID1

astan@rubik:~$ uname -a
Linux rubik 2.6.24-7-generic #1 SMP Thu Feb 7 01:29:58 UTC 2008 i686 GNU/Linux
astan@rubik:~$ modinfo arcmsr
filename:
/lib/modules/2.6.24-7-generic/kernel/drivers/scsi/arcmsr/arcmsr.ko
version:        Driver Version 1.20.00.15 2007/08/30
license:        Dual BSD/GPL
description:    ARECA (ARC11xx/12xx/13xx/16xx) SATA/SAS RAID HOST Adapter
author:         Erich Chen <support@areca.com.tw>
srcversion:     28EAD6AB49D4491CA04D465
[...]

I've read in some previous posts here on LKML that the Areca firmware
might not like my WD disks. Does anyone know whether this is an IRQ
handling problem in the kernel or a problem with the RAID controller
firmware?

Erich Chen (of Areca): have you tried the new ARC1200 in RAID1
configuration with Raptor disks on Linux?

As a side note, I can tell you that I first tried running FreeBSD 6.3
(RELENG_6) on this machine, but got random reboots during disk I/O
(even with a kernel with KDB debugging turned on). This leads me to
believe that it might be a firmware issue, and that Linux just handles
it more gracefully than FreeBSD.

Any ideas or advice are appreciated. This is my first post to the LKML,
so please instruct me if you want more information or if you want me
to take further debugging actions.

Best regards,
Aron Stansvik
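
The log excerpt in this report can be summarized with a short script. The following is a minimal Python sketch, not part of the original report: it pairs each "abort device command" line with the next "abort successfully" line and reports how long each abort took. The function names are hypothetical; the log lines are copied from the report with the wrapped lines rejoined.

```python
import re

# Kernel log lines copied from the report (wrapped lines rejoined).
LOG = """\
[ 7727.345183] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 7730.348776] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb461c0' poll command abort successfully
[ 8053.795943] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 8056.799528] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb595e0' poll command abort successfully
[ 8884.592810] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 8887.596392] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb56d80' poll command abort successfully
[ 8917.760216] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 8920.763797] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb472c0' poll command abort successfully
[ 9074.106547] arcmsr0: abort device command of scsi id = 0 lun = 0
"""

# Matches "[ <seconds since boot>] arcmsrN: <message>".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+arcmsr\d+:\s*(.*)$")


def abort_events(log):
    """Return (issued, completed) timestamp pairs for each finished abort."""
    pairs = []
    pending = None
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if message.startswith("abort device command"):
            pending = timestamp
        elif "abort successfully" in message and pending is not None:
            pairs.append((pending, timestamp))
            pending = None
    return pairs


def summarize(log):
    """Return (seconds per abort, seconds between consecutive aborts)."""
    pairs = abort_events(log)
    durations = [round(done - issued, 3) for issued, done in pairs]
    gaps = [round(b[0] - a[0], 1) for a, b in zip(pairs, pairs[1:])]
    return durations, gaps


if __name__ == "__main__":
    durations, gaps = summarize(LOG)
    print("seconds per abort:", durations)
    print("seconds between aborts:", gaps)
```

On the lines quoted in the report, each completed abort takes about 3 seconds, which is consistent with the multi-second lock-ups described. The last abort in the excerpt has no completion line, so it is not counted.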

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
  2008-02-11 16:44 Aborted commands with arcmsr and 2xWD1500ADFD in RAID1 Aron Stansvik
@ 2008-02-13  8:03 ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2008-02-13  8:03 UTC (permalink / raw)
  To: Aron Stansvik; +Cc: linux-kernel, linux-scsi, erich


(cc's added)

On Mon, 11 Feb 2008 17:44:08 +0100 "Aron Stansvik" <elvstone@gmail.com> wrote:

> [...]



* RE: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
  2008-03-14 17:25   ` Aron Stansvik
  2008-03-15 10:14     ` Aron Stansvik
@ 2008-03-17  1:56     ` nickcheng
  1 sibling, 0 replies; 7+ messages in thread
From: nickcheng @ 2008-03-17  1:56 UTC (permalink / raw)
  To: 'Aron Stansvik'; +Cc: akpm, linux-scsi, linux-kernel

Hi Aron,
We have two WD1500ADFD drives and an MSI motherboard, which is not exactly
the same as yours.
I tested your case two weeks ago with a copy-compare test program, but I
could not reproduce what you describe.
Could you send me the directory you use to trigger the problem so that I
can try again?
BTW, what is your FW version?
The latest FW version is v1.44.
Please check it out,
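
A test directory of the kind requested here could be generated rather than shipped. The sketch below is a rough Python approximation of the workload Aron describes in the quoted message (copying a directory with lots of small files, around 500 MB). The file count, the file size and all function names are hypothetical; the thread does not specify them.

```python
import os
import shutil


def make_small_files(root, count, size):
    """Create `count` files of `size` bytes each under `root`, 1000 per subdirectory."""
    payload = os.urandom(size)
    for i in range(count):
        subdir = os.path.join(root, f"d{i // 1000:04d}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"f{i:07d}.bin"), "wb") as fh:
            fh.write(payload)


def tree_size(root):
    """Return the total size in bytes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def copy_and_flush(src, dst):
    """Copy the tree to `dst` (which must not exist yet), then flush to disk."""
    shutil.copytree(src, dst)
    if hasattr(os, "sync"):
        os.sync()  # ask the kernel to write out dirty pages
    return tree_size(dst)


# Example call (assumed split of "around 500 MB": 50,000 files of 10 KiB):
#   make_small_files("/mnt/raid/src", count=50_000, size=10 * 1024)
#   copy_and_flush("/mnt/raid/src", "/mnt/raid/dst")
```

Both paths in the example call are placeholders; for the test to be meaningful, source and destination would need to live on the RAID volume, and the kernel log would be watched for arcmsr abort messages during the copy.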
-----Original Message-----
From: Aron Stansvik [mailto:elvstone@gmail.com] 
Sent: Saturday, March 15, 2008 1:26 AM
To: nick.cheng@areca.com.tw
Cc: erich; akpm@linux-foundation.org; linux-scsi@vger.kernel.org;
linux-kernel@vger.kernel.org
Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1

2008/2/26, nickcheng <nick.cheng@areca.com.tw>:
> Hi Aron,
>  Thanks for your patience.
>  If you still run into trouble, please let me know.
>  Thank you again,

I have now tried:

* Turning on/off NCQ in the Areca RAID.
* Turning on/off read-ahead cache in the Areca RAID.
* Putting the disks in anti-vibration mounts in 5.25" slots.
* Switching SATA cables.
* Using legacy ATA power connectors instead of the SATA ones.

But I still have the problem. The power supply is 650W so there should
be plenty of power. There are only two Raptor disks, an Opteron CPU and
an nVidia 6600GT in the machine.

The two Raptor disks have different firmware on them, could
that cause any problem?

Two people who read my post here on LKML have contacted me by
e-mail and have the same problem, but they have Seagate and Samsung
disks, and use the 1220 controller.

The problem is hard to trigger, I've not been able to trigger it with
any benchmarking tool, but in ~95% of the cases I can trigger it by
just copying a directory with lots of small files (around 500 MB).

Anyone else seeing this? I'd really like to get it to work since this
is my only computer :(

Should I try with XFS or ReiserFS instead of EXT3?

Regards,
Aron

>
> -----Original Message-----
>  From: Aron Stansvik [mailto:elvstone@gmail.com]
>
> Sent: Tuesday, February 26, 2008 6:52 AM
>  To: erich
>  Cc: nick.cheng@areca.com.tw; akpm@linux-foundation.org;
>  linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
>  Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>
>  Hi Erich.
>
>  2008/2/25, nickcheng <nick.cheng@areca.com.tw>:
>  > Hi Aron,
>  >  From our field experience and customers' feedback, all such cases point
>  >  to vibration and power issues.
>  >  The vibration could be caused by fans, not only by the drives themselves.
>
>  Okay. I have a chassis fan that is quite close to the drives, I will
>  try disabling it. I've also ordered two Nexus TwinDisk anti vibration
>  harddrive mounts with which I'll place the disks in my 5.25" slots
>  instead, away from any fans.
>
>  If this doesn't work, I'm stumped, as I really don't think it's the
>  power supply and I don't have the money to buy a new one.
>
>  >  You mentioned it could be a F/W issue.
>  >  If the environment does not meet the prerequisites, the FW cannot work
>  >  correctly.
>  >  Actually, the FW just reacts to the situation; it does not cause the issue.
>
>  Of course, I understand this. Just trying to figure this problem out..
>
>  >  Please check it out!!
>
>  I'll report back with my findings after moving the disks away from the fans
>  and using anti-vibration mounts.
>
>  Thanks for taking your time to reply.
>
>  Aron
>
>  >  Thank you,
>  >
>  >
>  >  -----Original Message-----
>  >  From: Aron Stansvik [mailto:elvstone@gmail.com]
>  >  Sent: Sunday, February 24, 2008 1:54 AM
>  >  To: nick.cheng@areca.com.tw
>  >  Cc: erich; akpm@linux-foundation.org; linux-scsi@vger.kernel.org;
>  >  linux-kernel@vger.kernel.org
>  >
>  > Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>  >
>  >  Hello again Areca and LKML hackers.
>  >
>  >  2008/2/18, Aron Stansvik <elvstone@gmail.com>:
>  >  > Hello Nick.
>  >  >
>  >  >  Sorry that I'm not answering until now. I've been busy.
>  >  >
>  >  >  2008/2/13, nickcheng <nick.cheng@areca.com.tw>:
>  >  >
>  >  > > Hi Aron,
>  >  >  >  From our experience and some customers' feedback, your issue could be
>  >  >  >  caused by power instability or vibration affecting your HDs.
>  >  >  >  Please check step by step:
>  >  >  >  (1) In your original environment, increase the SCSI command timeout
>  >  >  >  value (default=30) with the shell script set_scsicmd_timeout(). 90 or
>  >  >  >  120 is enough.
>  >  >  >  (2) If method 1 does not work, find the vibration source or change the
>  >  >  >  power supply.
>  >  >
>  >  >
>  >  > I will try to increase that value. I don't think it's vibration; the
>  >  >  disks are firmly in place in a very heavy chassis (Silverstone
>  >  >  SST-TJ05B-T). And I really don't think there's something wrong with
>  >  >  the power supply, it's a pretty new Silverstone ST65ZF 650W. This is
>  >  >  my own personal workstation, so I don't just have another power supply
>  >  >  to test with :/
>  >  >
>  >  >  I will report back on my success/failure. Thanks for your answer.
>  >
>  >  I've now tried with both 90 and 120 for the timeout value, and the
>  >  problem still persists. It seems to happen when lots of small writes
>  >  are occurring, e.g. when installing something.
>  >
>  >  I really don't think the disks are vibrating, I don't see how they
>  >  could. One more thing I'm going to test is to use the legacy ATA power
>  >  connector instead of the SATA power connector. This was what I was
>  >  using before when I only had a single drive and no RAID controller.
>  >  Maybe my power supply is malfunctioning and not giving enough power on
>  >  the SATA power connectors.. but I doubt it.
>  >
>  >  Is there anything else that could cause this? Have you guys at Areca
>  >  tested the ARC-1200 with Raptors in RAID1?
>  >
>  >  :(
>  >
>  >  Regards,
>  >  Aron
>  >
>  >  >
>  >  >
>  >  >  Aron
>  >  >
>  >  >
>  >  >  >  If you still have any questions, please feel free to let me know.
>  >  >  >
>  >  >  >  P.S. The attached driver source, arcmsr-1.20.00.15-71224, has been
>  >  >  >  upstreamed to kernel.org and will be released in kernel 2.6.25. If you
>  >  >  >  like, you could update your driver with it.
>  >  >  >  It fixes some minor bugs, but these bugs have nothing to do with your
>  >  >  >  issue.
>  >  >  >
>  >  >  >
>  >  >  >  -----Original Message-----
>  >  >  >  From: erich [mailto:erich@areca.com.tw]
>  >  >  >  Sent: Wednesday, February 13, 2008 4:33 PM
>  >  >  >  To: (廣安科技)鄭守謙
>  >  >  >  Subject: Fw: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>  >  >  >
>  >  >  >
>  >  >  >
>  >  >  >  ----- Original Message -----
>  >  >  >  From: "Andrew Morton" <akpm@linux-foundation.org>
>  >  >  >  To: "Aron Stansvik" <elvstone@gmail.com>
>  >  >  >  Cc: <linux-kernel@vger.kernel.org>; <linux-scsi@vger.kernel.org>;
>  >  >  >  "erich" <erich@areca.com.tw>
>  >  >  >  Sent: Wednesday, February 13, 2008 4:03 PM
>  >  >  >  Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>  >  >  >
>  >  >  >
>  >  >  >  >
>  >  >  >  > (cc's added)
>  >  >  >  >
>  >  >  >  > On Mon, 11 Feb 2008 17:44:08 +0100 "Aron Stansvik" <elvstone@gmail.com>
>  >  >  >  > wrote:
>  >  >  >  >
>  >  >  >  >> [...]
>  >  >  >  >
>  >  >  >
>  >  >  >
>  >  >
>  >
>  >
>
>
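
The timeout change suggested in the quoted exchange (default 30, raised to 90 or 120) can be sketched as follows. The contents of Areca's set_scsicmd_timeout script are not shown in this thread, so this is not that script. It is a minimal Python sketch that assumes the array shows up as a SCSI disk such as sda and uses the per-device timeout attribute that Linux exposes under sysfs. The function names and the sysfs_root parameter are hypothetical, and writing the real attribute requires root.

```python
import os


def timeout_path(device, sysfs_root="/sys/block"):
    """Build the path of the per-device SCSI command timeout attribute."""
    return os.path.join(sysfs_root, device, "device", "timeout")


def get_scsi_timeout(device, sysfs_root="/sys/block"):
    """Read the current command timeout, in seconds."""
    with open(timeout_path(device, sysfs_root)) as fh:
        return int(fh.read().strip())


def set_scsi_timeout(device, seconds, sysfs_root="/sys/block"):
    """Write a new command timeout, in seconds, and return the value read back."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    with open(timeout_path(device, sysfs_root), "w") as fh:
        fh.write(f"{seconds}\n")
    return get_scsi_timeout(device, sysfs_root)


# Example call (assumes the RAID volume is sda; needs root):
#   set_scsi_timeout("sda", 90)
```

The setting does not survive a reboot, so a script of this kind would have to be run again at boot. As reported in the thread, raising the value to 90 and 120 did not make the aborts go away in this case.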



* Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
  2008-03-14 17:25   ` Aron Stansvik
@ 2008-03-15 10:14     ` Aron Stansvik
  2008-03-17  1:56     ` nickcheng
  1 sibling, 0 replies; 7+ messages in thread
From: Aron Stansvik @ 2008-03-15 10:14 UTC (permalink / raw)
  To: nick.cheng; +Cc: erich, akpm, linux-scsi, linux-kernel

Hello again.

2008/3/14, Aron Stansvik <elvstone@gmail.com>:
> 2008/2/26, nickcheng <nick.cheng@areca.com.tw>:
>  > Hi Aron,
>  >  Thanks for your patience.
>  >  If you still run into trouble, please let me know.
>  >  Thank you again,
>
> I have now tried:
>
>  * Turning on/off NCQ in the Areca RAID.
>  * Turning on/off read-ahead cache in the Areca RAID.
>  * Putting the disks in anti-vibration mounts in 5.25" slots.
>  * Switching SATA cables.
>  * Using legacy ATA power connectors instead of the SATA ones.
>
>  But I still have the problem. The power supply is 650W so there should
>  be plenty of power. There are only two Raptor disks, an Opteron CPU and
>  an nVidia 6600GT in the machine.
>
>  The two Raptor disks have different firmware on them, could
>  that cause any problem?
>
>  Two people who read my post here on LKML have contacted me by
>  e-mail and have the same problem, but they have Seagate and Samsung
>  disks, and use the 1220 controller.
>
>  The problem is hard to trigger, I've not been able to trigger it with
>  any benchmarking tool, but in ~95% of the cases I can trigger it by
>  just copying a directory with lots of small files (around 500 MB).
>
>  Anyone else seeing this? I'd really like to get it to work since this
>  is my only computer :(
>
>  Should I try with XFS or ReiserFS instead of EXT3?

I've now tried with ReiserFS, same problem. My distribution didn't
have XFS as a choice at installation so I haven't tried it yet. I'll
try with a LiveCD that supports XFS later. The two disks in the array
are the only ones in the system.

Aron

>
>  Regards,
>  Aron
>
> [...]


* Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
  2008-02-26  2:20 ` nickcheng
@ 2008-03-14 17:25   ` Aron Stansvik
  2008-03-15 10:14     ` Aron Stansvik
  2008-03-17  1:56     ` nickcheng
  0 siblings, 2 replies; 7+ messages in thread
From: Aron Stansvik @ 2008-03-14 17:25 UTC (permalink / raw)
  To: nick.cheng; +Cc: erich, akpm, linux-scsi, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=BIG5, Size: 9883 bytes --]

2008/2/26, nickcheng <nick.cheng@areca.com.tw>:> Hi Aron,>  Thanks for your patience.>  If you still got into trouble, please let me know.>  Thank you again,
I have now tried:
* Turning on/off NCQ in the Areca RAID.* Turning on/off read-ahead cache in the Areca RAID.* Putting the disks in anti-vibration mounts in 5.25" slots.* Switching SATA cables.* Using legacy ATA power connectors instead of the SATA ones.
But I still have the problem. The power supply is 650W so there shouldbe plenty of power. There's only two Raptor disks, an Opteron CPU andan nVidia 6600GT in the machine.
The Raptor two Raptor disks have different firmware on them, couldthat cause any problem?
Two people who had read my post here on LKML have contacted me one-mail and have the same problem, but they have Seagate and Samsungdisks, and use the 1220 controller.
The problem is hard to trigger, I've not been able to trigger it withany benchmarking tool, but in ~95% of the cases I can trigger it byjust copying a directory with lots of small files (around 500 MB).
Anyone else seeing this? I'd really like to get it to work since thisis my only computer :(
Should I try with XFS or ReiserFS instead of EXT3?
Regards,Aron
>
> -----Original Message-----
> From: Aron Stansvik [mailto:elvstone@gmail.com]
> Sent: Tuesday, February 26, 2008 6:52 AM
> To: erich
> Cc: nick.cheng@areca.com.tw; akpm@linux-foundation.org;
>  linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>
> Hi Erich.
>
> 2008/2/25, nickcheng <nick.cheng@areca.com.tw>:
> > Hi Aron,
> >  From our field experiences and customers' feedbacks, all of them direct to
> >  vibration and power issues.
> >  The vibration could be caused by FANs not only by themselves.
>
> Okay. I have a chassi fan that is quite close to the drives, I will
> try disabling it. I've also ordered two Nexus TwinDisk anti vibration
> harddrive mounts with which I'll place the disks in my 5.25" slots
> instead, away from any fans.
>
> If this doesn't work, I'm stumped, as I really don't think it's the
> power supply and I don't have the money to buy a new one.
>
> >  You mentioned it could be the F/W issue.
> >  If the environment does not meet the prerequisite, FW could not work
> >  correctly.
> >  Actually FW just reacts to the situations not it causes the issue.
>
> Of course, I understand this. Just trying to figure this problem out..
>
> >  Please check it out!!
>
> I'll report back with my findings with moving disk away from fans and
> using anti-vibrations mounts.
>
> Thanks for taking your time to reply.
>
> Aron
>
> >  Thank you,
> >
> >
> >  -----Original Message-----
> >  From: Aron Stansvik [mailto:elvstone@gmail.com]
> >  Sent: Sunday, February 24, 2008 1:54 AM
> >  To: nick.cheng@areca.com.tw
> >  Cc: erich; akpm@linux-foundation.org; linux-scsi@vger.kernel.org;
> >  linux-kernel@vger.kernel.org
> >
> > Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
> >
> >  Hello again Areca and LKML hackers.
> >
> >  2008/2/18, Aron Stansvik <elvstone@gmail.com>:
> >  > Hello Nick.
> >  >
> >  >  Sorry that I'm not answering until now. I've been busy.
> >  >
> >  >  2008/2/13, nickcheng <nick.cheng@areca.com.tw>:
> >  >
> >  > > Hi Aron,
> >  >  >  From our experience and some customers' feedback, your issue could be caused
> >  >  >  by power instability or vibration to your HDs.
> >  >  >  Please check step by step:
> >  >  >  (1).under your original environment, increase the SCSI command value,
> >  >  >  default=30, with the shell script, set_scsicmd_timeout(). 90 or 120 is
> >  >  >  enough.
> >  >  >  (2).if method 1 does not work, find out the vibration source or change the
> >  >  >  power supply
> >  >
> >  >
> >  > I will try to increase that value. I don't think it's vibration; the
> >  >  disks are firmly in place in a very heavy chassi (Silverstone
> >  >  SST-TJ05B-T). And I really don't think there's something wrong with
> >  >  the power supply, it's a pretty new Silverstone ST65ZF 650W. This is
> >  >  my own personal workstation, so I don't just have another power supply
> >  >  to test with :/
> >  >
> >  >  I will report back on my success/failure. Thanks for your answer.
> >
> >  I've now tried with both 90 and 120 for the timeout value, and the
> >  problem still persists. It seems to happen when lots of small writes
> >  are occuring, e.g. when installing something.
> >
> >  I really don't think the disks are vibrating, I don't see how they
> >  could. One more thing I'm going to test is to use the legacy ATA power
> >  connector instead of the SATA power connector. This was what I was
> >  using before when I only had a single drive and no RAID controller.
> >  Maybe my power supply is malfunctioning and not giving enough power on
> >  the SATA power connectors.. but I doubt it.
> >
> >  Is there anything else that could cause this? Have you guys at Areca
> >  tested the ARC-1200 with Raptors in RAID1?
> >
> >  :(
> >
> >  Regards,
> >  Aron
> >
> >  >
> >  >
> >  >  Aron
> >  >
> >  >
> >  >  >  If your still have any questions, please feel free to let me know.
> >  >  >
> >  >  >  P.S. The attached driver source, arcmsr-1.20.00.15-71224, has been
> >  >  >  upstreamed to kernel.org and will be released in kernel 2.6.25. If you like,
> >  >  >  you could update your driver with it.
> >  >  >  It fixes some minor bugs, but these bugs are nothing to do with your issue.
> >  >  >
> >  >  >
> >  >  >  -----Original Message-----
> >  >  >  From: erich [mailto:erich@areca.com.tw]
> >  >  >  Sent: Wednesday, February 13, 2008 4:33 PM
> >  >  >  To: (廣安科技)鄭守謙
> >  >  >  Subject: Fw: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
> >  >  >
> >  >  >
> >  >  >
> >  >  >  ----- Original Message -----
> >  >  >  From: "Andrew Morton" <akpm@linux-foundation.org>
> >  >  >  To: "Aron Stansvik" <elvstone@gmail.com>
> >  >  >  Cc: <linux-kernel@vger.kernel.org>; <linux-scsi@vger.kernel.org>; "erich"
> >  >  >  <erich@areca.com.tw>
> >  >  >  Sent: Wednesday, February 13, 2008 4:03 PM
> >  >  >  Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
> >  >  >
> >  >  >
> >  >  >  >
> >  >  >  > (cc's added)
> >  >  >  >
> >  >  >  > On Mon, 11 Feb 2008 17:44:08 +0100 "Aron Stansvik" <elvstone@gmail.com>
> >  >  >  > wrote:
> >  >  >  >
> >  >  >  >> Hello LKML.
> >  >  >  >>
> >  >  >  >> Under semi-high disk I/O (e.g. installing a compiled KDE), I get the
> >  >  >  >> following (accompanied by seconds of lock-ups on the machine):
> >  >  >  >>
> >  >  >  >> [ 7727.345183] arcmsr0: abort device command of scsi id = 0 lun = 0
> >  >  >  >> [ 7730.348776] arcmsr0:                 scsi id = 0 lun = 0 ccb =
> >  >  >  >> '0xdfb461c0' poll command abort successfully
> >  >  >  >> [ 8053.795943] arcmsr0: abort device command of scsi id = 0 lun = 0
> >  >  >  >> [ 8056.799528] arcmsr0:                 scsi id = 0 lun = 0 ccb =
> >  >  >  >> '0xdfb595e0' poll command abort successfully
> >  >  >  >> [ 8884.592810] arcmsr0: abort device command of scsi id = 0 lun = 0
> >  >  >  >> [ 8887.596392] arcmsr0:                 scsi id = 0 lun = 0 ccb =
> >  >  >  >> '0xdfb56d80' poll command abort successfully
> >  >  >  >> [ 8917.760216] arcmsr0: abort device command of scsi id = 0 lun = 0
> >  >  >  >> [ 8920.763797] arcmsr0:                 scsi id = 0 lun = 0 ccb =
> >  >  >  >> '0xdfb472c0' poll command abort successfully
> >  >  >  >> [ 9074.106547] arcmsr0: abort device command of scsi id = 0 lun = 0
> >  >  >  >>
> >  >  >  >> This is my setup:
> >  >  >  >>
> >  >  >  >> 1 x MSI K8N Master2-FAR
> >  >  >  >> 1 x Opteron 252
> >  >  >  >> 1 x Areca ARC1200 (sitting in a PCIe x4 socket)
> >  >  >  >> 2 x WD1500ADFD in RAID1
> >  >  >  >>
> >  >  >  >> astan@rubik:~$ uname -a
> >  >  >  >> Linux rubik 2.6.24-7-generic #1 SMP Thu Feb 7 01:29:58 UTC 2008 i686
> >  >  >  >> GNU/Linux
> >  >  >  >> astan@rubik:~$ modinfo arcmsr
> >  >  >  >> filename:
> >  >  >  >> /lib/modules/2.6.24-7-generic/kernel/drivers/scsi/arcmsr/arcmsr.ko
> >  >  >  >> version:        Driver Version 1.20.00.15 2007/08/30
> >  >  >  >> license:        Dual BSD/GPL
> >  >  >  >> description:    ARECA (ARC11xx/12xx/13xx/16xx) SATA/SAS RAID HOST Adapter
> >  >  >  >> author:         Erich Chen <support@areca.com.tw>
> >  >  >  >> srcversion:     28EAD6AB49D4491CA04D465
> >  >  >  >> [...]
> >  >  >  >>
> >  >  >  >> I've read some previous posts here on LKML that it could be the Areca
> >  >  >  >> firmware who doesn't like my WD disks. Anyone know if this is an IRQ
> >  >  >  >> handling problem in the kernel, or if it's a problem with the RAID
> >  >  >  >> controller firmware?
> >  >  >  >>
> >  >  >  >> Erich Chen (of Areca); have you tried the new ARC1200 in RAID1
> >  >  >  >> configuration with Raptor disks on Linux?
> >  >  >  >>
> >  >  >  >> As a side note, I can tell you that I first tried running FreeBSD 6.3
> >  >  >  >> (RELENG_6) on this machine, but got random reboots during disk I/O
> >  >  >  >> (even with a kernel with KDB debugging turned on). This leads me to
> >  >  >  >> believe that it might be a firmware issue, and that Linux just handles
> >  >  >  >> it more gracefully than FreeBSD.
> >  >  >  >>
> >  >  >  >> Any ideas or advice is appriciated. This is my first post to the LKML,
> >  >  >  >> so please instruct me if you want more information or if you want me
> >  >  >  >> to take further debugging actions.
> >  >  >  >>
> >  >  >  >> Best regards,
> >  >  >  >> Aron Stansvik


* RE: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
       [not found] <751a4f870802251451k2f6355d0o110d74cd226f0f84@mail.gmail.com>
@ 2008-02-26  2:20 ` nickcheng
  2008-03-14 17:25   ` Aron Stansvik
  0 siblings, 1 reply; 7+ messages in thread
From: nickcheng @ 2008-02-26  2:20 UTC (permalink / raw)
  To: 'Aron Stansvik', 'erich'; +Cc: akpm, linux-scsi, linux-kernel

Hi Aron,
Thanks for your patience.
If you still run into trouble, please let me know.
Thank you again,
-----Original Message-----
From: Aron Stansvik [mailto:elvstone@gmail.com] 
Sent: Tuesday, February 26, 2008 6:52 AM
To: erich
Cc: nick.cheng@areca.com.tw; akpm@linux-foundation.org;
linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1

Hi Erich.

2008/2/25, nickcheng <nick.cheng@areca.com.tw>:
> Hi Aron,
>  From our field experiences and customers' feedbacks, all of them direct to
>  vibration and power issues.
>  The vibration could be caused by FANs not only by themselves.

Okay. I have a chassis fan that is quite close to the drives; I will
try disabling it. I've also ordered two Nexus TwinDisk anti-vibration
hard drive mounts, with which I'll place the disks in my 5.25" slots
instead, away from any fans.

If this doesn't work, I'm stumped, as I really don't think it's the
power supply and I don't have the money to buy a new one.

>  You mentioned it could be the F/W issue.
>  If the environment does not meet the prerequisite, FW could not work
>  correctly.
>  Actually FW just reacts to the situations not it causes the issue.

Of course, I understand this. Just trying to figure this problem out..

>  Please check it out!!

I'll report back with my findings after moving the disks away from the
fans and using anti-vibration mounts.

Thanks for taking your time to reply.

Aron

>  Thank you,
>
>
>  -----Original Message-----
>  From: Aron Stansvik [mailto:elvstone@gmail.com]
>  Sent: Sunday, February 24, 2008 1:54 AM
>  To: nick.cheng@areca.com.tw
>  Cc: erich; akpm@linux-foundation.org; linux-scsi@vger.kernel.org;
>  linux-kernel@vger.kernel.org
>
> Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>
>  Hello again Areca and LKML hackers.
>
>  2008/2/18, Aron Stansvik <elvstone@gmail.com>:
>  > Hello Nick.
>  >
>  >  Sorry that I'm not answering until now. I've been busy.
>  >
>  >  2008/2/13, nickcheng <nick.cheng@areca.com.tw>:
>  >
>  > > Hi Aron,
>  >  >  From our experience and some customers' feedback, your issue could be caused
>  >  >  by power instability or vibration to your HDs.
>  >  >  Please check step by step:
>  >  >  (1).under your original environment, increase the SCSI command value,
>  >  >  default=30, with the shell script, set_scsicmd_timeout(). 90 or 120 is
>  >  >  enough.
>  >  >  (2).if method 1 does not work, find out the vibration source or change the
>  >  >  power supply
>  >
>  >
>  > I will try to increase that value. I don't think it's vibration; the
>  >  disks are firmly in place in a very heavy chassi (Silverstone
>  >  SST-TJ05B-T). And I really don't think there's something wrong with
>  >  the power supply, it's a pretty new Silverstone ST65ZF 650W. This is
>  >  my own personal workstation, so I don't just have another power supply
>  >  to test with :/
>  >
>  >  I will report back on my success/failure. Thanks for your answer.
>
>  I've now tried with both 90 and 120 for the timeout value, and the
>  problem still persists. It seems to happen when lots of small writes
>  are occuring, e.g. when installing something.
>
>  I really don't think the disks are vibrating, I don't see how they
>  could. One more thing I'm going to test is to use the legacy ATA power
>  connector instead of the SATA power connector. This was what I was
>  using before when I only had a single drive and no RAID controller.
>  Maybe my power supply is malfunctioning and not giving enough power on
>  the SATA power connectors.. but I doubt it.
>
>  Is there anything else that could cause this? Have you guys at Areca
>  tested the ARC-1200 with Raptors in RAID1?
>
>  :(
>
>  Regards,
>  Aron
>
>  >
>  >
>  >  Aron
>  >
>  >
>  >  >  If your still have any questions, please feel free to let me know.
>  >  >
>  >  >  P.S. The attached driver source, arcmsr-1.20.00.15-71224, has been
>  >  >  upstreamed to kernel.org and will be released in kernel 2.6.25. If you like,
>  >  >  you could update your driver with it.
>  >  >  It fixes some minor bugs, but these bugs are nothing to do with your issue.
>  >  >
>  >  >
>  >  >  -----Original Message-----
>  >  >  From: erich [mailto:erich@areca.com.tw]
>  >  >  Sent: Wednesday, February 13, 2008 4:33 PM
>  >  >  To: (廣安科技)鄭守謙
>  >  >  Subject: Fw: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>  >  >
>  >  >
>  >  >
>  >  >  ----- Original Message -----
>  >  >  From: "Andrew Morton" <akpm@linux-foundation.org>
>  >  >  To: "Aron Stansvik" <elvstone@gmail.com>
>  >  >  Cc: <linux-kernel@vger.kernel.org>; <linux-scsi@vger.kernel.org>; "erich"
>  >  >  <erich@areca.com.tw>
>  >  >  Sent: Wednesday, February 13, 2008 4:03 PM
>  >  >  Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>  >  >
>  >  >
>  >  >  >
>  >  >  > (cc's added)
>  >  >  >
>  >  >  > On Mon, 11 Feb 2008 17:44:08 +0100 "Aron Stansvik" <elvstone@gmail.com>
>  >  >  > wrote:
>  >  >  >
>  >  >  >> Hello LKML.
>  >  >  >>
>  >  >  >> Under semi-high disk I/O (e.g. installing a compiled KDE), I get the
>  >  >  >> following (accompanied by seconds of lock-ups on the machine):
>  >  >  >>
>  >  >  >> [ 7727.345183] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >  >> [ 7730.348776] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >  >> '0xdfb461c0' poll command abort successfully
>  >  >  >> [ 8053.795943] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >  >> [ 8056.799528] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >  >> '0xdfb595e0' poll command abort successfully
>  >  >  >> [ 8884.592810] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >  >> [ 8887.596392] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >  >> '0xdfb56d80' poll command abort successfully
>  >  >  >> [ 8917.760216] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >  >> [ 8920.763797] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >  >> '0xdfb472c0' poll command abort successfully
>  >  >  >> [ 9074.106547] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >  >>
>  >  >  >> This is my setup:
>  >  >  >>
>  >  >  >> 1 x MSI K8N Master2-FAR
>  >  >  >> 1 x Opteron 252
>  >  >  >> 1 x Areca ARC1200 (sitting in a PCIe x4 socket)
>  >  >  >> 2 x WD1500ADFD in RAID1
>  >  >  >>
>  >  >  >> astan@rubik:~$ uname -a
>  >  >  >> Linux rubik 2.6.24-7-generic #1 SMP Thu Feb 7 01:29:58 UTC 2008 i686
>  >  >  >> GNU/Linux
>  >  >  >> astan@rubik:~$ modinfo arcmsr
>  >  >  >> filename:
>  >  >  >> /lib/modules/2.6.24-7-generic/kernel/drivers/scsi/arcmsr/arcmsr.ko
>  >  >  >> version:        Driver Version 1.20.00.15 2007/08/30
>  >  >  >> license:        Dual BSD/GPL
>  >  >  >> description:    ARECA (ARC11xx/12xx/13xx/16xx) SATA/SAS RAID HOST Adapter
>  >  >  >> author:         Erich Chen <support@areca.com.tw>
>  >  >  >> srcversion:     28EAD6AB49D4491CA04D465
>  >  >  >> [...]
>  >  >  >>
>  >  >  >> I've read some previous posts here on LKML that it could be the Areca
>  >  >  >> firmware who doesn't like my WD disks. Anyone know if this is an IRQ
>  >  >  >> handling problem in the kernel, or if it's a problem with the RAID
>  >  >  >> controller firmware?
>  >  >  >>
>  >  >  >> Erich Chen (of Areca); have you tried the new ARC1200 in RAID1
>  >  >  >> configuration with Raptor disks on Linux?
>  >  >  >>
>  >  >  >> As a side note, I can tell you that I first tried running FreeBSD 6.3
>  >  >  >> (RELENG_6) on this machine, but got random reboots during disk I/O
>  >  >  >> (even with a kernel with KDB debugging turned on). This leads me to
>  >  >  >> believe that it might be a firmware issue, and that Linux just handles
>  >  >  >> it more gracefully than FreeBSD.
>  >  >  >>
>  >  >  >> Any ideas or advice is appriciated. This is my first post to the LKML,
>  >  >  >> so please instruct me if you want more information or if you want me
>  >  >  >> to take further debugging actions.
>  >  >  >>
>  >  >  >> Best regards,
>  >  >  >> Aron Stansvik
>  >  >  >
>  >  >
>  >  >
>  >
>
>



* RE: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
       [not found] <751a4f870802230954m6db5963dnb7865005a60a7bf0@mail.gmail.com>
@ 2008-02-25  4:33 ` nickcheng
  0 siblings, 0 replies; 7+ messages in thread
From: nickcheng @ 2008-02-25  4:33 UTC (permalink / raw)
  To: 'Aron Stansvik'; +Cc: 'erich', akpm, linux-scsi, linux-kernel

Hi Aron,
In our field experience and from customer feedback, these cases all point to
vibration or power issues.
The vibration can come from fans, not only from the drives themselves.
You mentioned it could be a firmware issue.
If the environment does not meet the prerequisites, the firmware cannot work
correctly.
The firmware only reacts to these conditions; it does not cause them.
Please check this out!
Thank you,

-----Original Message-----
From: Aron Stansvik [mailto:elvstone@gmail.com] 
Sent: Sunday, February 24, 2008 1:54 AM
To: nick.cheng@areca.com.tw
Cc: erich; akpm@linux-foundation.org; linux-scsi@vger.kernel.org;
linux-kernel@vger.kernel.org
Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1

Hello again Areca and LKML hackers.

2008/2/18, Aron Stansvik <elvstone@gmail.com>:
> Hello Nick.
>
>  Sorry that I'm not answering until now. I've been busy.
>
>  2008/2/13, nickcheng <nick.cheng@areca.com.tw>:
>
> > Hi Aron,
>  >  From our experience and some customers' feedback, your issue could be caused
>  >  by power instability or vibration to your HDs.
>  >  Please check step by step:
>  >  (1).under your original environment, increase the SCSI command value,
>  >  default=30, with the shell script, set_scsicmd_timeout(). 90 or 120 is
>  >  enough.
>  >  (2).if method 1 does not work, find out the vibration source or change the
>  >  power supply
>
>
> I will try to increase that value. I don't think it's vibration; the
>  disks are firmly in place in a very heavy chassi (Silverstone
>  SST-TJ05B-T). And I really don't think there's something wrong with
>  the power supply, it's a pretty new Silverstone ST65ZF 650W. This is
>  my own personal workstation, so I don't just have another power supply
>  to test with :/
>
>  I will report back on my success/failure. Thanks for your answer.

I've now tried both 90 and 120 for the timeout value, and the
problem still persists. It seems to happen when lots of small writes
are occurring, e.g. when installing something.
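
For anyone following along: the set_scsicmd_timeout() script Nick mentions
is not included in this thread. Below is a minimal sketch of the same idea,
assuming the usual per-device sysfs attribute /sys/block/sdX/device/timeout
(value in seconds). The function name set_scsi_timeout and its optional
second argument (an alternative sysfs root, handy for a dry run) are made up
for illustration and are not Areca's script.

```shell
#!/bin/sh
# Sketch only: set the SCSI command timeout (in seconds) for every sd* disk.
# Assumes the usual sysfs attribute <root>/block/sdX/device/timeout exists.
# "set_scsi_timeout" and the optional sysfs-root argument are hypothetical;
# this is not Areca's set_scsicmd_timeout script.
set_scsi_timeout() {
    secs=$1
    root=${2:-/sys}

    # Accept only a plain non-negative integer such as 90 or 120.
    case $secs in
        ''|*[!0-9]*)
            echo "usage: set_scsi_timeout SECONDS [SYSFS_ROOT]" >&2
            return 1
            ;;
    esac

    for f in "$root"/block/sd*/device/timeout; do
        # Skip unmatched globs and attributes we cannot write (needs root).
        [ -w "$f" ] || continue
        echo "$secs" > "$f"
        echo "$f: $(cat "$f")"
    done
}
```

Run as root, for example "set_scsi_timeout 90". Values written through sysfs
do not survive a reboot, so the call would have to be repeated from a boot
script.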

I really don't think the disks are vibrating; I don't see how they
could. One more thing I'm going to test is using the legacy ATA power
connector instead of the SATA power connector. This is what I was
using before, when I only had a single drive and no RAID controller.
Maybe my power supply is malfunctioning and not giving enough power on
the SATA power connectors, but I doubt it.

Is there anything else that could cause this? Have you guys at Areca
tested the ARC-1200 with Raptors in RAID1?

:(

Regards,
Aron

>
>
>  Aron
>
>
>  >  If your still have any questions, please feel free to let me know.
>  >
>  >  P.S. The attached driver source, arcmsr-1.20.00.15-71224, has been
>  >  upstreamed to kernel.org and will be released in kernel 2.6.25. If you like,
>  >  you could update your driver with it.
>  >  It fixes some minor bugs, but these bugs are nothing to do with your issue.
>  >
>  >
>  >  -----Original Message-----
>  >  From: erich [mailto:erich@areca.com.tw]
>  >  Sent: Wednesday, February 13, 2008 4:33 PM
>  >  To: (廣安科技)鄭守謙
>  >  Subject: Fw: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>  >
>  >
>  >
>  >  ----- Original Message -----
>  >  From: "Andrew Morton" <akpm@linux-foundation.org>
>  >  To: "Aron Stansvik" <elvstone@gmail.com>
>  >  Cc: <linux-kernel@vger.kernel.org>; <linux-scsi@vger.kernel.org>; "erich"
>  >  <erich@areca.com.tw>
>  >  Sent: Wednesday, February 13, 2008 4:03 PM
>  >  Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
>  >
>  >
>  >  >
>  >  > (cc's added)
>  >  >
>  >  > On Mon, 11 Feb 2008 17:44:08 +0100 "Aron Stansvik" <elvstone@gmail.com>
>  >  > wrote:
>  >  >
>  >  >> Hello LKML.
>  >  >>
>  >  >> Under semi-high disk I/O (e.g. installing a compiled KDE), I get the
>  >  >> following (accompanied by seconds of lock-ups on the machine):
>  >  >>
>  >  >> [ 7727.345183] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >> [ 7730.348776] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >> '0xdfb461c0' poll command abort successfully
>  >  >> [ 8053.795943] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >> [ 8056.799528] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >> '0xdfb595e0' poll command abort successfully
>  >  >> [ 8884.592810] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >> [ 8887.596392] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >> '0xdfb56d80' poll command abort successfully
>  >  >> [ 8917.760216] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >> [ 8920.763797] arcmsr0:                 scsi id = 0 lun = 0 ccb =
>  >  >> '0xdfb472c0' poll command abort successfully
>  >  >> [ 9074.106547] arcmsr0: abort device command of scsi id = 0 lun = 0
>  >  >>
>  >  >> This is my setup:
>  >  >>
>  >  >> 1 x MSI K8N Master2-FAR
>  >  >> 1 x Opteron 252
>  >  >> 1 x Areca ARC1200 (sitting in a PCIe x4 socket)
>  >  >> 2 x WD1500ADFD in RAID1
>  >  >>
>  >  >> astan@rubik:~$ uname -a
>  >  >> Linux rubik 2.6.24-7-generic #1 SMP Thu Feb 7 01:29:58 UTC 2008 i686
>  >  >> GNU/Linux
>  >  >> astan@rubik:~$ modinfo arcmsr
>  >  >> filename:
>  >  >> /lib/modules/2.6.24-7-generic/kernel/drivers/scsi/arcmsr/arcmsr.ko
>  >  >> version:        Driver Version 1.20.00.15 2007/08/30
>  >  >> license:        Dual BSD/GPL
>  >  >> description:    ARECA (ARC11xx/12xx/13xx/16xx) SATA/SAS RAID HOST Adapter
>  >  >> author:         Erich Chen <support@areca.com.tw>
>  >  >> srcversion:     28EAD6AB49D4491CA04D465
>  >  >> [...]
>  >  >>
>  >  >> I've read some previous posts here on LKML that it could be the Areca
>  >  >> firmware who doesn't like my WD disks. Anyone know if this is an IRQ
>  >  >> handling problem in the kernel, or if it's a problem with the RAID
>  >  >> controller firmware?
>  >  >>
>  >  >> Erich Chen (of Areca); have you tried the new ARC1200 in RAID1
>  >  >> configuration with Raptor disks on Linux?
>  >  >>
>  >  >> As a side note, I can tell you that I first tried running FreeBSD 6.3
>  >  >> (RELENG_6) on this machine, but got random reboots during disk I/O
>  >  >> (even with a kernel with KDB debugging turned on). This leads me to
>  >  >> believe that it might be a firmware issue, and that Linux just handles
>  >  >> it more gracefully than FreeBSD.
>  >  >>
>  >  >> Any ideas or advice is appriciated. This is my first post to the LKML,
>  >  >> so please instruct me if you want more information or if you want me
>  >  >> to take further debugging actions.
>  >  >>
>  >  >> Best regards,
>  >  >> Aron Stansvik
>  >  >
>  >
>  >
>



end of thread, other threads:[~2008-03-17  1:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
2008-02-11 16:44 Aborted commands with arcmsr and 2xWD1500ADFD in RAID1 Aron Stansvik
2008-02-13  8:03 ` Andrew Morton
     [not found] <751a4f870802230954m6db5963dnb7865005a60a7bf0@mail.gmail.com>
2008-02-25  4:33 ` nickcheng
     [not found] <751a4f870802251451k2f6355d0o110d74cd226f0f84@mail.gmail.com>
2008-02-26  2:20 ` nickcheng
2008-03-14 17:25   ` Aron Stansvik
2008-03-15 10:14     ` Aron Stansvik
2008-03-17  1:56     ` nickcheng
