LKML Archive on lore.kernel.org
From: "linux-os (Dick Johnson)" <linux-os@analogic.com>
To: "Olivier Galibert" <galibert@pobox.com>
Cc: "Hack inc." <linux-kernel@vger.kernel.org>
Subject: Re: What does this scsi error mean ?
Date: Tue, 16 Jan 2007 10:16:22 -0500
Message-ID: <Pine.LNX.4.61.0701160959450.8079@chaos.analogic.com>
In-Reply-To: <20070115214503.GA56952@dspnet.fr.eu.org>


On Mon, 15 Jan 2007, Olivier Galibert wrote:

> On Mon, Jan 15, 2007 at 06:45:40PM +0000, Alan wrote:
>> On Mon, 15 Jan 2007 18:16:02 +0100
>> Olivier Galibert <galibert@pobox.com> wrote:
>>
>>> sd 0:0:0:0: SCSI error: return code = 0x08000002
>>> sda: Current: sense key: Hardware Error
>>>     ASC=0x42 ASCQ=0x0
>>
>> I'll give you a clue: The words "Hardware Error".
>>
>> Run a SCSI verify pass on the drive with some drive utilities and see
>> what happens. If you are lucky it'll just reallocate blocks and decide
>> the drive is ok, if not well see what the smart data thinks.
>
> Both smart and the internal blade diagnostics say "everything is a-ok
> with the drive, there hasn't been any error ever except a bunch of
> corrected ECC ones, and no more than with a similar drive in another
> working blade".  Hence my initial post.  "Hardware error" is kinda
> imprecise, so I was wondering whether it was unexpected controller
> answer, detected transmission error, block write error, sector not
> found...  Is there a way to have more information?
>
>  OG.

Corrected SCSI errors mean that the data in a sector was not read
cleanly, but the device was able to repair it using the redundancy
in the ECC. The error could be fixed permanently if you rewrote the
sector. You probably don't know where the bad sector is without
adding a printk() to the driver code. Some BIOS SCSI utilities
(Adaptec) can read an entire drive and fix bad sectors either by
rewriting or by relocating them. Since drives can be accessed as
files, you could write a utility that opens the raw device while it
is NOT mounted, reads a bunch of sectors, then writes them back. To
do this, you need to verify that lseek() works on your particular
drive, because you need to write the data back to the same offset
that you read it from. I mention this because the raw r/w path of an
early Adaptec (aha1542) driver didn't implement lseek; it just
returned 'okay'. You can imagine the mess I made of a drive with
that controller!

Once you verify that lseek works, the rest of the code is trivial.
I suggest reading then writing 64 kilobytes at a time. It will seem
to take 'forever', but with such relatively short groups of sectors
(128 sectors), the retries will be short when errors are encountered.
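
The read-then-rewrite loop could look roughly like the following
sketch. It assumes a descriptor opened read/write on a device that is
not mounted, and it uses ordinary buffered I/O followed by fsync(); it
does not use O_DIRECT, so reads and writes go through the kernel's
cache. The names rewrite_in_place() and CHUNK_BYTES are hypothetical.
On a 32-bit build, compiling with -D_FILE_OFFSET_BITS=64 would be
needed for offsets past 2 GB.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES 65536  /* 64 KiB = 128 sectors of 512 bytes */

/*
 * Read the whole device in CHUNK_BYTES pieces and write each piece back
 * to the offset it was read from.
 * Returns the number of bytes rewritten, or -1 on the first error
 * (failed seek, read error, write error, or failed fsync).
 */
off_t rewrite_in_place(int fd)
{
    static unsigned char buf[CHUNK_BYTES];
    off_t offset = 0;

    for (;;) {
        ssize_t got, put;

        /* Position explicitly before the read. */
        if (lseek(fd, offset, SEEK_SET) != offset)
            return -1;
        got = read(fd, buf, sizeof buf);
        if (got < 0)
            return -1;
        if (got == 0)
            break;  /* end of device */

        /* Go back to where this chunk started, then write it out. */
        if (lseek(fd, offset, SEEK_SET) != offset)
            return -1;
        put = 0;
        while (put < got) {
            ssize_t n = write(fd, buf + put, (size_t)(got - put));
            if (n < 0 && errno == EINTR)
                continue;  /* interrupted before writing: retry */
            if (n <= 0)
                return -1;
            put += n;
        }
        offset += got;
    }

    /* Push the buffered writes out to the device. */
    if (fsync(fd) != 0)
        return -1;
    return offset;
}
```

The sketch stops at the first error rather than skipping ahead, so the
offset of a failing chunk is simply where the loop gave up. Tracking
the offset in a variable and seeking before both the read and the write
keeps each chunk tied to the position it came from.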

Make sure the drive is either not mounted or mounted r/o.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.67 BogoMips).
New book: http://www.AbominableFirebug.com/

