LKML Archive on lore.kernel.org
From: Hans-Peter Jansen <hpj@urpla.net>
To: Roger Heflin <rogerheflin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org
Subject: Re: 2.6.24.3: regular sata drive resets - worrisome?
Date: Sat, 22 Mar 2008 00:06:48 +0100
Message-ID: <200803220006.49609.hpj@urpla.net>
In-Reply-To: <47E3FF30.3090300@gmail.com>

On Friday, 21 March 2008, Roger Heflin wrote:
> Andrew Morton wrote:
> > (cc linux-ide)
> > (regression?)
> >
> > On Thu, 20 Mar 2008 15:18:31 +0100 Hans-Peter Jansen <hpj@urpla.net> wrote:
> >> Hi,
> >>
> >> since I upgraded to 2.6.24.3 on one of my production systems, I see
> >> regular device resets like these:
>
> Hans,
>
> What kernel were you using before you updated to that kernel?

You don't want to know that ;-) (cough 2.6.11 cough)

> >> Mar 20 14:33:03 lisa5 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >> Mar 20 14:33:03 lisa5 kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> >> Mar 20 14:33:03 lisa5 kernel:          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> >> Mar 20 14:33:03 lisa5 kernel: ata2.00: status: { DRDY }
> >> Mar 20 14:33:03 lisa5 kernel: ata2: hard resetting link
> >> Mar 20 14:33:05 lisa5 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> >> Mar 20 14:33:05 lisa5 kernel: ata2.00: configured for UDMA/100
> >> Mar 20 14:33:05 lisa5 kernel: ata2: EH complete
> >> Mar 20 14:33:05 lisa5 kernel: sd 2:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
> >> Mar 20 14:33:05 lisa5 kernel: sd 2:0:0:0: [sdc] Write Protect is off
> >> Mar 20 14:33:05 lisa5 kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> >> Mar 20 14:33:05 lisa5 kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >> Mar 20 14:36:11 lisa5 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >> Mar 20 14:36:11 lisa5 kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> >> Mar 20 14:36:11 lisa5 kernel:          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> >> Mar 20 14:36:11 lisa5 kernel: ata3.00: status: { DRDY }
> >> Mar 20 14:36:11 lisa5 kernel: ata3: hard resetting link
> >> Mar 20 14:36:13 lisa5 kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> >> Mar 20 14:36:13 lisa5 kernel: ata3.00: configured for UDMA/100
> >> Mar 20 14:36:13 lisa5 kernel: ata3: EH complete
> >> Mar 20 14:36:13 lisa5 kernel: sd 3:0:0:0: [sdd] 488397168 512-byte hardware sectors (250059 MB)
> >> Mar 20 14:36:13 lisa5 kernel: sd 3:0:0:0: [sdd] Write Protect is off
> >> Mar 20 14:36:13 lisa5 kernel: sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> >> Mar 20 14:36:13 lisa5 kernel: sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >>
> >> Should I be worried? smartd doesn't show anything suspicious on those.
>
> Andrew,
>
> I don't think it is a recent regression; I have seen it happening for a
> while on my machine. I don't think it is causing any crashes, but about
> once a month I get unexplained events that appear to deadlock a number of
> things (the machine is up, but top won't run, vmstat actually gets an FP
> exception on the second sample, and a number of other things have issues
> until reboot).

Well, that doesn't sound reassuring, does it?

> I have 4 identical disks, 2 on a sata_sil and 2 on another controller;
> only the ones on the sil controller show this behavior. I have seen it in
> 2.6.23.1, FC7-2.6.23.15-80 and FC7-2.6.22.9-91. My sil is a 4-port 3114
> PCI card, and my disks are 500GB Western Digital disks. I have a fairly
> long run with 20-30 events on the 2 disks on the sata_sil and no events
> on the identical non-sil disks, which had previously been getting resets
> (when on the sil controller). Since they are under software raid5, all
> 4 disks should have very similar IO loads.

Okay, those resets may happen without further consequences, but they're 
disturbing nevertheless. 
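
To put a number on how often each port resets, the "hard resetting link"
lines in the kernel log can be tallied per port. The following is only a
minimal sketch, assuming syslog lines in the format quoted above; the
function name count_resets and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches libata lines such as "... kernel: ata2: hard resetting link".
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")


def count_resets(lines):
    """Return a Counter mapping ATA port name (e.g. 'ata2') to reset count."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the log quoted in this thread.
sample = [
    "Mar 20 14:33:03 lisa5 kernel: ata2: hard resetting link",
    "Mar 20 14:33:05 lisa5 kernel: ata2: EH complete",
    "Mar 20 14:36:11 lisa5 kernel: ata3: hard resetting link",
]

print(dict(count_resets(sample)))
```

Any iterable of lines works as input, so an open log file can be passed in
place of the sample list; where that file lives depends on the local syslog
setup.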

BTW, I'm preparing a hardware reorg that will eliminate this controller
this weekend. Well, to be correct, the other one (3ware 9xxx-8) is the one
that has really been nagging me for some time now. I'm swapping both for one
Areca 1130, which performs _much_ better.

Pete
