LKML Archive on lore.kernel.org
* sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
@ 2008-10-28 17:01 Oskar Liljeblad
  2008-10-28 17:59 ` David Rees
  2008-10-28 23:25 ` Phillip O'Donnell
  0 siblings, 2 replies; 18+ messages in thread
From: Oskar Liljeblad @ 2008-10-28 17:01 UTC (permalink / raw)
  To: linux-kernel

Can anyone make any sense of these SATA errors? They're killing my md RAID5
(at least the second error did).

Hard drives (ata1/sda, ata2/sdb, ata3/sdc): Seagate ST31500341AS 1.5TB SATA
Motherboard: Asus M3A78-EH with AMD 780G/SB700 chipset
SATA driver: ahci
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]

Smart reports no errors on the drives, short & long tests have been run as
well. The system is brand new.

I've read some reports about SATA 3.0 Gbps vs 1.5 Gbps problems and I'm
considering limiting the drives to 1.5 Gbps using jumpers. Would that be a
good idea?

19:24:26 ata2: exception Emask 0x50 SAct 0x0 SErr 0x90a02 action 0xe frozen
19:24:26 ata2: irq_stat 0x00400000, PHY RDY changed
19:24:26 ata2: SError: { RecovComm Persist HostInt PHYRdyChg 10B8B }
19:24:26 ata2: hard resetting link
19:24:27 ata2: SATA link down (SStatus 0 SControl 300)
19:24:30 ata2: hard resetting link
19:24:35 ata2: link is slow to respond, please be patient (ready=0)
19:24:38 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
19:24:38 ata2.00: configured for UDMA/133
19:24:38 ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
19:24:38 ata2: irq_stat 0x00000040, connection status changed
19:24:38 ata2.00: configured for UDMA/133
19:24:38 ata2: EH complete

And then the day after:

09:07:49 ata3.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen
09:07:49 ata3: SError: { HostInt }
09:07:49 ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
09:07:49          res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x44 (timeout)
09:07:49 ata3.00: status: { DRDY }
09:07:49 ata3: hard resetting link
09:07:49 ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
09:07:49 ata3.00: configured for UDMA/133
09:07:49 ata3: EH complete
09:07:49 sd 2:0:0:0: [sdc] 2930277168 512-byte hardware sectors (1500302 MB)
09:07:49 sd 2:0:0:0: [sdc] Write Protect is off
09:07:49 sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
09:07:49 sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
09:07:49 end_request: I/O error, dev sdc, sector 8
09:07:49 md: super_written gets error=-5, uptodate=0
09:07:49 raid5: Disk failure on sdc, disabling device.
09:07:49 raid5: Operation continuing on 1 devices.

For reference:

ata1: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fff900 irq 22
ata2: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fff980 irq 22
ata3: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffa00 irq 22
ata4: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffa80 irq 22
ata5: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffb00 irq 22
ata6: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffb80 irq 22
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: HPA detected: current 2930277168, native 18446744072344861488
ata1.00: ATA-8: ST31500341AS, SD17, max UDMA/133
ata1.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: HPA detected: current 2930277168, native 18446744072344861488
ata2.00: ATA-8: ST31500341AS, SD17, max UDMA/133
ata2.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: HPA detected: current 2930277168, native 18446744072344861488
ata3.00: ATA-8: ST31500341AS, SD17, max UDMA/133
ata3.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133

Regards,

Oskar

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard 2008-10-28 17:01 sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard Oskar Liljeblad @ 2008-10-28 17:59 ` David Rees 2008-10-28 23:25 ` Phillip O'Donnell 1 sibling, 0 replies; 18+ messages in thread From: David Rees @ 2008-10-28 17:59 UTC (permalink / raw) To: Oskar Liljeblad; +Cc: linux-kernel On Tue, Oct 28, 2008 at 10:01 AM, Oskar Liljeblad <oskar@osk.mine.nu> wrote: > Can anyone make any sense of these SATA errors? They're killing my md RAID5 > (at least the second error did). > > Hard drives (ata1/sda, ata2/sdb, ata3/sdc): Seagate ST31500341AS 1.5TB SATA > Motherboard: Asus M3A78-EH with AMD 780G/SB700 chipset > SATA driver: ahci > 00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] Seems to be a known issue with the 1.5TB drives and Linux (and Mac users, too) http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=2390 http://ubuntuforums.org/showthread.php?t=933053 -Dave ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
  2008-10-28 17:01 sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard Oskar Liljeblad
  2008-10-28 17:59 ` David Rees
@ 2008-10-28 23:25 ` Phillip O'Donnell
  2008-10-28 23:52   ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 Roland Dreier
  1 sibling, 1 reply; 18+ messages in thread
From: Phillip O'Donnell @ 2008-10-28 23:25 UTC (permalink / raw)
  To: Oskar Liljeblad, jeff; +Cc: linux-kernel

Hi,

I've got this issue, and I'm involved in the thread on the Seagate
forums. I've been going through the libata code with a fine tooth comb
to see if I can find the issue, and so far - not a lot of joy.

However, and this is more directed to Jeff Garzik, there is a minor
display bug with drives that have more than 2^31 sectors.

The messages:
ata3.00: HPA detected: current 2930277168, native 18446744072344861488

is a bug. The two sector counts are calculated from different ATA
commands and are parsed differently:

Current Sector Count is retrieved from the IDENTIFY result (words
100-103), and calculated with the ata_id_u64() macro

Native Sector Count (LBA48 max) is retrieved from the READ NATIVE MAX
ADDRESS EXT command, and calculated with the ata_tf_to_lba48() function.

ata_tf_to_lba48() seems to be overflowing when the total size will be
greater than 2^31 sectors, while ata_id_u64() does not.

I noticed an identical bug in the latest release of hdparm 8.9, even
returning an identical native sector count, but hdparm gets its
information from the IDENTIFY result. I've been able to patch hdparm to
display correctly. Haven't yet tried to patch ata_tf_to_lba48() because
the data is stored differently and haven't had the time to figure it
out yet.

I have some code that shows the bug in action against the hdparm
implementation, won't be hard to modify to prove the bug against the
ata_tf_to_lba48() implementation, but I'm not at home at the moment and
can't send it through. I can also send through the appropriate values
for words 100 - 103.

All that said, this does NOT appear to be causing the issues that both
you and I are suffering from - I can't see anywhere in libata that uses
the ata_tf_to_lba48() function other than the HPA detection code, and
it seems purely display related only, although Jeff would hopefully be
able to comment further on this and whether there could be other code
doing LBA48 calculations like this.

Cheers,
Phillip

On Wed, Oct 29, 2008 at 6:01 AM, Oskar Liljeblad <oskar@osk.mine.nu> wrote:
>
> Can anyone make any sense of these SATA errors? They're killing my md RAID5
> (at least the second error did).
>
> Hard drives (ata1/sda, ata2/sdb, ata3/sdc): Seagate ST31500341AS 1.5TB SATA
> Motherboard: Asus M3A78-EH with AMD 780G/SB700 chipset
> SATA driver: ahci
> 00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
>
> Smart reports no errors on the drives, short & long tests have been run as
> well. The system is brand new.
>
> I've read some reports about SATA 3.0 Gbps vs 1.5 Gbps problems and I'm
> considering limiting the drives to 1.5 Gbps using jumpers. Would that be a
> good idea?
> > 19:24:26 ata2: exception Emask 0x50 SAct 0x0 SErr 0x90a02 action 0xe frozen > 19:24:26 ata2: irq_stat 0x00400000, PHY RDY changed > 19:24:26 ata2: SError: { RecovComm Persist HostInt PHYRdyChg 10B8B } > 19:24:26 ata2: hard resetting link > 19:24:27 ata2: SATA link down (SStatus 0 SControl 300) > 19:24:30 ata2: hard resetting link > 19:24:35 ata2: link is slow to respond, please be patient (ready=0) > 19:24:38 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > 19:24:38 ata2.00: configured for UDMA/133 > 19:24:38 ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4 > 19:24:38 ata2: irq_stat 0x00000040, connection status changed > 19:24:38 ata2.00: configured for UDMA/133 > 19:24:38 ata2: EH complete > > And then the day after: > > 09:07:49 ata3.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen > 09:07:49 ata3: SError: { HostInt } > 09:07:49 ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 > 09:07:49 res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x44 (timeout) > 09:07:49 ata3.00: status: { DRDY } > 09:07:49 ata3: hard resetting link > 09:07:49 ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > 09:07:49 ata3.00: configured for UDMA/133 > 09:07:49 ata3: EH complete > 09:07:49 sd 2:0:0:0: [sdc] 2930277168 512-byte hardware sectors (1500302 MB) > 09:07:49 sd 2:0:0:0: [sdc] Write Protect is off > 09:07:49 sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00 > 09:07:49 sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > 09:07:49 end_request: I/O error, dev sdc, sector 8 > 09:07:49 md: super_written gets error=-5, uptodate=0 > 09:07:49 raid5: Disk failure on sdc, disabling device. > 09:07:49 raid5: Operation continuing on 1 devices. 
> > For reference: > > ata1: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fff900 irq 22 > ata2: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fff980 irq 22 > ata3: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffa00 irq 22 > ata4: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffa80 irq 22 > ata5: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffb00 irq 22 > ata6: SATA max UDMA/133 abar m1024@0xf8fff800 port 0xf8fffb80 irq 22 > ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > ata1.00: HPA detected: current 2930277168, native 18446744072344861488 > ata1.00: ATA-8: ST31500341AS, SD17, max UDMA/133 > ata1.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32) > ata1.00: configured for UDMA/133 > ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > ata2.00: HPA detected: current 2930277168, native 18446744072344861488 > ata2.00: ATA-8: ST31500341AS, SD17, max UDMA/133 > ata2.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32) > ata2.00: configured for UDMA/133 > ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > ata3.00: HPA detected: current 2930277168, native 18446744072344861488 > ata3.00: ATA-8: ST31500341AS, SD17, max UDMA/133 > ata3.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32) > ata3.00: configured for UDMA/133 > > Regards, > > Oskar > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > ^ permalink raw reply [flat|nested] 18+ messages in thread
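
Editorial sketch (not part of the original thread): the message above says the "current" count comes from IDENTIFY words 100-103 via ata_id_u64() and does not overflow. A plausible reason is that each 16-bit word is widened to 64 bits before it is shifted, so no intermediate value passes through a signed 32-bit int. The helper name id_words_to_u64 below is hypothetical and only mirrors what ata_id_u64() is understood to do; the word values in the usage note are assumptions derived from the reported sector count, not data supplied in the thread.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's ata_id_u64() macro: combine four
 * consecutive little-endian-ordered 16-bit IDENTIFY words, starting at
 * index n, into one 64-bit value. Every word is cast to uint64_t before
 * the shift, so no sign extension can occur.
 */
static uint64_t id_words_to_u64(const uint16_t *id, unsigned int n)
{
	return ((uint64_t)id[n + 3] << 48) |
	       ((uint64_t)id[n + 2] << 32) |
	       ((uint64_t)id[n + 1] << 16) |
	       ((uint64_t)id[n + 0]);
}
```

With assumed words 100..103 of 0x7b30, 0xaea8, 0x0000, 0x0000, the helper yields 2930277168, the "current" value in the log.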
* [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127
  2008-10-28 23:25 ` Phillip O'Donnell
@ 2008-10-28 23:52   ` Roland Dreier
  2008-10-29  2:04     ` Phillip O'Donnell
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Roland Dreier @ 2008-10-28 23:52 UTC (permalink / raw)
  To: Phillip O'Donnell, jeff; +Cc: Oskar Liljeblad, jeff, linux-kernel

In ata_tf_to_lba48(), when evaluating

	(tf->hob_lbal & 0xff) << 24

the expression is promoted to signed int (since int can hold all values
of u8). However, if hob_lbal is 128 or more, then the shifted result is
treated as a negative signed value and sign-extended when converted to
u64 to | into sectors, which leads to the most significant 32 bits of
sectors getting set incorrectly.

For example, Phillip O'Donnell <phillip.odonnell@gmail.com> reported
that a 1.5TB drive caused:

    ata3.00: HPA detected: current 2930277168, native 18446744072344861488

where 2930277168 == 0xAEA87B30 and 18446744072344861488 == 0xffffffffaea87b30
which shows the problem when hob_lbal is 0xae.

Fix this by adding a cast to u64, just as is done for hob_lbah and
hob_lbam in the function.

Reported-by: Phillip O'Donnell <phillip.odonnell@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
Phillip, this should fix at least your cosmetic issue; can you test it
and report back?

Thanks,
  Roland

 drivers/ata/libata-core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index bbb3cae..10424ff 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1268,7 +1268,7 @@ u64 ata_tf_to_lba48(const struct ata_taskfile *tf)
 
 	sectors |= ((u64)(tf->hob_lbah & 0xff)) << 40;
 	sectors |= ((u64)(tf->hob_lbam & 0xff)) << 32;
-	sectors |= (tf->hob_lbal & 0xff) << 24;
+	sectors |= ((u64)(tf->hob_lbal & 0xff)) << 24;
 	sectors |= (tf->lbah & 0xff) << 16;
 	sectors |= (tf->lbam & 0xff) << 8;
 	sectors |= (tf->lbal & 0xff);

^ permalink raw reply	[flat|nested] 18+ messages in thread
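
Editorial sketch (not part of the original thread): the sign extension described in the patch above can be reproduced in user space. This is a minimal illustration, assuming a 32-bit int and a compiler that wraps on unsigned-to-signed conversion (as gcc and clang do). The function names lba48_buggy and lba48_fixed are hypothetical, and only the low four address bytes are modelled. The buggy path builds the 32-bit signed intermediate explicitly rather than shifting a signed int into its sign bit, which the C standard leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch behaviour: byte 3 is shifted as a 32-bit int, then widened. */
static uint64_t lba48_buggy(uint8_t hob_lbal, uint8_t lbah,
			    uint8_t lbam, uint8_t lbal)
{
	uint64_t sectors = 0;
	/* Emulates (tf->hob_lbal & 0xff) << 24 evaluated in signed int. */
	int32_t shifted = (int32_t)((uint32_t)hob_lbal << 24);

	/* Widening a negative int32_t to 64 bits sets the upper 32 bits. */
	sectors |= (uint64_t)(int64_t)shifted;
	sectors |= (uint32_t)lbah << 16;
	sectors |= (uint32_t)lbam << 8;
	sectors |= lbal;
	return sectors;
}

/* Post-patch behaviour: widen to 64 bits first, then shift. */
static uint64_t lba48_fixed(uint8_t hob_lbal, uint8_t lbah,
			    uint8_t lbam, uint8_t lbal)
{
	uint64_t sectors = 0;

	sectors |= (uint64_t)hob_lbal << 24;
	sectors |= (uint32_t)lbah << 16;
	sectors |= (uint32_t)lbam << 8;
	sectors |= lbal;
	return sectors;
}

/* Self-check using the bytes of 0xAEA87B30 quoted in the patch. */
static void lba48_selfcheck(void)
{
	assert(lba48_buggy(0xae, 0xa8, 0x7b, 0x30) == 0xffffffffaea87b30ULL);
	assert(lba48_fixed(0xae, 0xa8, 0x7b, 0x30) == 0xaea87b30ULL);
}
```

For any hob_lbal below 0x80 the two functions agree, which is consistent with the bug only appearing on drives with more than 2^31 sectors.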
* Re: [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 2008-10-28 23:52 ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 Roland Dreier @ 2008-10-29 2:04 ` Phillip O'Donnell 2008-11-04 18:34 ` [PATCH] libata: Avoid overflow in ata_tf_read_block() " Roland Dreier 2008-10-29 13:28 ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() " Phillip O'Donnell 2008-10-31 5:45 ` Jeff Garzik 2 siblings, 1 reply; 18+ messages in thread From: Phillip O'Donnell @ 2008-10-29 2:04 UTC (permalink / raw) To: Roland Dreier; +Cc: jeff, Oskar Liljeblad, linux-kernel Hey Roland, Sure thing - I'll give that a try tonight. Just had a cursory glance over libata-core.c and I've noticed that ata_tf_read_block() uses hob_lbal in the same uncast fashion for LBA48 - reckon that one needs patching too? Only seems to be used in libata-scsi.c within ata_gen_ata_sense() Cheers, Phillip On Wed, Oct 29, 2008 at 12:52 PM, Roland Dreier <rdreier@cisco.com> wrote: > In ata_tf_to_lba48(), when evaluating > > (tf->hob_lbal & 0xff) << 24 > > the expression is promoted to signed int (since int can hold all values > of u8). However, if hob_lbal is 128 or more, then it is treated as a > negative signed value and sign-extended when promoted to u64 to | into > sectors, which leads to the MSB 32 bits of section getting set > incorrectly. > > For example, Phillip O'Donnell <phillip.odonnell@gmail.com> reported > that a 1.5GB drive caused: > > ata3.00: HPA detected: current 2930277168, native 18446744072344861488 > > where 2930277168 == 0xAEA87B30 and 18446744072344861488 == 0xffffffffaea87b30 > which shows the problem when hob_lbal is 0xae. > > Fix this by adding a cast to u64, just as is used by for hob_lbah and > hob_lbam in the function. > > Reported-by: Phillip O'Donnell <phillip.odonnell@gmail.com> > Signed-off-by: Roland Dreier <rolandd@cisco.com> > --- > Phillip, this should fix at least your cosmetic issue; can you test it > and report back? 
> > Thanks, > Roland > > drivers/ata/libata-core.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index bbb3cae..10424ff 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -1268,7 +1268,7 @@ u64 ata_tf_to_lba48(const struct ata_taskfile *tf) > > sectors |= ((u64)(tf->hob_lbah & 0xff)) << 40; > sectors |= ((u64)(tf->hob_lbam & 0xff)) << 32; > - sectors |= (tf->hob_lbal & 0xff) << 24; > + sectors |= ((u64)(tf->hob_lbal & 0xff)) << 24; > sectors |= (tf->lbah & 0xff) << 16; > sectors |= (tf->lbam & 0xff) << 8; > sectors |= (tf->lbal & 0xff); > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] libata: Avoid overflow in ata_tf_read_block() when tf->hba_lbal > 127
  2008-10-29  2:04     ` Phillip O'Donnell
@ 2008-11-04 18:34       ` Roland Dreier
       [not found]         ` <7a9b5c320811041441q78920938q58ed7ab3cbe97253@mail.gmail.com>
  2008-11-11  8:02         ` Jeff Garzik
  0 siblings, 2 replies; 18+ messages in thread
From: Roland Dreier @ 2008-11-04 18:34 UTC (permalink / raw)
  To: jeff; +Cc: Phillip O'Donnell, Oskar Liljeblad, linux-kernel

Phillip O'Donnell <phillip.odonnell@gmail.com> pointed out that the same
sign extension bug that was fixed in commit ba14a9c2 ("libata: Avoid
overflow in ata_tf_to_lba48() when tf->hba_lbal > 127") also appears to
exist in ata_tf_read_block(). Fix this by adding a cast to u64.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
I don't have any way to test this -- I guess you would have to get an
error on a block above 2G (i.e. data above 1TB)? But it looks "obviously
correct" enough to add to -next I guess.

 drivers/ata/libata-core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 622350d..a6ad862 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -612,7 +612,7 @@ u64 ata_tf_read_block(struct ata_taskfile *tf, struct ata_device *dev)
 		if (tf->flags & ATA_TFLAG_LBA48) {
 			block |= (u64)tf->hob_lbah << 40;
 			block |= (u64)tf->hob_lbam << 32;
-			block |= tf->hob_lbal << 24;
+			block |= (u64)tf->hob_lbal << 24;
 		} else
 			block |= (tf->device & 0xf) << 24;
 

^ permalink raw reply	[flat|nested] 18+ messages in thread
[parent not found: <7a9b5c320811041441q78920938q58ed7ab3cbe97253@mail.gmail.com>]
* Re: [PATCH] libata: Avoid overflow in ata_tf_read_block() when tf->hba_lbal > 127 [not found] ` <7a9b5c320811041441q78920938q58ed7ab3cbe97253@mail.gmail.com> @ 2008-11-04 22:44 ` Phillip O'Donnell 0 siblings, 0 replies; 18+ messages in thread From: Phillip O'Donnell @ 2008-11-04 22:44 UTC (permalink / raw) To: linux-kernel Thanks Roland, Right now, my observations indicate that my original fault occurs when a command (of any type, e.g, read or flush ... ) times out, which should trigger the sense routines. Next thing I need to do is add some debug code to identify whether this only occurs if the timeout happens on a sector above 2^31. I've identified a reasonably reliable testcase for my fault, so I'll add that patch and see if it occurs. I'll let you know how it pans out. Cheers, Phillip On Wed, Nov 5, 2008 at 7:34 AM, Roland Dreier <rdreier@cisco.com> wrote: > > Phillip O'Donnell <phillip.odonnell@gmail.com> pointed out that the same > sign extension bug that was fixed in commit ba14a9c2 ("libata: Avoid > overflow in ata_tf_to_lba48() when tf->hba_lbal > 127") also appears to > exist in ata_tf_read_block(). Fix this by adding a cast to u64. > > Signed-off-by: Roland Dreier <rolandd@cisco.com> > --- > I don't have any way to test this -- I guess you would have to get an > error on a block above 2G (ie data above 1TB)? But it looks "obviously > correct" enough to add to -next I guess. 
> > drivers/ata/libata-core.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index 622350d..a6ad862 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -612,7 +612,7 @@ u64 ata_tf_read_block(struct ata_taskfile *tf, struct ata_device *dev) > if (tf->flags & ATA_TFLAG_LBA48) { > block |= (u64)tf->hob_lbah << 40; > block |= (u64)tf->hob_lbam << 32; > - block |= tf->hob_lbal << 24; > + block |= (u64)tf->hob_lbal << 24; > } else > block |= (tf->device & 0xf) << 24; > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] libata: Avoid overflow in ata_tf_read_block() when tf->hba_lbal > 127 2008-11-04 18:34 ` [PATCH] libata: Avoid overflow in ata_tf_read_block() " Roland Dreier [not found] ` <7a9b5c320811041441q78920938q58ed7ab3cbe97253@mail.gmail.com> @ 2008-11-11 8:02 ` Jeff Garzik 1 sibling, 0 replies; 18+ messages in thread From: Jeff Garzik @ 2008-11-11 8:02 UTC (permalink / raw) To: Roland Dreier; +Cc: Phillip O'Donnell, Oskar Liljeblad, linux-kernel Roland Dreier wrote: > Phillip O'Donnell <phillip.odonnell@gmail.com> pointed out that the same > sign extension bug that was fixed in commit ba14a9c2 ("libata: Avoid > overflow in ata_tf_to_lba48() when tf->hba_lbal > 127") also appears to > exist in ata_tf_read_block(). Fix this by adding a cast to u64. > > Signed-off-by: Roland Dreier <rolandd@cisco.com> > --- > I don't have any way to test this -- I guess you would have to get an > error on a block above 2G (ie data above 1TB)? But it looks "obviously > correct" enough to add to -next I guess. > > drivers/ata/libata-core.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index 622350d..a6ad862 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -612,7 +612,7 @@ u64 ata_tf_read_block(struct ata_taskfile *tf, struct ata_device *dev) > if (tf->flags & ATA_TFLAG_LBA48) { > block |= (u64)tf->hob_lbah << 40; > block |= (u64)tf->hob_lbam << 32; > - block |= tf->hob_lbal << 24; > + block |= (u64)tf->hob_lbal << 24; > } else > block |= (tf->device & 0xf) << 24; applied ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 2008-10-28 23:52 ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 Roland Dreier 2008-10-29 2:04 ` Phillip O'Donnell @ 2008-10-29 13:28 ` Phillip O'Donnell 2008-10-31 5:45 ` Jeff Garzik 2 siblings, 0 replies; 18+ messages in thread From: Phillip O'Donnell @ 2008-10-29 13:28 UTC (permalink / raw) To: Roland Dreier; +Cc: jeff, linux-kernel Confirmed - HPA is no longer detected on boot. Cheers, Phillip On Wed, Oct 29, 2008 at 12:52 PM, Roland Dreier <rdreier@cisco.com> wrote: > In ata_tf_to_lba48(), when evaluating > > (tf->hob_lbal & 0xff) << 24 > > the expression is promoted to signed int (since int can hold all values > of u8). However, if hob_lbal is 128 or more, then it is treated as a > negative signed value and sign-extended when promoted to u64 to | into > sectors, which leads to the MSB 32 bits of section getting set > incorrectly. > > For example, Phillip O'Donnell <phillip.odonnell@gmail.com> reported > that a 1.5GB drive caused: > > ata3.00: HPA detected: current 2930277168, native 18446744072344861488 > > where 2930277168 == 0xAEA87B30 and 18446744072344861488 == 0xffffffffaea87b30 > which shows the problem when hob_lbal is 0xae. > > Fix this by adding a cast to u64, just as is used by for hob_lbah and > hob_lbam in the function. > > Reported-by: Phillip O'Donnell <phillip.odonnell@gmail.com> > Signed-off-by: Roland Dreier <rolandd@cisco.com> > --- > Phillip, this should fix at least your cosmetic issue; can you test it > and report back? 
> > Thanks, > Roland > > drivers/ata/libata-core.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index bbb3cae..10424ff 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -1268,7 +1268,7 @@ u64 ata_tf_to_lba48(const struct ata_taskfile *tf) > > sectors |= ((u64)(tf->hob_lbah & 0xff)) << 40; > sectors |= ((u64)(tf->hob_lbam & 0xff)) << 32; > - sectors |= (tf->hob_lbal & 0xff) << 24; > + sectors |= ((u64)(tf->hob_lbal & 0xff)) << 24; > sectors |= (tf->lbah & 0xff) << 16; > sectors |= (tf->lbam & 0xff) << 8; > sectors |= (tf->lbal & 0xff); > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 2008-10-28 23:52 ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 Roland Dreier 2008-10-29 2:04 ` Phillip O'Donnell 2008-10-29 13:28 ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() " Phillip O'Donnell @ 2008-10-31 5:45 ` Jeff Garzik 2 siblings, 0 replies; 18+ messages in thread From: Jeff Garzik @ 2008-10-31 5:45 UTC (permalink / raw) To: Roland Dreier; +Cc: Phillip O'Donnell, Oskar Liljeblad, linux-kernel Roland Dreier wrote: > In ata_tf_to_lba48(), when evaluating > > (tf->hob_lbal & 0xff) << 24 > > the expression is promoted to signed int (since int can hold all values > of u8). However, if hob_lbal is 128 or more, then it is treated as a > negative signed value and sign-extended when promoted to u64 to | into > sectors, which leads to the MSB 32 bits of section getting set > incorrectly. > > For example, Phillip O'Donnell <phillip.odonnell@gmail.com> reported > that a 1.5GB drive caused: > > ata3.00: HPA detected: current 2930277168, native 18446744072344861488 > > where 2930277168 == 0xAEA87B30 and 18446744072344861488 == 0xffffffffaea87b30 > which shows the problem when hob_lbal is 0xae. > > Fix this by adding a cast to u64, just as is used by for hob_lbah and > hob_lbam in the function. > > Reported-by: Phillip O'Donnell <phillip.odonnell@gmail.com> > Signed-off-by: Roland Dreier <rolandd@cisco.com> > --- > Phillip, this should fix at least your cosmetic issue; can you test it > and report back? > > Thanks, > Roland > > drivers/ata/libata-core.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) applied ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <fa.01zEaARwrup2dCOTuHTYxzuS9BI@ifi.uio.no>]
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
       [not found] <fa.01zEaARwrup2dCOTuHTYxzuS9BI@ifi.uio.no>
@ 2008-10-28 23:19 ` Robert Hancock
  2008-10-29 18:58   ` Oskar Liljeblad
  0 siblings, 1 reply; 18+ messages in thread
From: Robert Hancock @ 2008-10-28 23:19 UTC (permalink / raw)
  To: Oskar Liljeblad; +Cc: linux-kernel

Oskar Liljeblad wrote:
> Can anyone make any sense of these SATA errors? They're killing my md RAID5
> (at least the second error did).
>
> Hard drives (ata1/sda, ata2/sdb, ata3/sdc): Seagate ST31500341AS 1.5TB SATA
> Motherboard: Asus M3A78-EH with AMD 780G/SB700 chipset
> SATA driver: ahci
> 00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
>
> Smart reports no errors on the drives, short & long tests have been run as
> well. The system is brand new.
>
> I've read some reports about SATA 3.0 Gbps vs 1.5 Gbps problems and I'm
> considering limiting the drives to 1.5 Gbps using jumpers. Would that be a
> good idea?
>
> 19:24:26 ata2: exception Emask 0x50 SAct 0x0 SErr 0x90a02 action 0xe frozen
> 19:24:26 ata2: irq_stat 0x00400000, PHY RDY changed
> 19:24:26 ata2: SError: { RecovComm Persist HostInt PHYRdyChg 10B8B }

RecovComm: Communications between device and host temporarily lost, but regained
Persist: Persistent communication or data integrity error
HostInt: Host bus adapter internal error
PHYRdyChg: PhyRdy signal changed state
10B8B: 10b to 8b decoding error occurred

Sounds like the drive and the controller are unhappy with each other, or
there's some kind of communications or hardware problem. Not likely a
kernel issue.

It's unclear if limiting to 1.5 Gbps would help, you could try it and see..

^ permalink raw reply	[flat|nested] 18+ messages in thread
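
Editorial sketch (not part of the original thread): the flag names decoded above correspond to individual bits of the SErr value printed on the exception line (0x90a02 in the first incident, 0x800 in the second). The table below is a minimal sketch, assuming the standard SATA SError bit layout; the short names copy the ones that appear in the log, and serr_decode is a hypothetical helper, not a libata function.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct serr_bit {
	uint32_t mask;
	const char *name;
};

/* Assumed SError bit positions; names follow the kernel log output. */
static const struct serr_bit serr_bits[] = {
	{ 1u << 0,  "RecovData" },
	{ 1u << 1,  "RecovComm" },
	{ 1u << 8,  "UnrecovData" },
	{ 1u << 9,  "Persist" },
	{ 1u << 10, "Proto" },
	{ 1u << 11, "HostInt" },
	{ 1u << 16, "PHYRdyChg" },
	{ 1u << 17, "PHYInt" },
	{ 1u << 18, "CommWake" },
	{ 1u << 19, "10B8B" },
	{ 1u << 20, "Dispar" },
	{ 1u << 21, "BadCRC" },
	{ 1u << 22, "Handshk" },
	{ 1u << 23, "LinkSeq" },
	{ 1u << 24, "TrStaTrns" },
	{ 1u << 25, "UnrecFIS" },
	{ 1u << 26, "DevExch" },
};

/*
 * Write the space-separated names of all bits set in serr into buf.
 * Stops at the first name that does not fit. Returns the string length.
 */
static size_t serr_decode(uint32_t serr, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
		int n;

		if (!(serr & serr_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", serr_bits[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Decoding 0x90a02 with this table gives the same five flags as the ata2 log line, and 0x800 gives HostInt alone, matching the ata3 timeout the next day.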
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
  2008-10-28 23:19 ` sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard Robert Hancock
@ 2008-10-29 18:58   ` Oskar Liljeblad
  2008-10-29 20:17     ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Oskar Liljeblad @ 2008-10-29 18:58 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel

On Tuesday, October 28, 2008 at 17:37, Robert Hancock wrote:
[..]
>> Hard drives (ata1/sda, ata2/sdb, ata3/sdc): Seagate ST31500341AS 1.5TB SATA
[..]
>> 19:24:26 ata2: exception Emask 0x50 SAct 0x0 SErr 0x90a02 action 0xe frozen
>> 19:24:26 ata2: irq_stat 0x00400000, PHY RDY changed
>> 19:24:26 ata2: SError: { RecovComm Persist HostInt PHYRdyChg 10B8B }
[..]
> Sounds like the drive and the controller are unhappy with each other, or
> there's some kind of communications or hardware problem. Not likely a
> kernel issue.
>
> It's unclear if limiting to 1.5 Gbps would help, you could try it and see..

It seems the solution is to disable write caching. So far no errors,
and other people confirm this solution.

Anyway Seagate (unofficially) claims it's a driver issue in Linux - from
http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=2390&view=by_date_ascending&page=6

  "We already know about the Linux issue, it is indeed a kernel error
  causing the problem as it was explained to me by one of our developers."

Regards,

Oskar

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
  2008-10-29 18:58   ` Oskar Liljeblad
@ 2008-10-29 20:17     ` Alan Cox
  2008-10-29 20:23       ` Ric Wheeler
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2008-10-29 20:17 UTC (permalink / raw)
  To: Oskar Liljeblad; +Cc: Robert Hancock, linux-kernel

> Anyway Seagate (unofficially) claims it's a driver issue in Linux - from
> http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=2390&view=by_date_ascending&page=6

Well then perhaps they would care to share the information with us 8)

The HPA size reporting one is certainly Linux, the flush cache one I don't
think is. The fact that Mac people report it and Seagate suggest workarounds
of the form of "don't use 33% of the disk" doesn't inspire confidence.

> "We already know about the Linux issue, it is indeed a kernel error
> causing the problem as it was explained to me by one of our developers."

Well if they'd care to explain it to linux-ide perhaps we can find a
workaround. I would be cautious about disabling the write caching as it will
harm both performance and probably drive lifetime.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
  2008-10-29 20:17     ` Alan Cox
@ 2008-10-29 20:23       ` Ric Wheeler
  2008-10-29 20:52         ` Phillip O'Donnell
  0 siblings, 1 reply; 18+ messages in thread
From: Ric Wheeler @ 2008-10-29 20:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: Oskar Liljeblad, Robert Hancock, linux-kernel

Alan Cox wrote:
>> Anyway Seagate (unofficially) claims it's a driver issue in Linux - from
>> http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=2390&view=by_date_ascending&page=6
>
> Well then perhaps they would care to share the information with us 8)
>
> The HPA size reporting one is certainly Linux, the flush cache one I don't
> think is. The fact that Mac people report it and Seagate suggest workarounds
> of the form of "don't use 33% of the disk" doesn't inspire confidence.

I suspect that the drive is simply choking on the barrier related cache
flushing that we do - that seemed to be the MacOS error as well. The
Windows comment suggested that Windows had an HBA/driver bug (most likely
unrelated to this).

If you want to avoid the issue until they fix the drive, you could run
fast and dangerous (mount with barriers off) or slow and safe (disable
the write cache).

>> "We already know about the Linux issue, it is indeed a kernel error
>> causing the problem as it was explained to me by one of our developers."
>
> Well if they'd care to explain it to linux-ide perhaps we can find a
> workaround. I would be cautious about disabling the write caching as it will
> harm both performance and probably drive lifetime.
>
> Alan

This looks to me to be a drive firmware issue. I would wait until
someone can test with their announced firmware upgrade before looking at
the kernel :-)

ric

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard 2008-10-29 20:23 ` Ric Wheeler @ 2008-10-29 20:52 ` Phillip O'Donnell 2008-10-29 22:37 ` Ric Wheeler 0 siblings, 1 reply; 18+ messages in thread From: Phillip O'Donnell @ 2008-10-29 20:52 UTC (permalink / raw) To: Ric Wheeler; +Cc: Alan Cox, Oskar Liljeblad, Robert Hancock, linux-kernel Hi Ric, Not too sure about that - I run with XFS, which announces that it disables the barriers on my devices (I use LVM on top of them) but still get the same issue... Unless I've misunderstood your comment? Cheers, Phillip > I suspect that the drive is simply choking on the barrier related cache > flushing that we do - that seemed to be the MacOS error as well. The windows > comment suggested that windows had an hba/driver bug (most likely unrelated > to this). > > If you want to avoid the issue until they fix the drive, you could run fast > and dangerous (mount without barriers on) or slow and safe (disable the > write cache). ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
From: Ric Wheeler @ 2008-10-29 22:37 UTC
To: Phillip O'Donnell; +Cc: Alan Cox, Oskar Liljeblad, Robert Hancock, linux-kernel

Phillip O'Donnell wrote:
> Hi Ric,
>
> Not too sure about that - I run with XFS, which announces that it
> disables the barriers on my devices (I use LVM on top of them) but
> still get the same issue... Unless I've misunderstood your comment?
>
> Cheers,
> Phillip

XFS has a different issue with barriers that has been recently fixed.

I am just going based on what I read at the Seagate customer site - it
looks like the hang was during the processing of the ATA_CACHE_FLUSH_EXT
command.

New drives are routinely buggy to some degree, especially ones that jump
up in capacity :-) Seagate has a well earned reputation for quality and
I will be surprised if they don't fix this issue soon,

Ric
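For reference, the flush command Ric mentions can be matched against the log in the original report: the command that timed out on ata3 is logged as "cmd ea/...", and 0xEA is the FLUSH CACHE EXT opcode in the ATA command set (ATA_CMD_FLUSH_EXT in linux/ata.h). Below is a minimal C sketch of that lookup, assuming a libata-style log line. The helper names parse_cmd_opcode and opcode_name are hypothetical and not part of any kernel API, and the table covers only the two flush opcodes.

```c
#include <stdlib.h>
#include <string.h>

/* Opcode values from the ATA command set; linux/ata.h defines the same
 * values as ATA_CMD_FLUSH and ATA_CMD_FLUSH_EXT. */
#define CMD_FLUSH_CACHE     0xE7
#define CMD_FLUSH_CACHE_EXT 0xEA

/* Hypothetical helper: extract the opcode byte from a log line of the
 * form "... cmd XX/..." (XX in hex). Returns -1 if no opcode is found. */
int parse_cmd_opcode(const char *line)
{
    const char *p = strstr(line, "cmd ");
    char *end;
    long value;

    if (p == NULL)
        return -1;
    p += 4;                          /* skip past "cmd " */
    value = strtol(p, &end, 16);     /* parse the hex opcode */
    if (end == p || *end != '/' || value < 0 || value > 0xFF)
        return -1;
    return (int)value;
}

/* Hypothetical helper: name the two flush opcodes; anything else is
 * reported as not covered by this small table. */
const char *opcode_name(int opcode)
{
    switch (opcode) {
    case CMD_FLUSH_CACHE:
        return "FLUSH CACHE";
    case CMD_FLUSH_CACHE_EXT:
        return "FLUSH CACHE EXT";
    default:
        return "not in this table";
    }
}
```

Feeding it the line "ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0" from the first message yields opcode 0xEA, i.e. FLUSH CACHE EXT. That fits the cache-flush theory, though it does not by itself say whether the drive or the kernel is at fault.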
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
From: Kasper Sandberg @ 2008-11-07 11:33 UTC
To: Ric Wheeler; +Cc: Phillip O'Donnell, Alan Cox, Oskar Liljeblad, Robert Hancock, linux-kernel

On Wed, 2008-10-29 at 18:37 -0400, Ric Wheeler wrote:
> Phillip O'Donnell wrote:
<snip>
> I am just going based on what I read at the Seagate customer site - it
> looks like the hang was during the processing of the ATA_CACHE_FLUSH_EXT
> command.
>
> New drives are routinely buggy to some degree, especially ones that jump
> up in capacity :-) Seagate has a well earned reputation for quality and
> I will be surprised if they don't fix this issue soon,

Is there any new information on this? So far the only thing I can find
seems to be people reporting the issue, but no word from Seagate.
* Re: sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard
From: Ric Wheeler @ 2008-11-07 12:27 UTC
To: Kasper Sandberg; +Cc: Phillip O'Donnell, Alan Cox, Oskar Liljeblad, Robert Hancock, linux-kernel

Kasper Sandberg wrote:
> On Wed, 2008-10-29 at 18:37 -0400, Ric Wheeler wrote:
>> Phillip O'Donnell wrote:
> <snip>
>> I am just going based on what I read at the Seagate customer site - it
>> looks like the hang was during the processing of the ATA_CACHE_FLUSH_EXT
>> command.
>>
>> New drives are routinely buggy to some degree, especially ones that jump
>> up in capacity :-) Seagate has a well earned reputation for quality and
>> I will be surprised if they don't fix this issue soon,
>
> Is there any new information on this? So far the only thing I can find
> seems to be people reporting the issue, but no word from Seagate.

I don't actually have one of these drives, so I don't have any updates,
sorry.

There was a recent patch to correctly calculate sector numbers for these
disks, but I am not sure that this was the same issue you saw...

ric
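The sector-number patch Ric refers to is listed in the thread overview as "[PATCH] libata: Avoid overflow in ata_tf_to_lba48()". The class of bug is C integer promotion: an 8-bit taskfile byte shifted left by 24 is evaluated as a signed int, so a value above 127 lands in the sign bit and is sign-extended when the result is widened to 64 bits. Any drive with more than 2^31 sectors (1 TiB at 512 bytes per sector) has that bit set, which is why 1.5TB drives can trigger it. The sketch below is illustrative only and is not the kernel code: struct tf_bytes and both function names are hypothetical, the field names follow libata's hob_* naming, and the buggy path emulates the sign extension with explicit casts (assuming the usual two's-complement conversion to int32_t) rather than relying on an undefined signed shift.

```c
#include <stdint.h>

/* Hypothetical stand-in for the six LBA bytes of a taskfile. */
struct tf_bytes {
    uint8_t lbal, lbam, lbah;             /* LBA bits 0..23  */
    uint8_t hob_lbal, hob_lbam, hob_lbah; /* LBA bits 24..47 */
};

/* Buggy form: the bits 24..31 term is treated as a 32-bit signed value,
 * as happens when a promoted int is shifted and then widened. */
uint64_t lba48_buggy(const struct tf_bytes *tf)
{
    uint64_t sectors = 0;

    sectors |= (uint64_t)tf->hob_lbah << 40;
    sectors |= (uint64_t)tf->hob_lbam << 32;
    /* Emulated sign extension: hob_lbal > 127 sets bit 31, and the
     * conversion int32_t -> int64_t copies it into bits 32..63. */
    sectors |= (uint64_t)(int64_t)(int32_t)((uint32_t)tf->hob_lbal << 24);
    sectors |= (uint64_t)tf->lbah << 16;
    sectors |= (uint64_t)tf->lbam << 8;
    sectors |= tf->lbal;
    return sectors;
}

/* Fixed form: widen to 64 bits before shifting, so no sign bit is involved. */
uint64_t lba48_fixed(const struct tf_bytes *tf)
{
    uint64_t sectors = 0;

    sectors |= (uint64_t)tf->hob_lbah << 40;
    sectors |= (uint64_t)tf->hob_lbam << 32;
    sectors |= (uint64_t)tf->hob_lbal << 24;
    sectors |= (uint64_t)tf->lbah << 16;
    sectors |= (uint64_t)tf->lbam << 8;
    sectors |= tf->lbal;
    return sectors;
}
```

With the bytes of LBA 0xAEA87B2F (an illustrative value in the range a 1.5TB drive would report, not taken from this thread), lba48_fixed returns 0xAEA87B2F while lba48_buggy returns 0xFFFFFFFFAEA87B2F, a wildly wrong sector number.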
end of thread (newest message: 2008-11-11 8:02 UTC)

Thread overview: 18+ messages
2008-10-28 17:01 sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard Oskar Liljeblad
2008-10-28 17:59 ` David Rees
2008-10-28 23:25 ` Phillip O'Donnell
2008-10-28 23:52 ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127 Roland Dreier
2008-10-29 2:04 ` Phillip O'Donnell
2008-11-04 18:34 ` [PATCH] libata: Avoid overflow in ata_tf_read_block() " Roland Dreier
[not found] ` <7a9b5c320811041441q78920938q58ed7ab3cbe97253@mail.gmail.com>
2008-11-04 22:44 ` Phillip O'Donnell
2008-11-11 8:02 ` Jeff Garzik
2008-10-29 13:28 ` [PATCH] libata: Avoid overflow in ata_tf_to_lba48() " Phillip O'Donnell
2008-10-31 5:45 ` Jeff Garzik
[not found] <fa.01zEaARwrup2dCOTuHTYxzuS9BI@ifi.uio.no>
2008-10-28 23:19 ` sata errors with Seagate 1.5TB on AMD 780G/SB700 motherboard Robert Hancock
2008-10-29 18:58 ` Oskar Liljeblad
2008-10-29 20:17 ` Alan Cox
2008-10-29 20:23 ` Ric Wheeler
2008-10-29 20:52 ` Phillip O'Donnell
2008-10-29 22:37 ` Ric Wheeler
2008-11-07 11:33 ` Kasper Sandberg
2008-11-07 12:27 ` Ric Wheeler