LKML Archive on lore.kernel.org
* data from kernel.bkbits.net
@ 2003-11-24  5:19 Larry McVoy
  2003-11-24  7:34 ` H. Peter Anvin
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Larry McVoy @ 2003-11-24  5:19 UTC (permalink / raw)
  To: linux-kernel, hpa


I've been trying to get all the data off the drives on the machine which
was broken into.  I have a feeling that whoever this was was hiding stuff
in the file system because both drives will not fsck clean nor will they
completely read.

I've managed to get most of the data off but not all.  Given that I've put
about 3 days into this I'm pretty much done.  If someone else wants to look
at the drives I can make them available, let me know.  But just reading the
main drive makes the kernel (Fedora 1) kill the tar process as below (it
also managed to wack the system enough that it overwrote the NVRAM with
garbage).  It hasn't been a fun weekend.

3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: Reset succeeded.
3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x1b, unit #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: AEN: WARNING: ATA port timeout: Port #3.
3w-xxxx: scsi0: Reset succeeded.
Unable to handle kernel paging request at virtual address 4954507d
 printing eip:
c015a129
*pde = 00000000
Oops: 0000
3w-xxxx sd_mod sis900 ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables sg scsi_mod keybdev mousedev hid input ehci-hcd usb-uhci usbcore ext3 jbd  
CPU:    0
EIP:    0060:[<c015a129>]    Not tainted
EFLAGS: 00010a97

EIP is at find_inode [kernel] 0x19 (2.4.22-1.2115.nptl)
eax: 00000000   ebx: 49545055   ecx: 0000000f   edx: c1640000
esi: 00000000   edi: c1655868   ebp: 0027ace1   esp: cea97ea4
ds: 0068   es: 0068   ss: 0068
Process tar (pid: 2816, stackpage=cea97000)
Stack: db99a05c 00000000 0000002a dacd43c0 c1655868 0027ace1 df9db800 c015a452 
       df9db800 0027ace1 c1655868 00000000 00000000 dacd43c0 dd476d40 df9db800 
       dd476d40 c0173669 df9db800 0027ace1 00000000 00000000 fffffff4 dacd442c 
Call Trace:   [<c015a452>] iget4_locked [kernel] 0x52 (0xcea97ec0)
[<c0173669>] ext2_lookup [kernel] 0x69 (0xcea97ee8)
[<c014f197>] real_lookup [kernel] 0xc7 (0xcea97f08)
[<c014f88a>] link_path_walk [kernel] 0x59a (0xcea97f24)
[<c014fb67>] path_lookup [kernel] 0x37 (0xcea97f60)
[<c014fdf9>] __user_walk [kernel] 0x49 (0xcea97f70)
[<c014bddf>] sys_lstat64 [kernel] 0x1f (0xcea97f8c)
[<c01099df>] system_call [kernel] 0x33 (0xcea97fc0)


Code: 39 6b 28 89 de 75 f1 8b 44 24 20 39 83 a0 00 00 00 75 e5 8b 
 
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: data from kernel.bkbits.net
  2003-11-24  5:19 data from kernel.bkbits.net Larry McVoy
@ 2003-11-24  7:34 ` H. Peter Anvin
  2003-11-24 14:57   ` Larry McVoy
  2003-11-24 15:43   ` Ricky Beam
  2003-11-24  9:48 ` Willy Tarreau
  2003-11-25 15:00 ` Ben Collins
  2 siblings, 2 replies; 18+ messages in thread
From: H. Peter Anvin @ 2003-11-24  7:34 UTC (permalink / raw)
  To: Larry McVoy; +Cc: linux-kernel

Larry McVoy wrote:
> I've been trying to get all the data off the drives on the machine which
> was broken into.  I have a feeling that whoever this was was hiding stuff
> in the file system because both drives will not fsck clean nor will they
> completely read.
> 
> I've managed to get most of the data off but not all.  Given that I've put
> about 3 days into this I'm pretty much done.  If someone else wants to look
> at the drives I can make them available, let me know.  But just reading the
> main drive makes the kernel (Fedora 1) kill the tar process as below (it
> also managed to wack the system enough that it overwrote the NVRAM with
> garbage).  It hasn't been a fun weekend.
> 

Looks more like a 3Ware driver bug to me.  Hard to say for sure, though.

	-hpa



* Re: data from kernel.bkbits.net
  2003-11-24  5:19 data from kernel.bkbits.net Larry McVoy
  2003-11-24  7:34 ` H. Peter Anvin
@ 2003-11-24  9:48 ` Willy Tarreau
  2003-11-25 15:00 ` Ben Collins
  2 siblings, 0 replies; 18+ messages in thread
From: Willy Tarreau @ 2003-11-24  9:48 UTC (permalink / raw)
  To: Larry McVoy, linux-kernel, hpa

On Sun, Nov 23, 2003 at 09:19:10PM -0800, Larry McVoy wrote:

> Unable to handle kernel paging request at virtual address 4954507d
> eax: 00000000   ebx: 49545055   ecx: 0000000f   edx: c1640000

Here, EBX is pure text: "UPTI", as in "CORRUPTION" or "INTERRUPTIBLE". Maybe
there has been some memory corruption somewhere in a linked list?
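
One way to check this reading is to unpack the register value as
little-endian bytes, since that is the order in which an i386 would have
loaded those four bytes from memory.  The sketch below is illustrative
only: the helper name register_as_text is made up for this example, and
the two constants are copied from the oops above.  It also subtracts EBX
from the faulting address to see how far from the bogus pointer the
read landed.

```python
import struct


def register_as_text(value: int) -> str:
    """Interpret a 32-bit register value as four little-endian ASCII bytes.

    Non-printable bytes are shown as '.' so that garbage is easy to spot.
    """
    raw = struct.pack("<I", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Values copied from the oops; nothing else here comes from the thread.
EBX = 0x49545055         # "ebx:" in the register dump
FAULT_ADDR = 0x4954507D  # "Unable to handle kernel paging request at ..."

ebx_text = register_as_text(EBX)
fault_offset = FAULT_ADDR - EBX

print(ebx_text, hex(fault_offset))  # prints: UPTI 0x28
```

The 0x28 difference suggests the fault was a read at a small offset from
a pointer that actually held text.  The first bytes of the Code: line
(39 6b 28) appear to decode as a compare against 0x28(%ebx), which would
fit, though that is a reading of the dump and not something confirmed in
the thread.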

Cheers,
Willy



* Re: data from kernel.bkbits.net
  2003-11-24  7:34 ` H. Peter Anvin
@ 2003-11-24 14:57   ` Larry McVoy
  2003-11-24 15:43   ` Ricky Beam
  1 sibling, 0 replies; 18+ messages in thread
From: Larry McVoy @ 2003-11-24 14:57 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Larry McVoy, linux-kernel

On Sun, Nov 23, 2003 at 11:34:35PM -0800, H. Peter Anvin wrote:
> Looks more like a 3Ware driver bug to me.  Hard to say for sure, though.

I've used both a 3ware and a Highpoint and onboard AMD IDE interfaces and
gotten problems (albeit different problems) each time.  
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: data from kernel.bkbits.net
  2003-11-24  7:34 ` H. Peter Anvin
  2003-11-24 14:57   ` Larry McVoy
@ 2003-11-24 15:43   ` Ricky Beam
  2003-11-24 15:50     ` Larry McVoy
  1 sibling, 1 reply; 18+ messages in thread
From: Ricky Beam @ 2003-11-24 15:43 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Larry McVoy, linux-kernel

On Sun, 23 Nov 2003, H. Peter Anvin wrote:
>Larry McVoy wrote:
>> ...
>
>Looks more like a 3Ware driver bug to me.  Hard to say for sure, though.

Or simply a dead drive. (or a "dirty" cable -- try re-plugging that one.)
I'm guessing the machine was powered off after being hacked and now some
of the drives don't work so well anymore. (such is the way of things with
cheap IDE drives -- and even cheap SCSI ones too. all too often, they don't
even spin back up.)

--Ricky




* Re: data from kernel.bkbits.net
  2003-11-24 15:43   ` Ricky Beam
@ 2003-11-24 15:50     ` Larry McVoy
  2003-11-24 19:17       ` Ricky Beam
  0 siblings, 1 reply; 18+ messages in thread
From: Larry McVoy @ 2003-11-24 15:50 UTC (permalink / raw)
  To: Ricky Beam; +Cc: H. Peter Anvin, Larry McVoy, linux-kernel

On Mon, Nov 24, 2003 at 10:43:34AM -0500, Ricky Beam wrote:
> On Sun, 23 Nov 2003, H. Peter Anvin wrote:
> >Larry McVoy wrote:
> >> ...
> >
> >Looks more like a 3Ware driver bug to me.  Hard to say for sure, though.
> 
> Or simply a dead drive. (or a "dirty" cable -- try re-plugging that one.)

Thanks for the advice, but this drive has been plugged into 3 different
controllers on different machines using different cables.  Both this drive
and the backup drive refused to fsck clean (they had a lot of errors with
directory corruption problems).  

It is not a dirty cable or a bad controller, I've been building and 
debugging PC hardware for years and I know how to track down obvious
problems.

Sorry to be short but I already said that I'd eliminated this source of
error.  What did you think I was doing all weekend?
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: data from kernel.bkbits.net
  2003-11-24 15:50     ` Larry McVoy
@ 2003-11-24 19:17       ` Ricky Beam
  2003-11-24 19:24         ` Larry McVoy
  0 siblings, 1 reply; 18+ messages in thread
From: Ricky Beam @ 2003-11-24 19:17 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Linux Kernel Mail List

On Mon, 24 Nov 2003, Larry McVoy wrote:
>Sorry to be short but I already said that I'd eliminated this source of
>error.  What did you think I was doing all weekend?

Let me be equally short.  Your original message gave no details of what
debugging steps had been taken. (I can assume you would know what you're
doing, but frankly, I could be wrong.)  You venture a guess that the
system had been h4x0r3d in some inventive way to prevent your attempts
to recover data and proceed to paste error messages from the 3ware
driver that indicate a problem with the hardware (either driver bug,
cabling, controller, or channel on that controller) including the
drive itself.

Please do not attribute to hackers what is simply a half dead drive.  So,
was the machine powered down for an extended period as I alluded? (to
preserve the machine until someone had time to look at it.)

--Ricky




* Re: data from kernel.bkbits.net
  2003-11-24 19:17       ` Ricky Beam
@ 2003-11-24 19:24         ` Larry McVoy
  2003-11-24 19:35           ` Jamie Lokier
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Larry McVoy @ 2003-11-24 19:24 UTC (permalink / raw)
  To: Ricky Beam; +Cc: Larry McVoy, Linux Kernel Mail List

On Mon, Nov 24, 2003 at 02:17:44PM -0500, Ricky Beam wrote:
> On Mon, 24 Nov 2003, Larry McVoy wrote:
> >Sorry to be short but I already said that I'd eliminated this source of
> >error.  What did you think I was doing all weekend?
> 
> Let me be equally short.  Your original message gave no details of what
> debugging steps had been taken. (I can assume you would know what you're
> doing, but frankly, I could be wrong.)  You venture a guess that the
> system had been h4x0r3d in some inventive way to prevent your attempts
> to recover data and proceed to paste error messages from the 3ware
> driver that indicate a problem with the hardware (either driver bug,
> cabling, controller, or channel on that controller) including the
> drive itself.
> 
> Please do not attribute to hackers what is simply a half dead drive.  So,
> was the machine powered down for an extended period as I alluded? (to
> preserve the machine until someone had time to look at it.)

As I said, *both* drives have extensive file system problems.  No, the 
machine was not powered down for a long time, and no, neither of these
drives are old, and no, they are not from the same factory batch (they
aren't even the same vendor, one is a Maxtor and the other is a Seagate),
and yes, I of course tried different cable/controller/machine combos.

Any other questions?
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: data from kernel.bkbits.net
  2003-11-24 19:24         ` Larry McVoy
@ 2003-11-24 19:35           ` Jamie Lokier
  2003-11-24 20:05           ` Richard B. Johnson
  2003-11-24 20:09           ` Ricky Beam
  2 siblings, 0 replies; 18+ messages in thread
From: Jamie Lokier @ 2003-11-24 19:35 UTC (permalink / raw)
  To: Larry McVoy, Ricky Beam, Linux Kernel Mail List

Larry McVoy wrote:
> Any other questions?

At risk of sounding like second level support,

  1. Are you able to copy the raw partitions (e.g. using dd) to
     another disk or system?

  2. Do you see similar error messages when copying the raw partitions?

  3. When you mount the _copies_ of the partitions, do you see similar
     error messages?

That'll differentiate whether it's a pure disk/driver problem or
something triggered by a filesystem problem.  As a bonus, if the disks
are both dying (maybe you had a lightning strike), then you'll have
the data copied somewhere safe.
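
The first step can also be done with a small script when the source
keeps throwing read errors.  The following is a minimal sketch, assuming
a Linux block device (or any seekable file) as the source; the function
name rescue_copy, the 64 KiB block size, and the paths in the usage
comment are hypothetical.  It zero-fills blocks that fail to read and
records their offsets, roughly what dd does with conv=noerror,sync.

```python
import os


def rescue_copy(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the list of byte offsets at which a read raised an error.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a file or a block device.
        total = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < total:
                want = min(block_size, total - offset)
                try:
                    os.lseek(src, offset, os.SEEK_SET)
                    data = os.read(src, want)
                except OSError:
                    # Unreadable block: note it and keep the image aligned.
                    bad_offsets.append(offset)
                    data = b"\0" * want
                if not data:
                    break  # unexpected end of input
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return bad_offsets


# Hypothetical usage; the device and image names are placeholders:
#   bad = rescue_copy("/dev/sdX1", "/mnt/spare/sdX1.img")
#   print(len(bad), "unreadable blocks")
```

An empty list from a full pass would point away from the disk itself; a
non-empty one gives the offsets to compare against the driver messages
for the second question.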

-- Jamie


* Re: data from kernel.bkbits.net
  2003-11-24 19:24         ` Larry McVoy
  2003-11-24 19:35           ` Jamie Lokier
@ 2003-11-24 20:05           ` Richard B. Johnson
  2003-11-24 20:33             ` Theodore Ts'o
  2003-11-24 20:09           ` Ricky Beam
  2 siblings, 1 reply; 18+ messages in thread
From: Richard B. Johnson @ 2003-11-24 20:05 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Ricky Beam, Linux Kernel Mail List

On Mon, 24 Nov 2003, Larry McVoy wrote:

> On Mon, Nov 24, 2003 at 02:17:44PM -0500, Ricky Beam wrote:
> > On Mon, 24 Nov 2003, Larry McVoy wrote:
> > >Sorry to be short but I already said that I'd eliminated this source of
> > >error.  What did you think I was doing all weekend?
> >
> > Let me be equally short.  Your original message gave no details of what
> > debugging steps had been taken. (I can assume you would know what you're
> > doing, but frankly, I could be wrong.)  You venture a guess that the
> > system had been h4x0r3d in some inventive way to prevent your attempts
> > to recover data and proceed to paste error messages from the 3ware
> > driver that indicate a problem with the hardware (either driver bug,
> > cabling, controller, or channel on that controller) including the
> > drive itself.
> >
> > Please do not attribute to hackers what is simply a half dead drive.  So,
> > was the machine powered down for an extended period as I alluded? (to
> > preserve the machine until someone had time to look at it.)
>
> As I said, *both* drives have extensive file system problems.  No, the
> machine was not powered down for a long time, and no, neither of these
> drives are old, and no, they are not from the same factory batch (they
> aren't even the same vendor, one is a Maxtor and the other is a Seagate),
> and yes, I of course tried different cable/controller/machine combos.
>
> Any other questions?
> --
> ---
> Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm
> -

Attempt to copy the raw drive to /dev/null.  If that works, the
drive is likely okay, but the fs got fsucked up by software. You
might be able to mount the drive on a 2.4.22 machine if you have a
spare. Then you might be able to selectively copy important stuff
to another drive, after which you can make a new file-system as
a "repair".

If you can't copy the raw drive, yet you booted on a system that
uses the same driver(s) to access the disk, then you probably
have a bad drive.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: data from kernel.bkbits.net
  2003-11-24 19:24         ` Larry McVoy
  2003-11-24 19:35           ` Jamie Lokier
  2003-11-24 20:05           ` Richard B. Johnson
@ 2003-11-24 20:09           ` Ricky Beam
  2 siblings, 0 replies; 18+ messages in thread
From: Ricky Beam @ 2003-11-24 20:09 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Linux Kernel Mail List

On Mon, 24 Nov 2003, Larry McVoy wrote:
...
>Any other questions?

Have you run the factory diag utility(s) to ensure the drives are "ok"?
(not that those tests are 100% as I own drives that pass those tests that
 are, in fact, bad.)  Have you made a complete bit image clone of the
drives (ala 'dd')? (how big are they?)

And there was a recent thread on linux-raid where someone had a drive
with bad internal cache memory -- a single bit was always '1'.  That one
gets an "I've never seen it do that before."

--Ricky




* Re: data from kernel.bkbits.net
  2003-11-24 20:05           ` Richard B. Johnson
@ 2003-11-24 20:33             ` Theodore Ts'o
  2003-11-24 21:34               ` Richard B. Johnson
  0 siblings, 1 reply; 18+ messages in thread
From: Theodore Ts'o @ 2003-11-24 20:33 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Larry McVoy, Ricky Beam, Linux Kernel Mail List

On Mon, Nov 24, 2003 at 03:05:24PM -0500, Richard B. Johnson wrote:
> Attempt to copy the raw drive to /dev/null.  If that works, the
> drive is likely okay, but the fs got fsucked up by software. You
> might be able to mount the drive on a 2.4.22 machine if you have a
> spare. Then you might be able to selectively copy important stuff
> to another drive, after which you can make a new file-system as
> a "repair".

The error messages Larry reported were obviously reported by the
hardware, and were **not** filesystem errors. 

						- Ted


* Re: data from kernel.bkbits.net
  2003-11-24 20:33             ` Theodore Ts'o
@ 2003-11-24 21:34               ` Richard B. Johnson
  2003-11-24 22:24                 ` Larry McVoy
  0 siblings, 1 reply; 18+ messages in thread
From: Richard B. Johnson @ 2003-11-24 21:34 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Larry McVoy, Ricky Beam, Linux Kernel Mail List

On Mon, 24 Nov 2003, Theodore Ts'o wrote:

> On Mon, Nov 24, 2003 at 03:05:24PM -0500, Richard B. Johnson wrote:
> > Attempt to copy the raw drive to /dev/null.  If that works, the
> > drive is likely okay, but the fs got fsucked up by software. You
> > might be able to mount the drive on a 2.4.22 machine if you have a
> > spare. Then you might be able to selectively copy important stuff
> > to another drive, after which you can make a new file-system as
> > a "repair".
>
> The error messages Larry reported were obviously reported by the
> hardware, and were **not** filesystem errors.
>
> 						- Ted

Yes but an attempt to read beyond the limits of the physical
drive will provide you with a lot of **interesting** hardware
errors. This happens if the file-system gets corrupt.

And I'm not implying that the software screwed up either. The
software doesn't know if an "extra" bit was set during a write
to the drive. These things happen as a result of bad RAM, bad
DMA, and other hardware-corrupting things....

So, the first check is to see if the drive can be read without
any reference to its contents. Since Read/Write is usually the
software implementation detail of a direction bit, if you can
read, you can usually write.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: data from kernel.bkbits.net
  2003-11-24 21:34               ` Richard B. Johnson
@ 2003-11-24 22:24                 ` Larry McVoy
  2003-11-24 22:38                   ` Jamie Lokier
  2003-11-25  0:30                   ` Theodore Ts'o
  0 siblings, 2 replies; 18+ messages in thread
From: Larry McVoy @ 2003-11-24 22:24 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Theodore Ts'o, Larry McVoy, Ricky Beam, Linux Kernel Mail List

On Mon, Nov 24, 2003 at 04:34:43PM -0500, Richard B. Johnson wrote:
> On Mon, 24 Nov 2003, Theodore Ts'o wrote:
> 
> > On Mon, Nov 24, 2003 at 03:05:24PM -0500, Richard B. Johnson wrote:
> > > Attempt to copy the raw drive to /dev/null.  If that works, the
> > > drive is likely okay, but the fs got fsucked up by software. You
> > > might be able to mount the drive on a 2.4.22 machine if you have a
> > > spare. Then you might be able to selectively copy important stuff
> > > to another drive, after which you can make a new file-system as
> > > a "repair".
> >
> > The error messages Larry reported were obviously reported by the
> > hardware, and were **not** filesystem errors.
> >
> > 						- Ted
> 
> Yes but an attempt to read beyond the limits of the physical
> drive will provide you with a lot of **interesting** hardware
> errors. This happens if the file-system gets corrupt.

Yeah, I think Richard may be right.  Anyway, the drive sort of reads
from the raw partition.  It gets an IDE reset and then it reads.  I can
read it a second time with no reset.  Haven't tried a reboot between
reads, hang on, yeah, a reboot brings the errors back.

But, fscking the dd-ed image gets me less errors so I'm trying that 
route to get the data back.
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: data from kernel.bkbits.net
  2003-11-24 22:24                 ` Larry McVoy
@ 2003-11-24 22:38                   ` Jamie Lokier
  2003-11-25  0:30                   ` Theodore Ts'o
  1 sibling, 0 replies; 18+ messages in thread
From: Jamie Lokier @ 2003-11-24 22:38 UTC (permalink / raw)
  To: linux-kernel

Larry McVoy wrote:
> But, fscking the dd-ed image gets me less errors so I'm trying that 
                                       ^^^^
Fewer!

have nice day :)
-- Jamie


* Re: data from kernel.bkbits.net
  2003-11-24 22:24                 ` Larry McVoy
  2003-11-24 22:38                   ` Jamie Lokier
@ 2003-11-25  0:30                   ` Theodore Ts'o
  1 sibling, 0 replies; 18+ messages in thread
From: Theodore Ts'o @ 2003-11-25  0:30 UTC (permalink / raw)
  To: Larry McVoy, Richard B. Johnson, Larry McVoy, Ricky Beam,
	Linux Kernel Mail List

On Mon, Nov 24, 2003 at 02:24:13PM -0800, Larry McVoy wrote:
> > Yes but an attempt to read beyond the limits of the physical
> > drive will provide you with a lot of **interesting** hardware
> > errors. This happens if the file-system gets corrupt.

Sure, but not those kinds of errors.  You'll see errors like this
instead:

 kernel: attempt to access beyond end of device
 kernel: 08:05: rw=0, want=198500353, limit=5779456
 kernel: attempt to access beyond end of device
 kernel: 08:05: rw=0, want=4294934529, limit=5779456 

ATA device timeouts, which is what Larry reported, are not caused by
attempting to read beyond the limits of the physical device.
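
For comparison, here is a short sketch that pulls want and limit out of
messages in this format and reports how far past the end each request
was.  The regular expression and the function name beyond_end_requests
are assumptions made for this illustration and have only been matched
against the four sample lines above.

```python
import re

# Matches the "rw=..., want=..., limit=..." part of the sample lines.
BEYOND_END = re.compile(r"rw=(\d+), want=(\d+), limit=(\d+)")


def beyond_end_requests(log_text):
    """Return (want, limit, overshoot) for each out-of-range request found."""
    results = []
    for match in BEYOND_END.finditer(log_text):
        want, limit = int(match.group(2)), int(match.group(3))
        if want > limit:
            results.append((want, limit, want - limit))
    return results


SAMPLE = """\
kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=198500353, limit=5779456
kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=4294934529, limit=5779456
"""

for want, limit, over in beyond_end_requests(SAMPLE):
    print(f"want={want} limit={limit} overshoot={over}")
```

Both sample requests are far outside the device, and the second want
value is only 32767 short of 2^32, which is what a garbage block number
tends to look like.  Nothing in the 3w-xxxx output quoted earlier has
this shape.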

> Yeah, I think Richard may be right.  Anyway, the drive sort of reads
> from the raw partition.  It gets an IDE reset and then it reads.  I can
> read it a second time with no reset.  Haven't tried a reboot between
> reads, hang on, yeah, a reboot brings the errors back.

It really, really sounds like the disk is pooched.  I don't know if it
was bad luck, coincidence, or the fact that it was powered down for a
while.  But I'm guessing that it's taking a long time for the disk to read
a sector, which is causing the disk driver to time out and reset the
bus, but then the sector is first cached in the IDE disk cache (where
it can be read quickly) and then it ends up getting cached in the
system memory.  That would explain why a reboot brings the errors back.

> But, fscking the dd-ed image gets me less errors so I'm trying that 
> route to get the data back.

If using the dd'ed image is giving you less errors, combined with your
other description, it's causing me to be really suspicious about the
hard drive.  If you're really brave, or foolish, (or have already
backed up the image), you might try doing a non-destructive read/write
test using the badblocks(8) command.  I'm pretty confident that it
will turn up all sorts of problems, though, since the low-level device
driver errors you were describing really are not consistent with
filesystem corruption, but with a hardware failure of some kind.

						- Ted


* Re: data from kernel.bkbits.net
  2003-11-24  5:19 data from kernel.bkbits.net Larry McVoy
  2003-11-24  7:34 ` H. Peter Anvin
  2003-11-24  9:48 ` Willy Tarreau
@ 2003-11-25 15:00 ` Ben Collins
  2 siblings, 0 replies; 18+ messages in thread
From: Ben Collins @ 2003-11-25 15:00 UTC (permalink / raw)
  To: Larry McVoy; +Cc: linux-kernel, hpa

On Sun, Nov 23, 2003 at 09:19:10PM -0800, Larry McVoy wrote:
> 
> I've been trying to get all the data off the drives on the machine which
> was broken into.  I have a feeling that whoever this was was hiding stuff
> in the file system because both drives will not fsck clean nor will they
> completely read.
> 
> I've managed to get most of the data off but not all.  Given that I've put
> about 3 days into this I'm pretty much done.  If someone else wants to look
> at the drives I can make them available, let me know.  But just reading the
> main drive makes the kernel (Fedora 1) kill the tar process as below (it
> also managed to wack the system enough that it overwrote the NVRAM with
> garbage).  It hasn't been a fun weekend.

FYI, you can ignore the large SVN repos. They are easily rebuilt. I just
need the bkcvs2svn script in my home directory.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/


* RE: data from kernel.bkbits.net
@ 2003-11-24 19:14 Adam Radford
  0 siblings, 0 replies; 18+ messages in thread
From: Adam Radford @ 2003-11-24 19:14 UTC (permalink / raw)
  To: 'H. Peter Anvin', Larry McVoy; +Cc: linux-kernel

This looks like glitchy power cables, drive cable or dying drive to me.

-Adam

-----Original Message-----
From: H. Peter Anvin [mailto:hpa@zytor.com]
Sent: Sunday, November 23, 2003 11:35 PM
To: Larry McVoy
Cc: linux-kernel@vger.kernel.org
Subject: Re: data from kernel.bkbits.net


Larry McVoy wrote:
> I've been trying to get all the data off the drives on the machine which
> was broken into.  I have a feeling that whoever this was was hiding stuff
> in the file system because both drives will not fsck clean nor will they
> completely read.
> 
> I've managed to get most of the data off but not all.  Given that I've put
> about 3 days into this I'm pretty much done.  If someone else wants to look
> at the drives I can make them available, let me know.  But just reading the
> main drive makes the kernel (Fedora 1) kill the tar process as below (it
> also managed to wack the system enough that it overwrote the NVRAM with
> garbage).  It hasn't been a fun weekend.
> 

Looks more like a 3Ware driver bug to me.  Hard to say for sure, though.

	-hpa



DISCLAIMER: The information contained in this electronic mail transmission
is intended by 3Ware for the use of the named individual or entity to which
it is directed and may contain information that is confidential or
privileged and should not be disseminated without prior approval from 3ware 





Thread overview: 18+ messages
2003-11-24  5:19 data from kernel.bkbits.net Larry McVoy
2003-11-24  7:34 ` H. Peter Anvin
2003-11-24 14:57   ` Larry McVoy
2003-11-24 15:43   ` Ricky Beam
2003-11-24 15:50     ` Larry McVoy
2003-11-24 19:17       ` Ricky Beam
2003-11-24 19:24         ` Larry McVoy
2003-11-24 19:35           ` Jamie Lokier
2003-11-24 20:05           ` Richard B. Johnson
2003-11-24 20:33             ` Theodore Ts'o
2003-11-24 21:34               ` Richard B. Johnson
2003-11-24 22:24                 ` Larry McVoy
2003-11-24 22:38                   ` Jamie Lokier
2003-11-25  0:30                   ` Theodore Ts'o
2003-11-24 20:09           ` Ricky Beam
2003-11-24  9:48 ` Willy Tarreau
2003-11-25 15:00 ` Ben Collins
2003-11-24 19:14 Adam Radford
