From: "Ahmed El Zein" <>
To: David Chinner <>
Cc: "Ramy M. Hassan " <>,,
Subject: Re: xfs internal error on a new filesystem
Date: Thu, 15 Feb 2007 16:19:32 GMT	[thread overview]

David Chinner <> wrote on 15 Feb 2007, 11:16 AM:
>On Wed, Feb 14, 2007 at 10:24:27AM +0000, Ramy M. Hassan  wrote:
>> Hello,
>> We got the following xfs internal error on one of our production servers:
>> Feb 14 08:28:52 info6 kernel: [238186.676483] Filesystem "sdd8": XFS
>> internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. 
>> Caller 0xf8b906e7
>Real stack looks to be:
> xfs_trans_cancel
> xfs_mkdir
> xfs_vn_mknod
> xfs_vn_mkdir
> vfs_mkdir
> sys_mkdirat
> sys_mkdir
>We aborted a transaction for some reason. We got an error somewhere in
>a mkdir while we had a dirty transaction.  Unfortunately, this tells us
>little about the error that actually caused the shutdown.
>What is your filesystem layout? (xfs_info <mntpt>) How much memory
>do you have and were you near enomem conditions?

We have 1536 MB of RAM. It is possible that we were near ENOMEM conditions
at the time of the crash. I don't know for sure, but we have seen such
spikes on our servers.

root@info6:~# xfs_info /vol/6/
meta-data=/dev/sdd8              isize=256    agcount=16, agsize=7001584
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=112025248, imaxpct=25
         =                       sunit=16     swidth=64 blks, unwritten=0
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
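
As a cross-check of the geometry above, here is a minimal sketch that parses
the xfs_info output and converts the block counts to bytes. It assumes that
the data section's sunit and swidth are reported in filesystem blocks (as the
"blks" suffix suggests); the function and variable names are made up for
illustration.

```python
import re

# xfs_info output as pasted above (trailing whitespace trimmed).
XFS_INFO = """\
meta-data=/dev/sdd8              isize=256    agcount=16, agsize=7001584
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=112025248, imaxpct=25
         =                       sunit=16     swidth=64 blks, unwritten=0
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
"""


def parse_xfs_info(text):
    """Collect the numeric key=value pairs of each xfs_info section."""
    sections, current = {}, None
    for line in text.splitlines():
        # Text before the first "=" is the section label; it is blank on
        # continuation lines, which belong to the previous section.
        label, _, rest = line.partition("=")
        if label.strip():
            current = label.strip()
        fields = sections.setdefault(current, {})
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            fields[key] = int(value)
    return sections


info = parse_xfs_info(XFS_INFO)
data = info["data"]
bsize = data["bsize"]

fs_bytes = data["blocks"] * bsize                 # total data area
sunit_bytes = data["sunit"] * bsize               # stripe unit (assumed in blocks)
swidth_bytes = data["swidth"] * bsize             # stripe width (assumed in blocks)
stripe_members = data["swidth"] // data["sunit"]  # stripe units per full stripe
log_bytes = info["log"]["blocks"] * info["log"]["bsize"]

print(f"filesystem size : {fs_bytes / 2**30:.1f} GiB")
print(f"stripe unit     : {sunit_bytes // 1024} KiB")
print(f"stripe width    : {swidth_bytes // 1024} KiB ({stripe_members} units)")
print(f"internal log    : {log_bytes // 2**20} MiB")
```

Under that assumption the stripe unit comes out at 64 KiB and the stripe width
at 256 KiB, i.e. four stripe units per stripe, which would match a RAID 10 unit
built as a stripe of 4 mirrors.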

>> We were able to unmount/remount the volume (we didn't run xfs_repair because
>> we thought it might take a long time, and the server was already in
>> production at the moment)
>Risky to run a production system on a filesystem that might be corrupted.
>You risk further problems if you don't run repair....
>> The file system was created less than 48 hours ago, and 370G of sensitive
>> production data was moved to the server before the xfs crash.
>So that's not a "new" filesystem at all...
By new we meant 48 hours old.

>FWIW, did you do any offline testing before you put it into production?

We did some basic testing. But as a filesystem developer, how would you
test a filesystem so that you would be comfortable with its stability
and not have to worry about faulty hardware?

>> System details :
>> Kernel: 2.6.18
>> Controller: 3ware 9550SX-8LP (RAID 10)
>Can you describe your dm/md volume layout?

One unit, 8 HDDs, a stripe of 4 mirrors.

>> We are wondering here if this problem is an indicator of data corruption
>> on disk?
>It might be. You didn't run xfs_check or xfs_repair, so we don't know if
>there is any on disk corruption here.
>> Is it really necessary to run xfs_repair?
>It is, if you want to know that you haven't left any landmines around for
>the filesystem to trip over again, i.e. you should run repair after any
>sort of XFS shutdown to make sure nothing is corrupted on disk.
>If nothing is corrupted on disk, then we are looking at an in-memory
We will run repair tonight.

>> Do you recommend that we switch back to reiserfs?
>Not yet.
>> Could it be a hardware-related problem?
>Yes. Do you have ECC memory on your server? Have you run memtest86?
>Were there any I/O errors in the log prior to the shutdown message?
Yes, we have ECC memory.
We will try to run memtest86 as soon as possible.
There were no I/O errors in the log prior to the shutdown message.
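
The check for I/O errors ahead of the shutdown can be scripted. The following
is only a sketch: the function name and the two sample logs are hypothetical,
the "I/O error" pattern is an assumption about how the kernel words such
messages, and the end_request line in FAULTY_LOG is invented for illustration
(it did not appear in our log).

```python
import re

SHUTDOWN = re.compile(r"XFS internal error")
IO_ERROR = re.compile(r"I/O error", re.IGNORECASE)  # assumed message wording


def io_errors_before_shutdown(lines):
    """Return (io_error_lines, shutdown_line) up to the first XFS shutdown.

    shutdown_line is None if no XFS internal error is found.
    """
    seen = []
    for line in lines:
        if SHUTDOWN.search(line):
            return seen, line
        if IO_ERROR.search(line):
            seen.append(line)
    return seen, None


# The shutdown message from this thread, rejoined onto one line.
SHUTDOWN_LINE = (
    'Feb 14 08:28:52 info6 kernel: [238186.676483] Filesystem "sdd8": XFS '
    "internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. "
    "Caller 0xf8b906e7"
)

CLEAN_LOG = [SHUTDOWN_LINE]

# Hypothetical example only: an invented I/O error preceding the shutdown.
FAULTY_LOG = [
    "Feb 14 08:28:50 info6 kernel: [238184.000000] end_request: I/O error, "
    "dev sdd, sector 12345",
    SHUTDOWN_LINE,
]

for name, log in (("clean", CLEAN_LOG), ("faulty", FAULTY_LOG)):
    errors, shutdown = io_errors_before_shutdown(log)
    print(f"{name}: {len(errors)} I/O error line(s) before shutdown")
```

In practice the lines would come from the saved kernel log rather than from
in-line lists; the log file location depends on the syslog configuration.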

By the way, this is a VMware image. /vol/6 is an exported physical partition.

>Dave Chinner
>Principal Engineer
>SGI Australian Software Group
