LKML Archive on lore.kernel.org
From: Theodore Tso <tytso@MIT.EDU>
To: Al Boldi <a1426z@gawab.com>
Cc: Valerie Henson <val.henson@gmail.com>,
	Rik van Riel <riel@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFD] Incremental fsck
Date: Sat, 12 Jan 2008 09:51:40 -0500
Message-ID: <20080112145140.GB6751@mit.edu>
In-Reply-To: <200801091452.14890.a1426z@gawab.com>

On Wed, Jan 09, 2008 at 02:52:14PM +0300, Al Boldi wrote:
> 
> Ok, but let's look at this a bit more opportunistic / optimistic.
> 
> Even after a black-out shutdown, the corruption is pretty minimal, using 
> ext3fs at least.
>

After an unclean shutdown, assuming you have decent hardware that
doesn't lie about when blocks hit iron oxide, you shouldn't have any
corruption at all.  If you have crappy hardware, then all bets are off....

> So let's take advantage of this fact and do an optimistic fsck, to
> assure integrity per-dir, and assume no external corruption.  Then
> we release this checked dir to the wild (optionally ro), and check
> the next.  Once we find external inconsistencies we either fix it
> unconditionally, based on some preconfigured actions, or present the
> user with options.

So what can you check?  The *only* things you can check are whether
the directory syntax looks sane, whether the inode structure looks
sane, and whether the blocks reported as belonging to an inode look
sane.
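
As a rough illustration of what such a purely local check amounts to,
here is a minimal Python sketch.  The function name, its parameters and
the filesystem geometry are hypothetical, chosen only for this example;
it is not how e2fsck is structured.

```python
def blocks_look_sane(blocks, first_data_block, blocks_count):
    """Local check on one inode's block list (toy model).

    Every block number must fall inside the filesystem, and no block
    may be listed twice by the same inode.
    """
    in_range = all(first_data_block <= b < blocks_count for b in blocks)
    no_repeats = len(set(blocks)) == len(blocks)
    return in_range and no_repeats


# Hypothetical geometry: blocks 1..999 are valid.
ok = blocks_look_sane([100, 101, 102], first_data_block=1, blocks_count=1000)
bad = blocks_look_sane([100, 5000], first_data_block=1, blocks_count=1000)
```

A check like this needs nothing beyond the inode in front of it, which
is exactly why it says nothing about what other inodes or directories
claim.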

What is very hard to check is whether or not the link count on the
inode is correct.  Suppose the link count is 1, but there are actually
two directory entries pointing at it.  Now when someone unlinks the
file through one of those directory entries, the link count will go
to zero, and the blocks will start to get reused, even though the
inode is still accessible via another pathname.  Oops.  Data Loss.
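
The link count scenario can be sketched in Python over a toy in-memory
model.  The dict layout, the paths and the function names below are
assumptions made up for illustration, not real ext3 structures or
e2fsck code.

```python
from collections import Counter

# Toy model: each directory maps names to inode numbers, and
# stored_links holds the link count recorded in each inode.
directories = {
    "/a": {"file": 12},
    "/b": {"alias": 12},   # second hard link to inode 12
}
stored_links = {12: 1}      # recorded count is wrong; it should be 2


def check_one_directory(entries, stored_links):
    """Per-directory check: only finds entries naming unknown inodes."""
    return [name for name, ino in entries.items() if ino not in stored_links]


def check_link_counts(directories, stored_links):
    """Whole-filesystem pass: count every reference, then compare."""
    seen = Counter()
    for entries in directories.values():
        seen.update(entries.values())
    return {ino: (stored, seen[ino])
            for ino, stored in stored_links.items()
            if stored != seen[ino]}


# Each directory looks fine when examined on its own...
per_dir = {path: check_one_directory(entries, stored_links)
           for path, entries in directories.items()}
# ...and only the pass over all directories exposes the mismatch,
# reported as inode -> (stored count, counted references).
mismatches = check_link_counts(directories, stored_links)
```

In this model the mismatch only shows up once every directory has been
walked, which is the part a directory-at-a-time check never gets to.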

This is why doing incremental, on-line fsck'ing is *hard*.  You're not
going to find this while doing each directory one at a time, and if
the filesystem is changing out from under you, it gets worse.  And
it's not just the hard link count.  There is a similar issue with the
block allocation bitmap.  Detecting the case where two files
simultaneously claim the same block can't be done if you are doing it
incrementally, and if the filesystem is changing out from under you,
it's impossible, unless you also have the filesystem telling you every
single change while it is happening, and you keep an insane amount of
bookkeeping.
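
The block ownership problem has the same shape.  A minimal Python
sketch, again over an invented toy model (the mapping of inode numbers
to block lists and the function name are assumptions for illustration):

```python
# Toy model: inode number -> list of block numbers it claims.
inode_blocks = {
    12: [100, 101, 102],
    13: [200, 101],        # block 101 is also claimed by inode 12
}


def find_shared_blocks(inode_blocks):
    """Return {block: [inodes claiming it]} for multiply-claimed blocks."""
    owners = {}
    for ino, blocks in inode_blocks.items():
        for blk in blocks:
            owners.setdefault(blk, []).append(ino)
    return {blk: inos for blk, inos in owners.items() if len(inos) > 1}


shared = find_shared_blocks(inode_blocks)
```

The owners map has to cover every inode before any block can be
declared unshared, so it grows with the filesystem, and it goes stale
as soon as something allocates or frees a block during the walk.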

One thing that you *might* be able to do is to mount a filesystem
read-only, and check it in the background while you allow users to
access it read-only.  There are a few caveats, however: (1) some
filesystem errors may cause the data to be corrupt, or in the worst
case, could cause the system to panic (that would arguably be a
filesystem/kernel bug, but we've not necessarily done as much testing
here as we should.)  (2) if there were any filesystem errors found,
you would need to completely unmount the filesystem to flush the inode
cache and remount it before it would be safe to remount the filesystem
read/write.  You can't just do a "mount -o remount" if the filesystem
was modified under the OS's nose.

> All this could be per-dir or using some form of on-the-fly file-block-zoning.
> 
> And there probably is a lot more to it, but it should conceptually be 
> possible, with more thoughts though...

Many things are possible, in the NASA sense of "with enough thrust,
anything will fly".  Whether or not it is *useful* and *worthwhile*
are of course different questions!  :-)

						- Ted
