LKML Archive on
From: "Jörn Engel" <>
To: Sorin Faibish <>
Cc: Bill Davidsen <>,
	Juan Piernas Canovas <>,
	Jan Engelhardt <>,
	kernel list <>
Subject: Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation
Date: Sun, 18 Feb 2007 05:59:37 +0000
Message-ID: <>
In-Reply-To: <op.tnwuonh3rwwil4@muki-1mj5c5mxes>

On Sat, 17 February 2007 15:47:01 -0500, Sorin Faibish wrote:
> DualFS can probably get around this corner case as it is up to the user
> to select the size of the MD device. If you want to prevent this
> corner case you can always use a device bigger than 10% of the data device,
> which is exaggerated for any FS, assuming that the directory files are so
> large (this is when you have billions of files with long names).
> In general the problem you mention is mainly due to the data blocks
> filling the file system. In DualFS case you have the choice of selecting
> different sizes for the MD and Data volume. When Data volume gets full
> the GC will have a problem but the MD device will not have a problem.
> It is my understanding that most of the GC problem you mention is
> due to the filling of the FS with data and the result is a MD operation
> being disrupted by the filling of the FS with data blocks. As for the
> performance impact of solving this problem: as you mentioned, all
> journal FSs will have this problem, and I am sure that DualFS performance
> impact will be less than others, at least due to using only one MD
> write instead of 2.

You seem to make the usual mistakes people make when they start to think
about this problem.  But I could misinterpret you, so let me paraphrase
your mail in questions and answer what I believe you said.

Q: Are journaling filesystems identical to log-structured filesystems?

Not quite.  Journaling filesystems usually have a very small journal (or
log, same thing) and only store the information necessary for atomic
transactions in the journal.  Not sure what a "journal FS" is, but the
name seems closer to a journaling filesystem.

Q: DualFS separates Data and Metadata.  Does that make a difference?

Not really.  What I called "data" in my previous mail is a
log-structured filesystem's view of data.  DualFS stores file content
separately, so from an lfs view, that doesn't even exist.  But directory
content exists and behaves just like file content wrt. the deadlock
problem.  Any data or metadata that cannot be GC'd by simply copying but
requires writing further information like indirect blocks, B-Tree nodes,
etc. will cause the problem.
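A minimal Python sketch of why such blocks are expensive to clean, assuming a hypothetical `Block` type whose `parent` holds the block's on-disk address: relocating a block changes its address, so its parent must be rewritten too, which moves the parent in turn. In the worst case nothing on that chain is shared with another moved block, so each relocated block costs one write per tree level. The names are illustrative only, not DualFS or any real lfs code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """A block in a hypothetical lfs tree; `parent` stores its address."""
    name: str
    parent: Optional["Block"] = None


def blocks_to_rewrite(block: Block) -> List[str]:
    """Blocks that must be written when `block` is moved by GC.

    Moving a block changes its address, so the parent pointing to it must
    be rewritten as well, which moves the parent, and so on. For
    simplicity, the topmost block's own reference is not counted here.
    """
    chain: List[str] = []
    node: Optional[Block] = block
    while node is not None:
        chain.append(node.name)
        node = node.parent
    return chain


double = Block("doubly indirect")
single = Block("singly indirect", parent=double)
dirent = Block("directory block", parent=single)
print(blocks_to_rewrite(dirent))
# ['directory block', 'singly indirect', 'doubly indirect']
```

A block that can be GC'd by simply copying is the degenerate case with no parent to update: its chain has length 1.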

Q: If the user simply reserves some extra space, does the problem go away?

Definitely not.  It will be harder to hit, but a rare deadlock is still
a deadlock.  Again, this is only concerned with the log-structured part
of DualFS, so we can ignore the Data volume.

When data is spread perfectly across all segments, the best segment one
can pick for GC is just as bad as the worst.  So let us take some
examples.  If 50% of the lfs is free, the best you can pick for GC is a
segment that is 50% full of live data.  Moving every single block in it
may require writing one additional indirect block, so GC has to write
out a full (100%) segment to free a single one.  In that worst case it
makes no progress at 50% free, and it could deadlock if less than 50%
were free.
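The worst-case arithmetic of a single cleaning pass can be sketched as follows. This is a simplified model, assuming a hypothetical 256-block segment, perfectly uniform live data, and no sharing of indirect blocks between moved blocks; `gc_net_gain` is an illustrative name only.

```python
def gc_net_gain(segment_blocks: int, live_blocks: int, tree_height: int) -> int:
    """Free blocks gained (or lost, if negative) by cleaning one segment.

    Worst case assumed: every live block drags along (tree_height - 1)
    indirect blocks that no other moved block shares.
    """
    written = live_blocks * tree_height  # live blocks plus their indirect blocks
    freed = segment_blocks               # the cleaned segment becomes free
    return freed - written


# One level of indirection (tree_height = 2), 256-block segment:
print(gc_net_gain(256, 128, 2))  # 50% live: 0, no progress
print(gc_net_gain(256, 64, 2))   # 25% live: 128 blocks gained
print(gc_net_gain(256, 160, 2))  # 62.5% live: -64, GC consumes space
```

A negative result means each pass eats free space instead of producing it; once the free space is gone, GC cannot continue.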

If, however, GC has to write out a singly and a doubly indirect block,
67% of the lfs needs to be free.  In general, if the maximum height of
your tree is N, you need (N-1)/N * 100% free space.  Most people refer
to that as "too much".
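Generalizing the same worst-case model: with a live fraction u per segment and N writes per live block, GC breaks even when u * N = 1, so the free fraction must be at least 1 - 1/N = (N-1)/N. A small sketch that tabulates this bound with exact fractions; `min_free_fraction` is a hypothetical helper name.

```python
from fractions import Fraction


def min_free_fraction(tree_height: int) -> Fraction:
    """Worst-case free fraction at which GC merely breaks even.

    tree_height counts the block being moved plus every indirect
    block above it that must be rewritten (1 = plain copy).
    """
    if tree_height < 1:
        raise ValueError("tree height must be at least 1")
    return Fraction(tree_height - 1, tree_height)


for n in range(1, 6):
    f = min_free_fraction(n)
    print(n, f, f"{float(f):.0%}")
# 1 0 0%
# 2 1/2 50%
# 3 2/3 67%
# 4 3/4 75%
# 5 4/5 80%
```

N = 1 is the case where blocks can be GC'd by simply copying; every extra level of indirection pushes the required reserve higher.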

If you have less free space, the filesystem will work just fine "most of
the time".  That is nice and cool, but it won't help your rare user that
happens to hit the rare deadlock.  Any lfs needs a strategy to prevent
this deadlock for good, not just make it mildly unlikely.


"Error protection by error detection and correction."
-- from a university class
