From: Juan Piernas Canovas <piernas@ditec.um.es>
To: "Jörn Engel" <joern@lazybastard.org>
Cc: Sorin Faibish <sfaibish@emc.com>,
kernel list <linux-kernel@vger.kernel.org>
Subject: Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation
Date: Sun, 25 Feb 2007 03:41:40 +0100 (CET)
Message-ID: <Pine.LNX.4.61.0702250255530.18915@ditec.inf.um.es>
In-Reply-To: <20070223132645.GB11653@lazybastard.org>
Hi Jörn,
On Fri, 23 Feb 2007, Jörn Engel wrote:
> On Thu, 22 February 2007 20:57:12 +0100, Juan Piernas Canovas wrote:
>>
>> I do not agree with this picture, because it does not show that all the
>> indirect blocks which point to a direct block are along with it in the
>> same segment. That figure should look like:
>>
>> Segment 1: [some data] [ DA D1' D2' ] [more data]
>> Segment 2: [some data] [ D0 D1' D2' ] [more data]
>> Segment 3: [some data] [ DB D1 D2 ] [more data]
>>
>> where D0, DA, and DB are datablocks, D1 and D2 indirect blocks which
>> point to the datablocks, and D1' and D2' obsolete copies of those
>> indirect blocks. By using this figure, it is clear that if you need to
>> move D0 to clean the segment 2, you will need only one free segment at
>> most, and not more. You will get:
>>
>> Segment 1: [some data] [ DA D1' D2' ] [more data]
>> Segment 2: [ free ]
>> Segment 3: [some data] [ DB D1' D2' ] [more data]
>> ......
>> Segment n: [ D0 D1 D2 ] [ empty ]
>>
>> That is, D0 needs in the new segment the same space that it needs in the
>> previous one.
>>
>> The differences are subtle but important.
>
> Ah, now I see. Yes, that is deadlock-free. If you are not accounting
> the bytes of used space but the number of used segments, and you count
> each partially used segment the same as a 100% used segment, there is no
> deadlock.
>
> Some people may consider this to be cheating, however. It will cause
> more than 50% wasted space. All obsolete copies are garbage, after all.
> With a maximum tree height of N, you can have up to (N-1) / N of your
> filesystem occupied by garbage.
I do not agree. Fortunately, most files are written at once, so what you
usually have is:
Segment 1: [ data ]
Segment 2: [some data] [ D0 DA DB D1 D2 ] [more data]
Segment 3: [ data ]
......
On the other hand, the DualFS cleaner tries to clean several segments
every time it runs. Therefore, if you have the following case:
Segment 1: [some data] [ DA D1' D2' ] [more data]
Segment 2: [some data] [ D0 D1' D2' ] [more data]
Segment 3: [some data] [ DB D1' D2' ] [more data]
......
after cleaning, you can have this one:
Segment 1: [ free ]
Segment 2: [ free ]
Segment 3: [ free ]
......
Segment i: [D0 DA DB D1 D2 ] [ more data ]
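A cleaning pass over several segments, as in the figures above, can be sketched in a few lines of Python. This is only a toy model, not DualFS code: segments are plain lists of block names, a trailing ' marks an obsolete copy, and the function name clean_segments and its indirect parameter are made up for illustration. It also assumes that the cleaner writes one fresh copy of each indirect block right after the relocated data blocks.

```python
def clean_segments(victims, indirect=("D1", "D2")):
    """Toy model of one cleaning pass over several segments.

    victims  -- list of segments; each segment is a list of block names,
                where a trailing "'" marks an obsolete copy (hypothetical encoding)
    indirect -- names of the indirect blocks that point to the data blocks
    Returns the contents of the newly written segment.
    """
    # Keep only live data blocks; obsolete copies are garbage.
    live_data = sorted(
        name
        for segment in victims
        for name in segment
        if not name.endswith("'") and name not in indirect
    )
    # Every victim segment becomes free once its live blocks are copied out.
    for segment in victims:
        segment.clear()
    # Assumption: the relocated data blocks are followed by one fresh copy
    # of each indirect block, so all of them end up in the same segment.
    return live_data + list(indirect)


segments = [
    ["DA", "D1'", "D2'"],
    ["D0", "D1'", "D2'"],
    ["DB", "D1'", "D2'"],
]
new_segment = clean_segments(segments)
print(new_segment)   # ['D0', 'DA', 'DB', 'D1', 'D2']
print(segments)      # [[], [], []]
```

In this model, nine occupied block slots spread over three segments shrink to five slots in a single segment, and the three source segments become free.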
Moreover, if the cleaner starts running when the free space drops below a
specific threshold, it is very difficult to waste more than 50% of disk
space, especially with meta-data (actually, I am unable to imagine that
situation :).
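For reference, the (N-1) / N worst case quoted above is easy to put numbers on. The sketch below assumes one reading of that bound: in the worst case, every group of N blocks on disk holds one live block and N-1 obsolete indirect copies, as in the [ D0 D1' D2' ] groups of the figures. The function name worst_case_garbage is hypothetical.

```python
from fractions import Fraction


def worst_case_garbage(tree_height):
    """Worst-case fraction of garbage for a given maximum tree height.

    Assumption: one live block per group of tree_height blocks,
    the remaining tree_height - 1 blocks being obsolete copies.
    """
    if tree_height < 1:
        raise ValueError("tree height must be at least 1")
    return Fraction(tree_height - 1, tree_height)


for height in (1, 2, 3, 4):
    print(height, worst_case_garbage(height))
# 1 0
# 2 1/2
# 3 2/3
# 4 3/4
```

Under that reading, the bound only exceeds 50% for trees of height 3 or more, and only if the cleaner never gets to run on the affected segments.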
> Another downside is that with large amounts of garbage between otherwise
> useful data, your disk cache hit rate goes down. Read performance is
> suffering. But that may be a fair tradeoff and will only show up in
> large metadata reads in the uncached (per Linux) case. Seems fair.
Well, our experimental results say otherwise. As I have said, most files
are written at once, so their meta-data blocks are together on disk. This
allows DualFS to implement explicit prefetching of meta-data blocks,
which is quite effective, especially when several processes are reading
from disk at the same time.
On the other hand, DualFS also implements an on-line meta-data relocation
mechanism which can help to improve both meta-data prefetching and garbage
collection.
Obviously, there can be some slow-growing files that produce some garbage,
but they do not hurt the overall performance of the file system.
>
> Quite interesting, actually. The costs of your design are disk space,
> depending on the amount and depth of your metadata, and metadata read
> performance. Disk space is cheap and metadata reads tend to be slow for
> most filesystems, in comparison to data reads. You gain faster metadata
> writes and loss of journal overhead. I like the idea.
>
Yeah :) If you have taken a look at my presentation at LFS07, you will
have seen that the disk traffic of meta-data blocks is dominated by writes.
> Jörn
>
Juan.
--
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34968367657 Fax: +34968364151
email: piernas@ditec.um.es
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
*** Please send me your documents in text, HTML, PDF or PostScript format :-) ***