From: Juan Piernas Canovas <piernas@ditec.um.es>
To: "Jörn Engel" <joern@lazybastard.org>
Cc: Sorin Faibish <sfaibish@emc.com>,
	kernel list <linux-kernel@vger.kernel.org>
Subject: Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation
Date: Sun, 25 Feb 2007 03:41:40 +0100 (CET)
Message-ID: <Pine.LNX.4.61.0702250255530.18915@ditec.inf.um.es>
In-Reply-To: <20070223132645.GB11653@lazybastard.org>

Hi Jörn,

On Fri, 23 Feb 2007, Jörn Engel wrote:

> On Thu, 22 February 2007 20:57:12 +0100, Juan Piernas Canovas wrote:
>>
>> I do not agree with this picture, because it does not show that all the
>> indirect blocks which point to a direct block are along with it in the
>> same segment. That figure should look like:
>>
>> Segment 1: [some data] [ DA D1' D2' ] [more data]
>> Segment 2: [some data] [ D0 D1' D2' ] [more data]
>> Segment 3: [some data] [ DB D1  D2  ] [more data]
>>
>> where D0, DA, and DB are datablocks, D1 and D2 indirect blocks which
>> point to the datablocks, and D1' and D2' obsolete copies of those
>> indirect blocks. By using this figure, it is clear that if you need to
>> move D0 to clean the segment 2, you will need only one free segment at
>> most, and not more. You will get:
>>
>> Segment 1: [some data] [ DA D1' D2' ] [more data]
>> Segment 2: [                free                ]
>> Segment 3: [some data] [ DB D1' D2' ] [more data]
>> ......
>> Segment n: [ D0 D1 D2 ] [         empty         ]
>>
>> That is, D0 needs in the new segment the same space that it needs in the
>> previous one.
>>
>> The differences are subtle but important.
>
> Ah, now I see.  Yes, that is deadlock-free.  If you are not accounting
> the bytes of used space but the number of used segments, and you count
> each partially used segment the same as a 100% used segment, there is no
> deadlock.
>
> Some people may consider this to be cheating, however.  It will cause
> more than 50% wasted space.  All obsolete copies are garbage, after all.
> With a maximum tree height of N, you can have up to (N-1) / N of your
> filesystem occupied by garbage.

I do not agree. Fortunately, most files are written all at once, so what
you usually have is:

Segment 1: [                  data                  ]
Segment 2: [some data] [ D0 DA DB D1 D2 ] [more data]
Segment 3: [                  data                  ]
......
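
To put numbers on the two layouts, here is a rough Python sketch (not
DualFS code). It assumes a tree height of 3 that counts the data block
level, as in the figures, and the helper names `garbage_fraction` and
`worst_case_garbage` are made up for illustration. It only does the
arithmetic behind the (N-1) / N bound and the written-at-once case.

```python
from fractions import Fraction


def garbage_fraction(live: int, obsolete: int) -> Fraction:
    """Share of the written blocks that are obsolete copies (garbage)."""
    total = live + obsolete
    if total == 0:
        return Fraction(0)
    return Fraction(obsolete, total)


def worst_case_garbage(tree_height: int) -> Fraction:
    """Bound from the quoted text: one live data block written together
    with (tree_height - 1) indirect blocks that later become obsolete.
    Assumption: tree_height counts the data block level itself."""
    return garbage_fraction(live=1, obsolete=tree_height - 1)


# Scattered case from the figures: [ D0 D1' D2' ] -> 1 live, 2 obsolete.
scattered = garbage_fraction(live=1, obsolete=2)

# Written-at-once case: [ D0 DA DB D1 D2 ] -> 5 live, 0 obsolete.
contiguous = garbage_fraction(live=5, obsolete=0)
```

With these assumptions the scattered layout matches the (N-1) / N bound
for N = 3, while the written-at-once layout carries no garbage at all.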

On the other hand, the DualFS cleaner tries to clean several segments
every time it runs. Therefore, if you have the following case:

Segment 1: [some data] [ DA D1' D2' ] [more data]
Segment 2: [some data] [ D0 D1' D2' ] [more data]
Segment 3: [some data] [ DB D1' D2' ] [more data]
......

after cleaning, you can have this one:

Segment 1: [                  free                  ]
Segment 2: [                  free                  ]
Segment 3: [                  free                  ]
......
Segment i: [ D0 DA DB D1 D2 ] [      more data      ]

Moreover, if the cleaner starts running when the free space drops below a
specific threshold, it is very difficult to waste more than 50% of the disk
space, especially with meta-data (actually, I am unable to imagine that
situation :).
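
The multi-segment cleaning above can be sketched in a few lines of
Python. This is only an illustration, not the DualFS cleaner: the
`Block` type, the `clean` and `should_clean` helpers, and the 25%
threshold are all assumptions made up for the example. The point it
shows is that when several victim segments are cleaned together, the
shared indirect blocks D1 and D2 are rewritten once, not once per
data block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    live: bool = True
    # Indirect blocks that must be rewritten when this block moves
    # (hypothetical field, used only by this sketch).
    parents: tuple = ()


def should_clean(free_segments: int, total_segments: int,
                 threshold: float = 0.25) -> bool:
    """Start cleaning when the free share drops below the threshold.
    The 0.25 default is an arbitrary value for illustration."""
    return free_segments / total_segments < threshold


def clean(segments: list, victims: list) -> list:
    """Free the victim segments and return the contents of the new one."""
    moved, rewritten = [], []
    for idx in victims:
        for blk in segments[idx]:
            if not blk.live:
                continue  # obsolete copies are simply dropped
            moved.append(blk.name)
            for parent in blk.parents:
                if parent not in rewritten:
                    rewritten.append(parent)  # each parent written once
        segments[idx] = []  # the victim segment is now free
    return moved + rewritten


def obsolete(name: str) -> Block:
    return Block(name, live=False)


segments = [
    [Block("DA", parents=("D1", "D2")), obsolete("D1'"), obsolete("D2'")],
    [Block("D0", parents=("D1", "D2")), obsolete("D1'"), obsolete("D2'")],
    [Block("DB", parents=("D1", "D2")), obsolete("D1'"), obsolete("D2'")],
]
new_segment = clean(segments, victims=[0, 1, 2])
```

In this model the three victims held nine blocks in total, and the new
segment needs only five: the three data blocks plus one fresh copy each
of D1 and D2.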

> Another downside is that with large amounts of garbage between otherwise
> useful data, your disk cache hit rate goes down.  Read performance is
> suffering.  But that may be a fair tradeoff and will only show up in
> large metadata reads in the uncached (per Linux) case.  Seems fair.

Well, our experimental results say otherwise. As I have said, most files
are written all at once, so their meta-data blocks are together on disk.
This allows DualFS to implement explicit prefetching of meta-data blocks,
which is quite effective, especially when there are several processes
reading from disk at the same time.

On the other hand, DualFS also implements an on-line meta-data relocation
mechanism which can help to improve both meta-data prefetching and garbage
collection.

Obviously, there can be some slow-growing files that can produce some 
garbage, but they do not hurt the overall performance of the file system.

>
> Quite interesting, actually.  The costs of your design are disk space,
> depending on the amount and depth of your metadata, and metadata read
> performance.  Disk space is cheap and metadata reads tend to be slow for
> most filesystems, in comparison to data reads.  You gain faster metadata
> writes and loss of journal overhead.  I like the idea.
>

Yeah :) If you have taken a look at my presentation at LFS07, you will
have seen that the disk traffic of meta-data blocks is dominated by writes.

> Jörn
>

 	Juan.
-- 
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34968367657    Fax: +34968364151
email: piernas@ditec.um.es
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index

*** Please send me your documents in text, HTML, PDF or PostScript format :-) ***
