From: "Michael Rubin" <>
To: "Fengguang Wu" <>
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
Date: Thu, 17 Jan 2008 13:07:05 -0800

On Jan 17, 2008 1:41 AM, Fengguang Wu <> wrote:
> On Tue, Jan 15, 2008 at 12:09:21AM -0800, Michael Rubin wrote:
> The main benefit of rbtree is possibly better support of future policies.
> Can you demonstrate an example?

These are only half-formed thoughts on my end for now, but the idea was
that keeping one tree sorted by some scheme might be simpler than
maintaining multiple list_heads.

> Bugs can only be avoided by good understanding of all possible cases.)

I take the above statement as a tautology, and I am trying my best to do so. :-)

> The most tricky writeback issues could be starvation prevention
> between

>         - small/large files
>         - new/old files
>         - superblocks

So I have written tests and believe I have covered these issues. If
you have a specific concern about any of them and have a test case,
please let me know.

> Some kind of limit should be applied for each. They used to be:
>         - requeue to s_more_io whenever MAX_WRITEBACK_PAGES is reached
>           this preempts big files

The patch uses the same limit.

>         - refill s_io iif it is drained
>           this prevents promotion of big/old files

Once a big file gets its first do_writepages it is moved behind the
other smaller files via i_flushed_when. And the same in reverse for
big vs old.
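
To make the ordering concrete, here is a rough user-space sketch, not
code from the patch. It assumes the sort key is i_flushed_when once an
inode has had a partial flush and dirtied_when otherwise; that rule, the
struct, and the sk_wb_* names are all hypothetical. It also ignores
jiffies wraparound, which real kernel code would handle with
time_after().

```c
#include <assert.h>

/* Hypothetical stand-in for the fields discussed above. */
struct sk_wb_inode {
    unsigned long dirtied_when;  /* preserved until the inode is written back */
    unsigned long flushed_when;  /* 0 = not flushed yet, else time of last flush */
};

/* Assumed sort key: last flush time if there is one, else dirtied time. */
unsigned long sk_wb_key(const struct sk_wb_inode *in)
{
    return in->flushed_when ? in->flushed_when : in->dirtied_when;
}

/* Negative if a should be written back before b, positive if after. */
int sk_wb_cmp(const struct sk_wb_inode *a, const struct sk_wb_inode *b)
{
    unsigned long ka = sk_wb_key(a);
    unsigned long kb = sk_wb_key(b);

    return (ka > kb) - (ka < kb);
}
```

Under that assumption, a big file dirtied before a small one sorts first
until its first partial flush stamps flushed_when, after which it sorts
behind the small file while its dirtied_when stays untouched.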

>         - return from sync_sb_inodes() after one go of s_io

I am not sure how this limit helps things out. Is this for superblock
starvation? Can you elaborate?

> Michael, could you sort out and document the new starvation prevention schemes?

Here is the basic idea behind how the writeback algorithm handles
starvation. The overarching idea is that we want to preserve the order
of writeback based on when an inode was dirtied, and also preserve the
dirtied_when contents until the inode has been written back (partially
or fully).

On every sync_sb_inodes we find the least recently dirtied inodes. To
deal with large vs. small file starvation we have an s_flush_gen for
each iteration of sync_sb_inodes. Every time we issue a writeback we
mark that the inode cannot be processed again until the next
s_flush_gen. This way we don't keep processing one inode and never get
to the rest, since we keep pushing inodes into subsequent s_flush_gens.
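
As an illustration of the generation idea, here is a small user-space
sketch; it is not the patch itself. The sk_* names, the linear scan that
stands in for the tree lookup, and the SK_MAX_PAGES cap of 1024 are all
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_PAGES 1024UL  /* assumed per-visit page cap for this sketch */

struct sk_inode {
    unsigned long dirtied_when;  /* when the inode was first dirtied */
    unsigned long dirty_pages;   /* pages still waiting for writeback */
    unsigned long flush_gen;     /* generation of the last pass that visited it */
};

struct sk_sb {
    struct sk_inode *inodes;
    size_t nr;
    unsigned long flush_gen;     /* bumped once per sync pass */
};

/* Oldest dirty inode not yet visited in the current generation, or NULL. */
struct sk_inode *sk_next(struct sk_sb *sb)
{
    struct sk_inode *best = NULL;

    for (size_t i = 0; i < sb->nr; i++) {
        struct sk_inode *in = &sb->inodes[i];

        if (in->dirty_pages == 0 || in->flush_gen == sb->flush_gen)
            continue;
        if (!best || in->dirtied_when < best->dirtied_when)
            best = in;
    }
    return best;
}

/* One pass over the superblock; returns the number of pages written. */
unsigned long sk_sync_pass(struct sk_sb *sb)
{
    unsigned long written = 0;
    struct sk_inode *in;

    sb->flush_gen++;
    while ((in = sk_next(sb)) != NULL) {
        unsigned long n = in->dirty_pages < SK_MAX_PAGES ?
                          in->dirty_pages : SK_MAX_PAGES;

        in->dirty_pages -= n;
        written += n;
        in->flush_gen = sb->flush_gen;  /* not eligible again until next pass */
    }
    return written;
}
```

In this sketch each dirty inode is visited at most once per pass, so a
large file that is still dirty after its capped write waits for the next
generation while the remaining inodes get their turn.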

Let me know if you want more detail or structured responses.

> Introduce i_flush_gen to help restarting from the last inode?
> Well, it's not as simple as list_heads.
> > 2) Added an inode flag to allow inodes to be marked so that they
> >    are never written back to disk.
> >
> >    The motivation behind this change is several fold. The first is
> >    to insure fairness in the writeback algorithm. The second is to
> What do you mean by fairness?

So originally this comment was written when I was trying to fix a bug
in 2.6.23. The one where we were starving large files from being
flushed. There was a fairness issue where small files were being
flushed but the large ones were just ballooning in memory.

> Why cannot I_WRITEBACK_NEVER be in a decoupled standalone patch?

The WRITEBACK_NEVER change could go in a patch before the rbtree patch,
but not in one after it. The rbtree depends on the WRITEBACK_NEVER
patch; otherwise we run into problems in generic_delete_inode. Now that
you point it out, I think I can split this patch into two patches and
put WRITEBACK_NEVER in the first.

> More details about the fixings, please?

So again this comment was written against 2.6.23. The biggest fix is
the starving of big files. I remember there were other smaller issues,
but there have been so many changes in the patch sets that I need to
go back to quantify.
