From: David Chinner <dgc@sgi.com>
To: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: David Chinner <dgc@sgi.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michael Rubin <mrubin@google.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
Date: Thu, 17 Jan 2008 16:21:29 +1100
Message-ID: <20080117052129.GJ155259@sgi.com>
In-Reply-To: <E1JFLEW-0002oE-G1@localhost.localdomain>

On Thu, Jan 17, 2008 at 11:16:00AM +0800, Fengguang Wu wrote:
> On Thu, Jan 17, 2008 at 09:35:10AM +1100, David Chinner wrote:
> > On Wed, Jan 16, 2008 at 05:07:20PM +0800, Fengguang Wu wrote:
> > > On Tue, Jan 15, 2008 at 09:51:49PM -0800, Andrew Morton wrote:
> > > > > Then to do better ordering by adopting radix tree(or rbtree
> > > > > if radix tree is not enough),
> > > > 
> > > > ordering of what?
> > > 
> > > Switch from time to location.
> > 
> > Note that data writeback may be adversely affected by location
> > based writeback rather than time based writeback - think of
> > the effect of location based data writeback on an app that
> > creates lots of short term (<30s) temp files and then removes
> > them before they are written back.
> 
> A small(e.g. 5s) time window can still be enforced, but...

Yes, you could, but that will then result in non-deterministic
performance for repeated workloads because the order of file
writeback will not be consistent.

e.g. the first run is fast because the output file is at a lower
offset than the temp file, meaning the temp file gets deleted
without ever being written.

The second run is slow because the locations of the files are
reversed: the temp file is written to disk before the final
output file, so the run writes much more data.

The third run is also slow, even though the files are laid out
as in the first, fast run. This time pdflush reaches the output
file within 5s of it being dirtied, so it skips it and writes
the temp file first.

The difference between the first and second runs can be explained
once you know that inode number determines writeback order, but
there is no obvious clue as to why the first and third runs
differ.

This is exactly the sort of non-deterministic behaviour we 
want to avoid in a writeback algorithm.
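
A toy model of the three runs may make this concrete. It is only a
sketch under stated assumptions, not kernel code: DirtyFile,
writeback_kb() and run() are hypothetical names, the inode number
stands in for disk location, one file is written per one-second
pass, and all sizes and times are made up. The third run is modelled
as the output file still being inside the 5s window when writeback
starts, which is one reading of how the same layout can turn slow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirtyFile:
    name: str
    ino: int                          # stands in for on-disk location
    dirtied_at: int                   # seconds
    size_kb: int
    deleted_at: Optional[int] = None  # None means never deleted


def writeback_kb(files, start, window, passes=10):
    """Location-ordered writeback with a time window (toy model).

    Each one-second pass writes the lowest-numbered inode that was
    dirtied at least `window` seconds ago. Files deleted before they
    are reached are dropped without being written.
    """
    pending = sorted(files, key=lambda f: f.ino)
    written = 0
    for now in range(start, start + passes):
        # Deleted files never reach the disk.
        pending = [f for f in pending
                   if f.deleted_at is None or now < f.deleted_at]
        # Files still inside the time window are skipped for now.
        eligible = [f for f in pending if now - f.dirtied_at >= window]
        if eligible:
            written += eligible[0].size_kb
            pending.remove(eligible[0])
    return written


def run(out_ino, tmp_ino, out_dirtied_at):
    files = [
        DirtyFile("output", out_ino, out_dirtied_at, size_kb=100),
        DirtyFile("temp", tmp_ino, 0, size_kb=400, deleted_at=7),
    ]
    return writeback_kb(files, start=6, window=5)


print(run(out_ino=10, tmp_ino=20, out_dirtied_at=0))  # run 1: 100
print(run(out_ino=20, tmp_ino=10, out_dirtied_at=0))  # run 2: 500
print(run(out_ino=10, tmp_ino=20, out_dirtied_at=2))  # run 3: 500
```

Runs 1 and 3 use the same inode layout, yet run 3 writes five times
as much because of a two second difference in when the output file
was dirtied.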

> > Hmmmm - I'm wondering if we'd do better to split data writeback from
> > inode writeback. i.e. we do two passes.  The first pass writes all
> > the data back in time order, the second pass writes all the inodes
> > back in location order.
> > 
> > Right now we interleave data and inode writeback, (i.e.  we do data,
> > inode, data, inode, data, inode, ....). I'd much prefer to see all
> > data written out first, then the inodes. ->writepage often dirties
> > the inode and hence if we need to do multiple do_writepages() calls
> > on an inode to flush all the data (e.g. congestion, large amounts of
> > data to be written, etc), we really shouldn't be calling
> > write_inode() after every do_writepages() call. The inode
> > should not be written until all the data is written....
> 
> That may do good to XFS. Another case is documented as follows:
> "the write_inode() function of a typical fs will perform no I/O, but
> will mark buffers in the blockdev mapping as dirty."

Yup, but in that situation ->write_inode() does not do any I/O, so
it will work with any high level inode writeback ordering or timing
scheme equally well.  As a result, that's not the case we need to
optimise at all.

FWIW, the NFS client is likely to work better with split data/
inode writeback as it also has to mark the inode dirty on async
write completion (to get ->write_inode called to issue a commit
RPC). Hence delaying the inode write until after all the data
is written makes sense there as well....
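
To illustrate the difference between the two schemes, here is a
rough sketch. It is not the kernel implementation: interleaved() and
split_passes() are hypothetical names, each "data" entry stands for
one do_writepages() call limited to `chunk` pages, each "inode" entry
stands for one write_inode() call, and the inode numbers, timestamps
and page counts are invented.

```python
def data_passes(dirty_pages, chunk):
    """Number of chunk-limited data flushes needed for one inode."""
    return (dirty_pages + chunk - 1) // chunk


def interleaved(inodes, chunk):
    """Current behaviour as described: data, inode, data, inode, ..."""
    log = []
    for i in sorted(inodes, key=lambda i: i["dirtied_at"]):
        for _ in range(data_passes(i["dirty_pages"], chunk)):
            log.append(("data", i["ino"]))
            log.append(("inode", i["ino"]))  # inode rewritten every time
    return log


def split_passes(inodes, chunk):
    """Proposed: all data in time order, then inodes in location order."""
    log = []
    for i in sorted(inodes, key=lambda i: i["dirtied_at"]):
        for _ in range(data_passes(i["dirty_pages"], chunk)):
            log.append(("data", i["ino"]))
    for i in sorted(inodes, key=lambda i: i["ino"]):
        log.append(("inode", i["ino"]))      # each inode written once
    return log


inodes = [
    {"ino": 30, "dirtied_at": 1, "dirty_pages": 2500},
    {"ino": 10, "dirtied_at": 2, "dirty_pages": 1000},
]

old = interleaved(inodes, chunk=1024)
new = split_passes(inodes, chunk=1024)
print(sum(1 for kind, _ in old if kind == "inode"))   # 4 inode writes
print([ino for kind, ino in new if kind == "inode"])  # [10, 30]
```

With these numbers the interleaved scheme issues four inode writes,
three of them for the same inode, while the split scheme issues two,
sorted by inode number.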

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
