LKML Archive on lore.kernel.org
From: David Chinner <dgc@sgi.com>
To: David Chinner <dgc@sgi.com>, Michael Tokarev <mjt@tls.msk.ru>,
	Ric Wheeler <ric@emc.com>,
	device-mapper development <dm-devel@redhat.com>,
	Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Date: Tue, 19 Feb 2008 16:36:40 +1100
Message-ID: <20080219053640.GC155407@sgi.com>
In-Reply-To: <20080219025643.GD4066@agk.fab.redhat.com>

On Tue, Feb 19, 2008 at 02:56:43AM +0000, Alasdair G Kergon wrote:
> On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> > Surely any hardware that doesn't support barrier
> > operations can emulate them with cache flushes when they receive a
> > barrier I/O from the filesystem....
>  
> My complaint about having to support them within dm when more than one
> device is involved is because any efficiencies disappear: you can't send
> further I/O to any one device until all the other devices have completed
> their barrier (or else later I/O to that device could overtake the
> barrier on another device).

Right - it's a horrible performance hit.

But - how is what you describe any different to the filesystem doing:

	- flush block device
	- issue I/O
	- wait for completion
	- flush block device

around any I/O that it would otherwise simply tag as a barrier?
That serialisation at the filesystem layer is a horrible, horrible
performance hit.
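
To make the four-step sequence above concrete, here is a minimal
user-space sketch. It is a toy model, not kernel code: struct model_dev
and every model_*() function are hypothetical names invented for the
illustration, and the model is synchronous, so "wait for completion"
collapses into the write call returning.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NBLOCKS 8

/* Hypothetical user-space model of a disk with a volatile write cache. */
struct model_dev {
	int media[MODEL_NBLOCKS];	/* contents that survive power loss */
	int cache[MODEL_NBLOCKS];	/* data held in the volatile cache */
	int dirty[MODEL_NBLOCKS];	/* cache[i] has not reached the media */
	int flushes;			/* full cache flushes issued so far */
};

/* A write completes as soon as it lands in the volatile cache. */
void model_write(struct model_dev *d, int blk, int val)
{
	assert(blk >= 0 && blk < MODEL_NBLOCKS);
	d->cache[blk] = val;
	d->dirty[blk] = 1;
}

/* Full device flush: push every dirty block to stable media. */
void model_flush(struct model_dev *d)
{
	for (int i = 0; i < MODEL_NBLOCKS; i++) {
		if (d->dirty[i]) {
			d->media[i] = d->cache[i];
			d->dirty[i] = 0;
		}
	}
	d->flushes++;
}

/* Power loss discards whatever was still only in the cache. */
void model_power_loss(struct model_dev *d)
{
	memset(d->dirty, 0, sizeof(d->dirty));
}

/*
 * The steps from the list above. Because the model is synchronous,
 * "issue I/O" and "wait for completion" are both covered by
 * model_write() returning.
 */
void model_emulated_barrier_write(struct model_dev *d, int blk, int val)
{
	model_flush(d);			/* flush block device */
	model_write(d, blk, val);	/* issue I/O, wait for completion */
	model_flush(d);			/* flush block device */
}
```

In this model, everything written before or by
model_emulated_barrier_write() is on the media after a power loss, while
a later plain model_write() is not. Each emulated barrier costs two full
flushes and stalls the caller for the whole sequence, which is the
serialisation being objected to.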

And then there's the fact that we can't implement that in XFS
because all the barrier I/Os we issue are asynchronous.  We'd
basically have to serialise all metadata operations and now we
are talking about far worse performance hits than implementing
barrier emulation in the block device.

Also, it's instructive to look at the implementation of
blkdev_issue_flush() - the API one is supposed to use to trigger a
full block device flush. It doesn't work on DM/MD either, because
it uses a no-I/O barrier bio:

        bio->bi_end_io = bio_end_empty_barrier;
        bio->bi_private = &wait;
        bio->bi_bdev = bdev;
        submit_bio(1 << BIO_RW_BARRIER, bio);

        wait_for_completion(&wait);

So, if the underlying block device doesn't support barriers,
there's no point in changing the filesystem to issue flushes,
either...
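
The dependency can be sketched in a few lines of user-space C. This is
only a model under stated assumptions: struct model_bdev and the
model_*() functions are hypothetical names, not kernel APIs, and the
-EOPNOTSUPP return for a device without barrier support is an assumption
made for the illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; none of these names are kernel APIs. */
struct model_bdev {
	int supports_barriers;	/* 0 models a device that rejects barriers */
	int cache_flushes;	/* cache flushes actually performed */
};

/* Submit a barrier request; data_len == 0 models the empty barrier. */
int model_submit_barrier(struct model_bdev *bdev, int data_len)
{
	assert(data_len >= 0);
	if (!bdev->supports_barriers)
		return -EOPNOTSUPP;	/* rejected before anything is flushed */
	bdev->cache_flushes++;		/* the barrier drains the cache */
	return 0;
}

/* Flush helper built on an empty barrier, like the excerpt above. */
int model_issue_flush(struct model_bdev *bdev)
{
	return model_submit_barrier(bdev, 0);
}
```

With supports_barriers set, model_issue_flush() returns 0 and one cache
flush happens. With it clear, the call fails and no flush is performed,
so a filesystem that switched from barrier I/Os to explicit flushes
would hit the same wall in this model.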

> And then I argue that it would be better
> for the filesystem to have the information that these are not hardware
> barriers so it has the opportunity of tuning its behaviour (e.g.
> flushing less often because it's a more expensive operation).

There is generally no option from the filesystem POV to "flush
less". Either we use barrier I/Os where we need to and are safe with
volatile caches or we corrupt filesystems with volatile caches when
power loss occurs. There is no in-between where "flushing less"
will save us from corruption....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
