From: Ric Wheeler <>
To: Michael Tokarev <>
Cc: device-mapper development <>,
	Andi Kleen <>,
Subject: Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Date: Mon, 18 Feb 2008 08:52:10 -0500

Michael Tokarev wrote:
> Ric Wheeler wrote:
>> Alasdair G Kergon wrote:
>>> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
>>>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
>>>>> I wonder if it's worth the effort to try to implement this.
>>> My personal view (which seems to be in the minority) is that it's a
>>> waste of our development time *except* in the (rare?) cases similar to
>>> the ones Andi is talking about.
>> Using working barriers is important for ordinary users when you care
>> about avoiding data loss and have normal drives in a box. We do power
>> fail testing on boxes (with reiserfs and ext3) and can definitely see a
>> lot of file system corruption eliminated across power failures when
>> barriers are enabled properly.
>> It is not unreasonable to disable barriers on some machines to get a
>> performance boost, but I would not do that when you are storing things
>> you really need back.
> The talk here is about something different - about supporting barriers
> on md/dm devices, i.e., on pseudo-devices which use multiple real devices
> as components (software RAIDs etc.).  In this "world" it's nearly
> impossible to support barriers if there is more than one underlying
> component device; barriers only work when there is a single component.
> And the talk is about supporting barriers only in a minority of cases -
> mostly the simplest device-mapper case, NOT covering raid1 or other
> "fancy" configurations.

I understand that. Most of the time, dm or md devices are composed of 
uniform components, all of which either support or lack the cache flush 
commands used by barriers.
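To make the single-device case concrete, here is a rough sketch - not 
Andi Kleen's actual patch - of a linear-style map function, assuming the 
2.6.24-era bio interface (struct linear_c is copied from 
drivers/md/dm-linear.c; header locations and the DM_MAPIO_* return 
constants varied across 2.6.x kernels):

/*
 * Sketch only: with exactly one underlying device, a bio with
 * BIO_RW_BARRIER set in bi_rw needs no special handling here; dm can
 * forward it and let the component device's own queue implement the
 * flush/ordering semantics.
 */
#include <linux/bio.h>
#include <linux/device-mapper.h>

struct linear_c {			/* as in drivers/md/dm-linear.c */
	struct dm_dev *dev;
	sector_t start;
};

static int single_dev_map(struct dm_target *ti, struct bio *bio,
			  union map_info *map_context)
{
	struct linear_c *lc = ti->private;

	/* Redirect the bio to the single component device,
	 * leaving the barrier flag in bi_rw intact. */
	bio->bi_bdev = lc->dev->bdev;
	bio->bi_sector = lc->start + (bio->bi_sector - ti->begin);

	/* Ask the dm core to resubmit the remapped bio. */
	return DM_MAPIO_REMAPPED;
}

With a single component the barrier flag simply travels down and the 
component's queue provides the ordering; with two or more components 
there is no cheap way to order completions across their independent 
write caches, which is why the general case is hard.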

>> Of course, you don't need barriers when you either disable the write
>> cache on the drives or use a battery-backed RAID array, which gives you
>> a write cache that will survive power outages...
> Two things here.
> First, I still don't understand why, for God's sake, barriers are
> "working" while regular cache flushes are not.  Almost no consumer-grade
> hard drive supports write barriers, but they all support regular cache
> flushes, and the latter should be enough (though not the most
> speed-optimal) to ensure data safety.  Why require disabling the write
> cache (as the XFS FAQ does) instead of going the flush-cache-when-
> appropriate (as opposed to write-barrier-when-appropriate) way?

Barriers come in different flavors, but they can be composed of cache 
flush commands, which every S-ATA and ATA drive I have seen has supported 
for many years now. That is the flavor of barriers we test with S-ATA 
and ATA drives.
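To illustrate that flush-composed flavor from userspace, here is a 
sketch (hypothetical file name; it assumes a filesystem where fsync() 
ends in a cache flush command to the drive, i.e., working barriers) of 
the journal-commit ordering that those flushes protect:

/*
 * Sketch: journal-commit ordering.  Each fsync() must force the
 * drive's write cache out (the flush-based barrier), or the commit
 * record could reach the platter before the data it commits.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("journal", O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0) { perror("open"); return 1; }

	const char record[] = "journal record: new block contents\n";
	const char commit[] = "commit\n";

	/* 1: write the journal record describing the update. */
	if (write(fd, record, sizeof(record) - 1) < 0) { perror("write"); return 1; }

	/* 2: barrier - the record must be on media before the
	 *    commit record is allowed to go out. */
	if (fsync(fd) < 0) { perror("fsync"); return 1; }

	/* 3: write the commit record. */
	if (write(fd, commit, sizeof(commit) - 1) < 0) { perror("write"); return 1; }

	/* 4: flush again so the commit itself survives power loss. */
	if (fsync(fd) < 0) { perror("fsync"); return 1; }

	close(fd);
	return 0;
}

If the commit record can hit the platter before the journal record - 
which is exactly what a volatile write cache allows without flushes - 
recovery after a power cut may replay a commit for garbage.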

The issue is that without flushing/invalidating (or some other way of 
controlling the behavior of your storage), the file system has no way to 
make sure that all data has reached persistent, non-volatile media.
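At the application level, that flush machinery is what gives fsync() its 
meaning. A minimal durable-write sketch (hypothetical file names): with 
working barriers, each fsync() below ends in a flush to the drive; with 
barriers off, it only drains the kernel's caches, and the data may still 
sit in the drive's volatile write cache when power is lost.

/*
 * Sketch: making a file durably visible.  fsync() on the file pushes
 * its data toward media; fsync() on the directory does the same for
 * the new name.  Without a real cache flush underneath, neither call
 * guarantees the bits survive a power cut.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) { perror("open"); return 1; }

	const char buf[] = "contents that must survive a power cut\n";
	if (write(fd, buf, sizeof(buf) - 1) < 0) { perror("write"); return 1; }
	if (fsync(fd) < 0) { perror("fsync"); return 1; }  /* flush file data */
	close(fd);

	/* Publish the file under its final name, then flush the
	 * directory so the rename itself is on media too. */
	if (rename("data.tmp", "data") < 0) { perror("rename"); return 1; }

	int dirfd = open(".", O_RDONLY);
	if (dirfd < 0) { perror("open dir"); return 1; }
	if (fsync(dirfd) < 0) { perror("fsync dir"); return 1; }
	close(dirfd);
	return 0;
}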

> And second, "surprisingly", battery-backed RAID write caches tend to
> fail too, sometimes... ;)  Usually such a battery is only enough to keep
> the data in memory for several hours (since many RAID controllers use
> regular RAM for their caches, which needs power to keep its state).
> I came across this issue the hard way, and realized that very few people
> around me who manage RAID systems even know about this problem - that
> the battery-backed cache only holds data for a limited time...  For
> example, power fails in the evening, and by the next morning the
> batteries are already empty.  Or, with better batteries, think about a
> weekend... ;)
> (I've seen that some vendors now use flash-based backing stores for
> their caches instead, which should give far better results here.)
> /mjt

That is why you need to get a good array, not just a simple controller ;-)

Most arrays do not use batteries to hold up the write cache indefinitely; 
they use the batteries to destage any cached data to non-volatile media 
in the time that the batteries can sustain power.

You could certainly get this kind of behavior from the flash scheme you 
describe above as well...


