LKML Archive on lore.kernel.org
From: "Darrick J. Wong" <djwong@us.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>,
axboe@kernel.dk, tytso@mit.edu, shli@kernel.org, neilb@suse.de,
adilger.kernel@dilger.ca, jack@suse.cz, snitzer@redhat.com,
linux-kernel@vger.kernel.org, kmannth@us.ibm.com, cmm@us.ibm.com,
linux-ext4@vger.kernel.org, rwheeler@redhat.com, hch@lst.de,
josef@redhat.com
Subject: Re: [PATCH 3/3] block: reimplement FLUSH/FUA to support merge
Date: Mon, 24 Jan 2011 12:31:55 -0800
Message-ID: <20110124203155.GA32261@tux1.beaverton.ibm.com>
In-Reply-To: <20110123102526.GA23121@htj.dyndns.org>

On Sun, Jan 23, 2011 at 11:25:26AM +0100, Tejun Heo wrote:
> Hello,
>
> On Fri, Jan 21, 2011 at 01:56:17PM -0500, Vivek Goyal wrote:
> > > + * Currently, the following conditions are used to determine when to issue
> > > + * flush.
> > > + *
> > > + * C1. At any given time, only one flush shall be in progress. This makes
> > > + * double buffering sufficient.
> > > + *
> > > + * C2. Flush is not deferred if any request is executing DATA of its
> > > + * sequence. This avoids issuing separate POSTFLUSHes for requests
> > > + * which shared PREFLUSH.
> >
> > Tejun, did you mean "Flush is deferred" instead of "Flush is not deferred"
> > above?
>
> Oh yeah, I did. :-)
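
The corrected C1/C2 rules can be read as a small decision function. What follows is a minimal Python sketch, not the kernel's blk_kick_flush(): the FlushState class and its fields are hypothetical stand-ins for the per-queue state the patch tracks, and only C1 and the corrected C2 are modeled.

```python
from dataclasses import dataclass


@dataclass
class FlushState:
    # Hypothetical per-queue bookkeeping; names are illustrative only.
    pending: int = 0               # requests waiting for their next flush step
    flush_in_flight: bool = False  # a PREFLUSH/POSTFLUSH is currently executing
    data_in_flight: int = 0        # requests currently executing their DATA step


def should_issue_flush(s: FlushState) -> bool:
    """Return True if a flush may be issued now under rules C1 and C2."""
    if s.pending == 0:
        return False  # nothing is waiting for a flush
    if s.flush_in_flight:
        return False  # C1: only one flush at a time, so double buffering suffices
    if s.data_in_flight > 0:
        return False  # C2 (corrected): flush IS deferred while any DATA executes
    return True


if __name__ == "__main__":
    print(should_issue_flush(FlushState(pending=2)))                        # True
    print(should_issue_flush(FlushState(pending=2, flush_in_flight=True)))  # False
    print(should_issue_flush(FlushState(pending=2, data_in_flight=1)))      # False
```

In this model, deferring while DATA is in flight lets the requests that shared one PREFLUSH finish their data writes and then share a single POSTFLUSH instead of each triggering its own.
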
>
> > IIUC, C2 might help only if requests which contain data are also going to
> > issue postflush. Couple of cases come to mind.
>
> That's true. I didn't want to go too advanced on it. I wanted
> something which is fairly mechanical (without intricate parameters)
> and effective enough for common cases.
>
> > - If queue supports FUA, I think we will not issue POSTFLUSH. In that
> > case issuing next PREFLUSH while data is in flight might make sense.
> >
> > - Even if queue does not support FUA, if we are only getting requests
> > with REQ_FLUSH, then waiting for data requests to finish before
> > issuing next FLUSH might not help either.
> >
> > - Even if queue does not support FUA and say we have a mix of REQ_FUA
> > and REQ_FLUSH, then this will help only if in a batch we have more
> > than 1 request which is going to issue POSTFLUSH and those POSTFLUSHes
> > will be merged.
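
Vivek's first case suggests applying C2 only when the queue lacks FUA. The sketch below models that proposed variant, not what the patch implements; the function and its parameters are hypothetical. The reasoning it encodes is that with FUA the DATA step is written durably without a POSTFLUSH, so there is no POSTFLUSH left to share and waiting for DATA gains nothing in this simplified model (Tejun's reply argues that issuing flushes sooner is not automatically faster).

```python
def should_issue_flush(pending: int, flush_in_flight: bool,
                       data_in_flight: int, queue_supports_fua: bool) -> bool:
    """Hypothetical variant of the flush decision: C2 applies only without FUA."""
    if pending == 0 or flush_in_flight:
        return False  # nothing to do, or C1 forbids a second concurrent flush
    # Without FUA, DATA is followed by a POSTFLUSH that in-flight requests could
    # share, so defer (C2). With FUA there is no POSTFLUSH, so skip the deferral.
    if data_in_flight > 0 and not queue_supports_fua:
        return False
    return True


if __name__ == "__main__":
    print(should_issue_flush(2, False, 1, queue_supports_fua=False))  # False
    print(should_issue_flush(2, False, 1, queue_supports_fua=True))   # True
```
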
>
> Sure, not applying C2 and 3 if the underlying device supports REQ_FUA
> would probably be the most compelling change of the bunch; however,
> please keep in mind that issuing flush as soon as possible doesn't
> necessarily result in better performance. It's inherently a balancing
> act between latency and throughput. Even inducing artificial issue
> latencies is likely to help if done right (as the ioscheds do).
>
> So, I think it's better to start with something simple and improve it
> with actual testing. If the current simple implementation can match
> Darrick's previous numbers, let's first settle the mechanisms. We can
> tune the latency/throughput balance all we want later. Other than the
> double buffering constraint (which can be relaxed too but I don't think
> that would be necessary or a good idea) things can be easily adjusted
> in blk_kick_flush(). It's intentionally designed that way.

Yep, the fsync-happy numbers more or less match... at least for 2.6.37:
http://tinyurl.com/4q2xeao

I'll give 2.6.38-rc2 a try later, though -rc1 didn't boot for me, so these
numbers are based on a backport to .37. :(

In general, the effect of this patchset is to change a 100% drop in fsync-happy
performance into a 20% drop. As always, the higher the average flush time, the
more the storage system benefits from having flush coordination. The only
exception to that is elm3b231_ipr, which is an md array of disks that are
attached to a controller that is now throwing errors, so I'm not sure I
entirely trust that machine's numbers.

As for elm3c44_sas, I'm not sure why enabling flushes always increases
performance, other than to say that I suspect it has something to do with
md-raid'ing disk trays together, because elm3a4_sas and elm3c71_extsas consist
of the same configuration of disk trays, only without the md. I've also been
told by our storage folks that md atop raid trays is not really a recommended
setup anyway.

The long and short of it is that this latest patchset looks good and delivers
the behavior that I was aiming for. :)

>
> > - Ric Wheeler was once mentioning that there are boxes which advertise
> > writeback cache but are battery backed so they ignore flush internally and
> > signal completion immediately. I am not sure how prevalent those
> > cases are but I think waiting for data to finish will delay processing
> > of new REQ_FLUSH requests in pending queue for such array. There
> > we will not anyway benefit from merging of FLUSH.
>
> I don't really think we should design the whole thing around broken
> devices which incorrectly report writeback cache when it need not.
> The correct place to work around that is during device identification
> not in the flush logic.
elm3a4_sas and elm3c71_extsas advertise writeback cache yet the flush completion
times are suspiciously low. I suppose it could be useful to disable flushes to
squeeze out that last bit of performance, though I don't know how one goes
about querying the disk array to learn if there's a battery behind the cache.
I guess the current mechanism (admin knob that picks a safe default) is good
enough.

> > Given that C2 is going to benefit primarily only if queue does not support
> > FUA and we have many requests with REQ_FUA set, will it make sense to
> > put additional checks for C2? At least a simple check for queue FUA
> > support might help.
> >
> > In practice does C2 really help, or can we get rid of it entirely?
>
> Again, issuing flushes as fast as possible isn't necessarily better.
> It might feel counter-intuitive but it generally makes sense to delay
> flush if there are a lot of concurrent flush activities going on.
> Another related interesting point is that with flush merging,
> depending on workload, there's a likelihood that FUA, even if the
> device supports it, might result in worse performance than merged DATA
> + single POSTFLUSH sequence.
>
> Thanks.
>
> --
> tejun
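
Counting device commands makes the merging argument concrete. This is a rough sketch assuming every request in a batch carries both REQ_FLUSH and REQ_FUA; flush_cost() is a hypothetical helper, and the counts say nothing about how long a cache flush or an individual FUA write takes on a given device.

```python
from typing import Tuple


def flush_cost(n: int, merged: bool, fua: bool) -> Tuple[int, int]:
    """Return (cache flushes, total commands) for n REQ_FLUSH|REQ_FUA writes.

    Toy count only: ignores latency, queue depth and per-write FUA cost.
    """
    groups = 1 if merged else n          # merged: the whole batch shares one sequence
    preflushes = groups                  # one PREFLUSH per sequence
    postflushes = 0 if fua else groups   # with FUA, DATA needs no POSTFLUSH
    flushes = preflushes + postflushes
    return flushes, flushes + n          # plus n DATA writes (FUA or plain)


if __name__ == "__main__":
    print(flush_cost(4, merged=False, fua=False))  # (8, 12)
    print(flush_cost(4, merged=True, fua=False))   # (2, 6)
    print(flush_cost(4, merged=True, fua=True))    # (1, 5)
```

With merging, the count alone favours FUA by only one command, so whether merged DATA plus a single POSTFLUSH beats FUA depends on what each FUA write costs on the device, which this model deliberately leaves out.
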