From: Dave Chinner <david@fromorbit.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org,
virtualization@lists.linux-foundation.org, jens.axboe@oracle.com,
Hirokazu Takahashi <taka@valinux.co.jp>,
Ryo Tsuruta <ryov@valinux.co.jp>,
Andrea Righi <righi.andrea@gmail.com>,
Satoshi UCHIDA <s-uchida@ap.jp.nec.com>,
fernando@oss.ntt.co.jp, balbir@linux.vnet.ibm.com,
Andrew Morton <akpm@linux-foundation.org>,
menage@google.com, ngupta@google.com,
Jeff Moyer <jmoyer@redhat.com>
Subject: Re: [patch 0/4] [RFC] Another proportional weight IO controller
Date: Sun, 9 Nov 2008 20:40:24 +1100
Message-ID: <20081109094024.GE2373@disturbed>
In-Reply-To: <1226053904.7803.5856.camel@twins>
On Fri, Nov 07, 2008 at 11:31:44AM +0100, Peter Zijlstra wrote:
> On Fri, 2008-11-07 at 11:41 +1100, Dave Chinner wrote:
> > On Thu, Nov 06, 2008 at 06:11:27PM +0100, Peter Zijlstra wrote:
> > > On Thu, 2008-11-06 at 11:57 -0500, Rik van Riel wrote:
> > > > Peter Zijlstra wrote:
> > > >
> > > > > The only real issue I can see is with linear volumes, but
> > > > > those are stupid anyway - non of the gains but all the
> > > > > risks.
> > > >
> > > > Linear volumes may well be the most common ones.
> > > >
> > > > People start out with the filesystems at a certain size,
> > > > increasing onto a second (new) disk later, when more space
> > > > is required.
> > >
> > > Are they aware of how risky linear volumes are? I would
> > > discourage anyone from using them.
> >
> > In what way are they risky?
>
> You loose all your data when one disk dies, so your mtbf decreases
> with the number of disks in your linear span. And you get non of
> the benefits from having multiple disks, like extra speed from
> striping, or redundancy from raid.

Fmeh. Step back and think for a moment. How does every major
distro build redundant root drives?

Yeah, they build a mirror and then put LVM on top of the mirror
to partition it. Each partition is a *linear volume*, but
no single disk failure is going to lose data because it's
been put on top of a mirror.

IOWs, reliability of linear volumes is only an issue if you don't
build redundancy into your storage stack. Just like RAID0, a single
disk failure will lose data. So, most people use linear volumes on
top of RAID1 or RAID5 to avoid such a single disk failure problem.
People do the same thing with RAID0 - it's what RAID10 and RAID50
do....

Also, linear volume performance scalability is on a different axis
to striping. Striping improves bandwidth, but each disk in a stripe
tends to make the same head movements. Hence striping improves
sequential throughput but only provides limited iops scalability.
Effectively, striping only improves throughput while the disks are
not seeking a lot. Add a few parallel I/O streams, and a stripe will
start to slow down as each disk seeks between streams. i.e. disks
in stripes cannot be considered to be able to operate independently.

Linear volumes create independent regions within the address space -
the regions can seek independently when under concurrent I/O and
hence iops scalability is much greater. Aggregate bandwidth is the
same as striping, it's just that a single stream is limited in
throughput. If you want to improve single stream throughput,
you stripe before you concatenate.

That's why people create layered storage systems like this:

linear volume
   |->stripe
         |-> md RAID5
                 |-> disk
                 |-> disk
                 |-> disk
                 |-> disk
                 |-> disk
         |-> md RAID5
                 |-> disk
                 |-> disk
                 |-> disk
                 |-> disk
                 |-> disk
   |->stripe
         |-> md RAID5
         ......
   |->stripe
         ......
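
To make the stripe vs. concatenation difference concrete, here's a
rough sketch of how a logical block maps to a member device in each
case. It ignores the RAID5 layer entirely, and the disk count, chunk
size and disk size are made-up numbers for illustration, not taken
from any real configuration.

```python
# Illustrative only: map a logical block to a member device under
# striping and under linear concatenation. All sizes are assumptions.

def stripe_disk(block, ndisks, chunk_blocks):
    """Disk index holding `block` in a simple RAID0-style stripe."""
    return (block // chunk_blocks) % ndisks

def linear_disk(block, disk_blocks):
    """Disk index holding `block` in a concatenation of equal-sized disks."""
    return block // disk_blocks

def disks_touched(start, length, mapper):
    """Set of disks touched by a sequential run of `length` blocks."""
    return {mapper(b) for b in range(start, start + length)}

NDISKS = 4
CHUNK = 16          # blocks per stripe chunk (assumed)
DISK_BLOCKS = 1000  # blocks per member disk (assumed)

# Four concurrent streams, each reading 64 sequential blocks from its
# own region of the 4000-block address space.
streams = [i * DISK_BLOCKS for i in range(NDISKS)]

stripe_sets = [
    disks_touched(s, 64, lambda b: stripe_disk(b, NDISKS, CHUNK))
    for s in streams
]
linear_sets = [
    disks_touched(s, 64, lambda b: linear_disk(b, DISK_BLOCKS))
    for s in streams
]

for s, st, li in zip(streams, stripe_sets, linear_sets):
    print(f"stream at {s}: stripe -> {sorted(st)}, linear -> {sorted(li)}")
```

With these numbers every stream lands on every disk in the stripe, so
each disk has to seek between all four streams. In the concatenation,
each stream stays on its own disk and the disks don't interfere with
each other.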

What you then need is a filesystem that can spread the load over
such a layout. Let's use, for argument's sake, XFS and tell it the
geometry of the RAID5 luns that make up the volume so that its
allocation is all nicely aligned. Then we match the allocation
group size to the size of each independent part of the linear
volume. Now when XFS spreads its inodes and data over multiple
AGs, it's spreading the load across disks that can operate
concurrently....
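
As a back-of-the-envelope sketch of that arithmetic: the 5-disk RAID5
comes from the diagram above, but the 64KiB chunk size, the 512GiB
segment size and the three segments are assumptions picked purely for
illustration. The function names are hypothetical helpers, not part
of any XFS tool; the values they compute correspond roughly to what
you'd hand to mkfs.xfs as stripe unit, stripe width and AG size.

```python
# Illustrative arithmetic only. Chunk size, segment size and segment
# count are assumed values, not a recommendation.

def raid5_geometry(ndisks, chunk_kib):
    """Stripe unit/width for one RAID5 lun: one disk's worth is parity."""
    data_disks = ndisks - 1
    return {
        "stripe_unit_kib": chunk_kib,
        "stripe_width_disks": data_disks,
        "full_stripe_kib": chunk_kib * data_disks,
    }

def ag_layout(segment_gib, nsegments):
    """One allocation group per independent segment of the linear volume."""
    return {"ag_size_gib": segment_gib, "ag_count": nsegments}

def ag_for_offset(offset_gib, ag_size_gib):
    """Which AG (and therefore which segment) a volume offset falls in."""
    return offset_gib // ag_size_gib

geom = raid5_geometry(ndisks=5, chunk_kib=64)       # 64KiB chunk is assumed
layout = ag_layout(segment_gib=512, nsegments=3)    # sizes are assumed

print(geom)
print(layout)
print(ag_for_offset(600, layout["ag_size_gib"]))
```

Because each AG covers exactly one segment, any allocation that lands
in AG n only ever touches the disks underneath segment n.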

Effectively, linear volumes are about as dangerous as striping.
If you don't build in redundancy at a level below the linear
volume or stripe, then you lose when something fails.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com