From: Tomasz Chmielewski <mangoo@wpkg.org>
To: Theodore Tso <tytso@mit.edu>, Andi Kleen <andi@firstfloor.org>,
LKML <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: very poor ext3 write performance on big filesystems?
Date: Wed, 27 Feb 2008 12:20:19 +0100 [thread overview]
Message-ID: <47C54773.4040402@wpkg.org> (raw)
In-Reply-To: <20080218141640.GC12568@mit.edu>
Theodore Tso wrote:
> On Mon, Feb 18, 2008 at 03:03:44PM +0100, Andi Kleen wrote:
>> Tomasz Chmielewski <mangoo@wpkg.org> writes:
>>> Is it normal to expect the write speed go down to only few dozens of
>>> kilobytes/s? Is it because of that many seeks? Can it be somehow
>>> optimized?
>> I have similar problems on my linux source partition which also
>> has a lot of hard linked files (although probably not quite
>> as many as you do). It seems like hard linking prevents
>> some of the heuristics ext* uses to generate non fragmented
>> disk layouts and the resulting seeking makes things slow.
A follow-up to this thread.
Small optimizations like playing with /proc/sys/vm/* didn't help much;
increasing the "commit=" ext3 mount option helped only a tiny bit.
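For the record, the tweaks looked roughly like this (the values and the
mount point are only examples, not a recommendation):

  # writeback tuning via /proc/sys/vm/ - illustrative values only
  echo 5    > /proc/sys/vm/dirty_background_ratio
  echo 10   > /proc/sys/vm/dirty_ratio
  echo 3000 > /proc/sys/vm/dirty_expire_centisecs

  # commit the ext3 journal less often (interval in seconds)
  mount -o remount,commit=60 /mnt/backuppc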
What *did* help a lot was... disabling the internal bitmap of the RAID-5
array. "rm -rf" doesn't "pause" for several seconds any more.
If md and dm supported barriers, it would be even better, I guess (I could
enable the write cache with some degree of confidence).
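The bitmap was removed with mdadm; a rough sketch, with /dev/md0 standing in
for the actual array and /dev/sda for one of its member disks:

  # drop the internal write-intent bitmap
  mdadm /dev/md0 --grow --bitmap=none

  # add it back later, if desired
  mdadm /dev/md0 --grow --bitmap=internal

  # query / disable the drive write cache (hdparm -W1 would enable it)
  hdparm -W /dev/sda
  hdparm -W0 /dev/sda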
This is "iostat sda -d 10" output without the internal bitmap.
The system mostly tries to read (Blk_read/s), and once in a while it
does a big commit (Blk_wrtn/s):
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 164,67 2088,62 0,00 20928 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 180,12 1999,60 0,00 20016 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 172,63 2587,01 0,00 25896 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 156,53 2054,64 0,00 20608 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 170,20 3013,60 0,00 30136 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 119,46 1377,25 5264,67 13800 52752
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 154,05 1897,10 0,00 18952 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 197,70 2177,02 0,00 21792 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 166,47 1805,19 0,00 18088 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 150,95 1552,05 0,00 15536 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 158,44 1792,61 0,00 17944 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 132,47 1399,40 3781,82 14008 37856
With the bitmap enabled, it sometimes behaves similarly, but mostly I can
see reads competing with writes, and both then have very low numbers:
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 112,57 946,11 5837,13 9480 58488
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 157,24 1858,94 0,00 18608 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 116,90 1173,60 44,00 11736 440
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 24,05 85,43 172,46 856 1728
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 25,60 90,40 165,60 904 1656
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 25,05 276,25 180,44 2768 1808
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 22,70 65,60 229,60 656 2296
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 21,66 202,79 786,43 2032 7880
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 20,90 83,20 1800,00 832 18000
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 51,75 237,36 479,52 2376 4800
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 35,43 129,34 245,91 1296 2464
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 34,50 88,00 270,40 880 2704
Now, let's disable the bitmap in the RAID-5 array:
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 110,59 536,26 973,43 5368 9744
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 119,68 533,07 1574,43 5336 15760
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 123,78 368,43 2335,26 3688 23376
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 122,48 315,68 1990,01 3160 19920
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 117,08 580,22 1009,39 5808 10104
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 119,50 324,00 1080,80 3240 10808
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 118,36 353,69 1926,55 3544 19304
And let's enable it again - after a while, performance degrades again, and I
can see "rm -rf" stall for longer periods:
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 162,70 2213,60 0,00 22136 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 165,73 1639,16 0,00 16408 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 119,76 1192,81 3722,16 11952 37296
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 178,70 1855,20 0,00 18552 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 162,64 1528,07 0,80 15296 8
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 182,87 2082,07 0,00 20904 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 168,93 1692,71 0,00 16944 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 177,45 1572,06 0,00 15752 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 123,10 1436,00 4941,60 14360 49416
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 201,30 1984,03 0,00 19880 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 165,50 1555,20 22,40 15552 224
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 25,35 273,05 189,22 2736 1896
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 22,58 63,94 165,43 640 1656
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 69,40 435,20 262,40 4352 2624
There is a related thread (although not very kernel-related) on the
BackupPC mailing list:
http://thread.gmane.org/gmane.comp.sysutils.backup.backuppc.general/14009
It's the BackupPC software that creates this many hardlinks (but hey, I can
keep ~14 TB of data on a 1.2 TB filesystem which is not even 65% full).
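Just to illustrate where the savings come from: hardlinked files share one
inode and one set of data blocks, so each pooled file is stored only once no
matter how many backups reference it (file names below are made up):

  ln /pool/somefile /pc1/full/somefile
  stat -c '%i %h %s' /pool/somefile /pc1/full/somefile
  # -> same inode number, link count 2; du counts the blocks only once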
--
Tomasz Chmielewski
http://wpkg.org