LKML Archive on lore.kernel.org
* iowait problems on 2.6, not on 2.4
@ 2004-05-26 15:43 Antonio Larrosa Jiménez
2004-05-27 3:52 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Antonio Larrosa Jiménez @ 2004-05-26 15:43 UTC (permalink / raw)
To: linux-kernel
Hello,
I'm doing some benchmarking for my company with our own applications, and I
got some unexpected results (sorry for the long mail, but I think it's faster
to provide as much information as I can in the first mail).
The machine is a 4-CPU Pentium III (Cascades) system with
four SCSI Seagate ST336704 hard disks connected to an Adaptec AIC-7899P
U160/m, and an external RAID connected to a QLA2200/QLA2xxx FC-SCSI Host Bus
Adapter. The machine has 1 GB of RAM.
I did two fresh test installations on that machine:
1. SuSE 9.0 (kernel 2.4.21 smp)
Using one partition on the first HD for the root partition,
the second HD for oracle
first half of the RAID device for the data storage
all the filesystems use ext3 (with defaults option in fstab)
2. SuSE 9.1 (kernel 2.6.4 smp)
Using another partition on the first HD for the root partition,
the third HD for oracle
the second partition on the RAID device for the data storage
all the filesystems use reiserfs (with acl,user_xattr,notail options in fstab)
Only the swap partition (on the first HD) is shared between both
installations.
The application I'm benchmarking forks 4 children that all read a big file
(the same 1 GB file in these tests) and analyse it, each child writing its
result to a different file (also on the same partition) while analysing.
When one of these children finishes (each takes a few seconds when run
alone), the parent forks another one, until 500 analyses have been done.
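For reference, the process structure described above could be sketched roughly like this (a hypothetical simplification; the real application obviously does more than a dd read):

```shell
# Hypothetical sketch of the benchmark's structure: $2 parallel workers
# run $1 analyses in total, each reading the shared input file $3 and
# writing its own result file. (Simplified: each worker slot runs its
# share sequentially, so there are always $2 analyses in flight, much
# like the parent reforking a child per finished analysis.)
run_bench() {
    total=$1; workers=$2; input=$3
    per=$(( total / workers ))
    w=1
    while [ "$w" -le "$workers" ]; do
        (
            j=1
            while [ "$j" -le "$per" ]; do
                id=$(( (w - 1) * per + j ))
                dd if="$input" of=/dev/null bs=1M 2>/dev/null  # the "analysis"
                echo "done $id" > "result.$id"
                j=$(( j + 1 ))
            done
        ) &
        w=$(( w + 1 ))
    done
    wait
}
# In the mail's setup this would be roughly: run_bench 500 4 /path/to/1gb-file
```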
The problem I have is that the application takes twice as long to finish on
2.6.4 as on 2.4.21.
Looking at top, I can see that on the 2.4.21 system the 4 processes that
actually do the work are each getting nearly 90% of a CPU, and all
500 tasks finish in around 53 minutes.
But on the 2.6.4 system, the 4 processes rarely go over 10% CPU usage each,
while io-wait is usually between 80% and 98%. The result is that the same
operation now takes over 2 hours (2h 10m). I know that "io-wait" used to be
counted inside "idle", but why then did the 2.4 kernel have idle so low
compared to 2.6's io-wait?
Could that mean I have some problem with the kernel? I've tried noapic
and acpi=off but got the same results (I once found a hyperthreading machine
where CPU usage was constantly at 25% unless I used noapic, so at first I
thought this was related). I also tried using the ext3 data partition instead
of the reiserfs one on SuSE 9.1 (just to check that it wasn't a filesystem
issue), and still got the same result (over 2 hours to complete the task).
The cpu usage was something like this on 2.4 while the 4 processes were
running:
load average: 4.62, 2.46, 0.98
Cpu(s): 0.2% user, 73.7% system, 0.2% nice, 25.9% idle
with some oscillation between idle and system (but idle almost never got over 30%)
While on 2.6 it was basically like this all the time:
load average: 13.66, 9.75, 5.01
Cpu(s): 0.3% us, 6.2% sy, 0.0% ni, 3.0% id, 90.0% wa, 0.1% hi, 0.3% si
I've seen other people getting slower performance on 2.6 than on 2.4 under
special situations with some raids, so I wondered if it's because of some
kernel/raid issue.
There have been other people with similar problems, and this case caught my
attention:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-11/417index.html
(although it doesn't seem to be exactly the same problem)
I've done tests similar to those that were requested of the person who had
that problem, both on the 2.4 and 2.6 systems. I can provide the actual
results if requested, but for brevity I'll just summarize them here.
Note that from here on, these are additional tests, different from the one I
actually need help with, but I hope they help to identify the problem.
On 2.6.4 with reiserfs, a "dd if=/dev/zero of=x bs=1M count=2048 ; sync" took
23 seconds, with io/bo mostly around 97000 blocks/s; the same test on 2.6.4
with the ext3 data partition got around 85000 (and took 29 seconds). The same
test on 2.4.21 (ext3) took 30 seconds, but curiously, io/bo wasn't
anything like stable: sometimes it was over 99000 and sometimes it only got
24 blocks out, just to get 68000 on the next second (it always did that
between running dd and sync, but I'm talking about while dd was
running).
On 2.6.4 with reiserfs, running "dd if=x of=/dev/null bs=1M count=2048 ; sync"
takes 1m26s, which is basically the same as with ext3, both getting around
24000 blocks/s on io/bi.
In this case, 2.4.21 (ext3) did much better than both, taking just 48s with
an average of 43000 blocks/s on io/bi. That's nearly 1.8 times the
speed of 2.6.4 on this RAID.
Finally, I tried benchmarking "dd if=x of=y bs=1M count=2048 ; sync",
and the result was 1m57s on 2.6.4 and just 1m30s on 2.4.21, on the same
reiserfs partition.
In this case, the main difference I noticed on vmstat was that on 2.4.21, it
seemed it was either reading or writing, with vmstat data like this:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 101740 13872 18328 892068 0 0 27648 28 1076 219 0 15 84 0
2 1 101740 12492 18348 893476 0 0 9732 81540 1201 235 0 8 92 0
0 1 101740 12956 18356 894964 0 0 10240 65296 1175 224 0 10 89 0
1 0 101740 12944 18352 895116 0 0 32256 24 1087 231 0 18 82 0
1 0 101740 12972 18352 895116 0 0 32768 0 1084 229 0 18 82 0
0 1 101740 13020 17928 895412 0 0 31744 0 1080 222 0 18 82 0
0 1 101740 13348 17932 894772 0 0 25092 24 1070 228 0 14 86 0
0 2 101740 13516 17948 894528 0 0 9216 86992 1198 250 0 8 92 0
1 0 101740 12744 17956 895752 0 0 14848 54060 1164 206 0 13 87 0
1 0 101740 12636 17956 895624 0 0 33792 24 1089 242 0 18 82 0
1 0 101740 13396 17956 894476 0 0 34304 0 1088 231 0 19 81 0
0 1 101740 13108 17932 894384 0 0 34160 0 1090 261 0 19 81 0
1 0 101740 13460 17932 893724 0 0 28308 24 1077 257 0 15 84 0
0 2 101740 12552 17948 899252 0 0 5632 95664 1212 195 0 6 94 0
1 0 101740 13484 17956 898180 0 0 13824 56456 1164 204 0 11 89 0
while on 2.6.4, it was more stable, writing and reading nearly in parallel:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 3 100064 6056 19260 936356 0 0 15232 24576 1254 576 0 6 51 42
0 3 100064 5468 19272 936752 0 0 16000 20480 1263 612 0 7 50 44
0 2 100064 5140 19288 938028 0 0 16640 2956 1253 1094 0 8 59 32
1 1 100064 4564 19452 938272 0 0 20096 24280 1213 728 0 8 54 38
0 2 100064 5844 19464 936900 0 0 13696 32716 1286 530 0 6 63 30
1 1 100064 5844 19476 937500 0 0 11904 21704 1310 841 0 6 62 32
0 2 100064 5964 19620 937016 0 0 20868 24444 1245 877 0 8 51 40
0 2 100064 5396 19636 937408 0 0 16000 16384 1223 572 0 7 51 43
0 2 100064 5012 19652 938140 0 0 15488 9852 1263 887 0 6 59 35
0 2 100064 5716 19768 937208 0 0 19712 24512 1238 880 0 8 55 37
0 2 100064 4692 19788 938072 0 0 17024 20424 1243 619 0 7 67 27
0 3 100064 4628 19908 938224 0 0 14340 25860 1287 1372 0 9 59 32
I guess that for desktop systems the 2.6.4 behaviour is better, but for a
server I'd prefer to improve sustained performance, so, any idea what I can
try in order to improve it?
In all the tests, "cron" and "at" were stopped.
My next test will be to do the "dd tests" on one of the internal hard disks
and use it for the data instead of the external raid. If someone wants the
result of those, please ask. I won't send them to the list unless someone
asks for them or I find something of relevance.
Btw, our application doesn't use threads (just processes), and Oracle is
running, but it's not a critical part of the system (it just handles a couple
of queries for each working process, on a very small table).
If really needed, I could try testing the latest kernel, or some patches, but
since I've had a look at the changelogs and didn't find anything related to
this, and since I'd need a "certified kernel" for Oracle, I'd prefer to find
a solution that works with the unmodified SuSE kernel.
Thanks for any help.
Even an "I've thought about it but cannot find a solution" would be welcome.
Greetings,
--
Antonio Larrosa
Tecnologias Digitales Audiovisuales, S.L.
http://www.tedial.com
Parque Tecnologico de Andalucia . Málaga (Spain)
* Re: iowait problems on 2.6, not on 2.4
2004-05-26 15:43 iowait problems on 2.6, not on 2.4 Antonio Larrosa Jiménez
@ 2004-05-27 3:52 ` Andrew Morton
2004-05-28 13:16 ` Antonio Larrosa Jiménez
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2004-05-27 3:52 UTC (permalink / raw)
To: Antonio Larrosa Jiménez; +Cc: linux-kernel
Antonio Larrosa Jiménez <antlarr@tedial.com> wrote:
>
> My next test will be to do the "dd tests" on one of the internal hard disks
> and use it for the data instead of the external raid.
That's a logical next step. The reduced read bandwidth on the raid array
should be fixed up before we can go any further. I don't recall any
reports of qlogic fc-scsi performance regressions, though.
* Re: iowait problems on 2.6, not on 2.4
2004-05-27 3:52 ` Andrew Morton
@ 2004-05-28 13:16 ` Antonio Larrosa Jiménez
2004-05-28 22:45 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Antonio Larrosa Jiménez @ 2004-05-28 13:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Thursday 27 May 2004 05:52, you wrote:
> Antonio Larrosa Jiménez <antlarr@tedial.com> wrote:
> > My next test will be to do the "dd tests" on one of the internal hard
> > disks and use it for the data instead of the external raid.
>
> That's a logical next step. The reduced read bandwidth on the raid array
> should be fixed up before we can go any further. I don't recall any
> reports of qlogic fc-scsi performance regressions though.
Ok, let's analyze that first.
The dd tests gave the following results:
ext3 on the internal scsi HD:
2.4.21:
writing : 1m14s
reading : 1m2s
reading+writing : 2m16s
2.6.4:
writing : 1m19s
reading : 59s
reading+writing : 2m24s
reiserfs on the internal scsi HD:
2.4.21:
writing : 1m15s
reading : 1m1s
reading+writing : 2m22s
2.6.4:
writing : 1m19s
reading : 1m
reading+writing : 2m25s
ext3 on the raid using qlogic fc-scsi:
2.4.21:
writing : 30s
reading : 51s
reading+writing : 1m29s
2.6.4:
writing : 28s
reading : 1m26s
reading+writing : 2m19s
reiserfs on the raid using qlogic fc-scsi:
2.4.21:
writing : 37s
reading : 52s
reading+writing : 1m37s
2.6.4:
writing : 25s
reading : 1m27s
reading+writing : 2m3s
All the tests were run 3 times, and the average taken. In the cases where
there was too much variance, I repeated the tests a few more times.
All the tests used 2 GB reads/writes. I tried to make 8 GB reads/writes too,
but they showed up to a minute of variance (maybe the HD sometimes slowed
itself down due to temperature issues? I really don't know why this happened,
but in any case, I couldn't make reliable tests with files of that size).
So basically, there's no difference between 2.4.21 and 2.6.4 when using the
internal HD, but 2.6.4 is much slower when using the raid.
What I found strange is that writing to that raid is a bit faster on 2.6.4
while reading is much slower, which I suppose is what makes the difference.
So yes, I suppose there's a regression on the qlogic fc-scsi module.
Btw, the tests I timed were:
count=2048
write() { dd if=/dev/zero of=x bs=1M count=$count ; sync; }
read() { dd if=x of=/dev/null bs=1M count=$count; }
readwrite() { dd if=x of=y bs=1M count=$count ; sync; }
In the case of read, I did the sync just before and after the timing, but
didn't include the sync inside the timed test.
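Spelled out as a self-contained script (functions renamed with a `_test` suffix so `read` doesn't shadow the shell builtin, and with the semicolons before `}` that bash requires):

```shell
#!/bin/sh
count=${count:-2048}

# The three dd tests from the mail, as runnable shell functions.
write_test()     { dd if=/dev/zero of=x bs=1M count=$count; sync; }
read_test()      { dd if=x of=/dev/null bs=1M count=$count; }
readwrite_test() { dd if=x of=y bs=1M count=$count; sync; }

# Timing as described in the mail: sync before the read test, but keep
# that sync outside the timed region, e.g.:
#   time write_test
#   sync; time read_test
#   time readwrite_test
```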
As I said in my other mail, I can test any patch if needed.
Greetings and thanks for any help
--
Antonio Larrosa
Tecnologias Digitales Audiovisuales, S.L.
http://www.tedial.com
Parque Tecnologico de Andalucia . Málaga (Spain)
* Re: iowait problems on 2.6, not on 2.4
2004-05-28 13:16 ` Antonio Larrosa Jiménez
@ 2004-05-28 22:45 ` Andrew Morton
2004-05-30 0:46 ` Doug Ledford
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2004-05-28 22:45 UTC (permalink / raw)
To: Antonio Larrosa Jiménez; +Cc: linux-kernel, linux-scsi
Antonio Larrosa Jiménez <antlarr@tedial.com> wrote:
>
> On Thursday 27 May 2004 05:52, you wrote:
> > Antonio Larrosa Jiménez <antlarr@tedial.com> wrote:
> > > My next test will be to do the "dd tests" on one of the internal hard
> > > disks and use it for the data instead of the external raid.
> >
> > That's a logical next step. The reduced read bandwidth on the raid array
> > should be fixed up before we can go any further. I don't recall any
> > reports of qlogic fc-scsi performance regressions though.
>
> Ok, let's analyze that first.
>
> The dd tests gave the following results:
Let me cc linux-scsi.
Guys: poke. Does anyone know why this:
The machine is a 4 cpu Pentium III (Cascades) system with four SCSI
SEAGATE ST336704 hard disks connected to an Adaptec AIC-7899P U160/m, and
an external RAID connected to a QLA2200/QLA2xxx FC-SCSI Host Bus Adapter.
The machine has 1Gb RAM.
got all slow at reads?
> ext3 on the internal scsi HD:
> 2.4.21:
> writing : 1m14s
> reading : 1m2s
> reading+writing : 2m16s
> 2.6.4:
> writing : 1m19s
> reading : 59s
> reading+writing : 2m24s
>
> reiserfs on the internal scsi HD:
> 2.4.21:
> writing : 1m15s
> reading : 1m1s
> reading+writing : 2m22s
> 2.6.4:
> writing : 1m19s
> reading : 1m
> reading+writing : 2m25s
>
> ext3 on the raid using qlogic fc-scsi:
> 2.4.21:
> writing : 30s
> reading : 51s
> reading+writing : 1m29s
> 2.6.4:
> writing : 28s
> reading : 1m26s
> reading+writing : 2m19s
>
> reiserfs on the raid using qlogic fc-scsi:
> 2.4.21:
> writing : 37s
> reading : 52s
> reading+writing : 1m37s
> 2.6.4:
> writing : 25s
> reading : 1m27s
> reading+writing : 2m3s
>
> All the tests were made 3 times, and the average taken. In the cases where
> there was too much variance, I repeated the tests some more times.
>
> All the tests used 2Gb reads/writes. I tried to make 8Gb reads/writes too,
> but they got up to a minute variance (maybe the HD slowed itself down due to
> temperature issues sometimes? I really don't know why this happened, but in
> any case, I couldn't make reliable tests with files of that size).
>
> So basically, there's no difference between 2.4.21 and 2.6.4 when using the
> internal HD, but 2.6.4 is much slower when using the raid.
> What I found strange is that writing to that raid is a bit faster on 2.6.4
> while reading is much slower, which I suppose is what makes the difference.
>
> So yes, I suppose there's a regression on the qlogic fc-scsi module.
>
> Btw, the tests I timed were:
>
> count=2048
> write() { dd if=/dev/zero of=x bs=1M count=$count ; sync; }
> read() { dd if=x of=/dev/null bs=1M count=$count; }
> readwrite() { dd if=x of=y bs=1M count=$count ; sync; }
>
> In the case of read, I did the sync just before and after the timing, but
> didn't include the sync inside the timed test.
>
> As I said in my other mail, I can test any patch if needed.
>
> Greetings and thanks for any help
>
> --
> Antonio Larrosa
> Tecnologias Digitales Audiovisuales, S.L.
> http://www.tedial.com
> Parque Tecnologico de Andalucia . Málaga (Spain)
* Re: iowait problems on 2.6, not on 2.4
2004-05-28 22:45 ` Andrew Morton
@ 2004-05-30 0:46 ` Doug Ledford
2004-05-30 4:52 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Doug Ledford @ 2004-05-30 0:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Antonio Larrosa Jiménez, linux-kernel, linux-scsi mailing list
On Fri, 2004-05-28 at 18:45, Andrew Morton wrote:
> Antonio Larrosa Jiménez <antlarr@tedial.com> wrote:
> >
> > On Thursday 27 May 2004 05:52, you wrote:
> > > Antonio Larrosa Jiménez <antlarr@tedial.com> wrote:
> > > > My next test will be to do the "dd tests" on one of the internal hard
> > > > disks and use it for the data instead of the external raid.
> > >
> > > That's a logical next step. The reduced read bandwidth on the raid array
> > > should be fixed up before we can go any further. I don't recall any
> > > reports of qlogic fc-scsi performance regressions though.
> >
> > Ok, let's analyze that first.
> >
> > The dd tests gave the following results:
>
> Let me cc linux-scsi.
>
> Guys: poke. Does anyone know why this:
>
> The machine is a 4 cpu Pentium III (Cascades) system with four SCSI
> SEAGATE ST336704 hard disks connected to an Adaptec AIC-7899P U160/m, and
> an external RAID connected to a QLA2200/QLA2xxx FC-SCSI Host Bus Adapter.
> The machine has 1Gb RAM.
>
> got all slow at reads?
My first guess would be readahead values. Try poking around with
those. When using a single hard disk, as opposed to a raid array, it's
easier to trigger the drive's firmware read ahead, since all reads go to a
single physical device, and that in turn compensates for a lack of
reasonable OS readahead. On a raid array, depending on the vendor, there
may be next to no firmware-initiated read ahead, and that can drastically
reduce performance on sequential reads.
> > So yes, I suppose there's a regression on the qlogic fc-scsi module.
A regression, yes. In the qlogic-fc driver? I doubt it. I think this
basically boils down to performance tuning.
But I could be wrong. Give it a try and see what happens. In the 2.4
kernels I would tell you to tweak /proc/sys/vm/{min,max}-readahead;
I don't know whether those two knobs still exist in 2.6, or whether they
have the same effect. Andrew?
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* Re: iowait problems on 2.6, not on 2.4
2004-05-30 0:46 ` Doug Ledford
@ 2004-05-30 4:52 ` Andrew Morton
2004-05-31 11:24 ` Antonio Larrosa Jiménez
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2004-05-30 4:52 UTC (permalink / raw)
To: Doug Ledford; +Cc: antlarr, linux-kernel, linux-scsi
Doug Ledford <dledford@redhat.com> wrote:
>
> But, I could be wrong. Give it a try and see what happens. In the 2.4
> kernels I would tell you to tweak /proc/sys/vm/{min,max}-readahead,
> don't know if those two knobs still exist in 2.6 and if they have the
> same effect. Andrew?
blockdev --setra N /dev/sda (N is in 512 byte units)
echo N > /sys/block/sda/queue/read_ahead_kb (N is in kilobytes)
Also there was breakage (recently fixed) in which /dev/sdaN's readahead
setting was being inherited from the blockdev on which /dev resides. But
reading regular files is OK.
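Since the two knobs above use different units, it's easy to mix them up; a trivial helper (for illustration only):

```shell
# blockdev --setra takes 512-byte sectors; read_ahead_kb takes kilobytes.
# So a --setra value of N corresponds to N/2 KB of readahead.
setra_to_kb() { echo $(( $1 / 2 )); }

setra_to_kb 256    # prints 128: a setting of 256 sectors is 128 KB
```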
* Re: iowait problems on 2.6, not on 2.4
2004-05-30 4:52 ` Andrew Morton
@ 2004-05-31 11:24 ` Antonio Larrosa Jiménez
0 siblings, 0 replies; 7+ messages in thread
From: Antonio Larrosa Jiménez @ 2004-05-31 11:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: Doug Ledford, linux-kernel, linux-scsi
On Sunday 30 May 2004 06:52, Andrew Morton wrote:
> Also there was breakage (recently fixed) in which /dev/sdaN's readahead
> setting was being inherited from the blockdev on which /dev resides. But
> reading regular files is OK.
>
I don't think I have that fix applied here.
I couldn't find any read_ahead* or *ra_* files under /sys, but blockdev works.
Btw, by default it was using a readahead of 128 KB (blockdev returned 256).
I re-ran the dd read test on reiserfs on the qla device (the one that took
1m27s before). These are the results on 2.6.4:
blockdev --setra 1024 /dev/sde : 28s
(but this is the dd test on a most probably unfragmented file, so it's not a
value that can be counted on in real life)
blockdev --setra 512 /dev/sde : 51s (approx. what 2.4.21 took by default)
blockdev --setra 256 /dev/sde : 1m27s
blockdev --setra 128 /dev/sde : 2m9s
blockdev --setra 64 /dev/sde : 2m55s
blockdev --setra 0 /dev/sde : 13m33s
On 2.4.21:
By default, /proc/sys/vm/min-readahead is 3 and /proc/sys/vm/max-readahead is
127. I suppose those are KB, since blockdev didn't allow me to set a
readahead value over 255.
blockdev --setra 1024 /dev/sde : n/a (BLKRASET: Invalid argument)
blockdev --setra 512 /dev/sde : n/a (BLKRASET: Invalid argument)
blockdev --setra 256 /dev/sde : n/a (BLKRASET: Invalid argument)
blockdev --setra 255 /dev/sde : 52s
blockdev --setra 0 /dev/sde : 54s
As this was unexpected (it seems blockdev --setra is a no-op on 2.4.21?), I
tried modifying /proc/sys/vm/min-readahead and max-readahead instead.
Setting both to 512 (which should match blockdev --setra 1024 on 2.6.4),
reading took 25s.
Setting both to 128 (blockdev --setra 256), it took 49s.
Setting both to 0 (disabling readahead), it took 3m29s (compare that to
13m33s on 2.6.4, so there's definitely some problem here).
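For what it's worth, a sweep like the one above can be scripted; a sketch (the device and file arguments are placeholders, and it must run as root to change the readahead setting):

```shell
# Sketch of the readahead sweep above as a reusable function.
# $1: block device under test, $2: a large file on it (ideally larger
# than RAM, so sequential reads can't be served from the page cache).
sweep_readahead() {
    dev=$1; file=$2
    for ra in 1024 512 256 128 64 0; do
        blockdev --setra "$ra" "$dev" || return 1
        sync
        t0=$(date +%s)
        dd if="$file" of=/dev/null bs=1M count=2048 2>/dev/null
        t1=$(date +%s)
        echo "setra $ra: $((t1 - t0))s"
    done
}
# Usage (as root): sweep_readahead /dev/sde x
```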
Anything else I can do to help find the problem?
--
Antonio Larrosa
Tecnologias Digitales Audiovisuales, S.L.
http://www.tedial.com
Parque Tecnologico de Andalucia . Málaga (Spain)