LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* HPL Benchmark performance degradation of kernel 2.6.24.3 vs 2.6.23.14
@ 2008-03-02  5:34 Allan Menezes
  2008-03-02 18:48 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Allan Menezes @ 2008-03-02  5:34 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]

Hi,
    I have a five-node Intel Q6600 quad-core cluster, and I benchmarked 
it with the open source Open MPI software, using FC8 with its supplied 
kernels recompiled as well as kernel.org kernels 2.6.23.14 and 
2.6.24.3.
With GotoBLAS v1.24 and the Open MPI beta (v1.3a) in both cases, kernel 
2.6.23.14 with web100 gives me 158 Gflops.
But with kernel 2.6.24.3, with or without web100, and with 6 GiB of 
DDR2-800 RAM on each node, I get only 22-28 Gflops for 5 nodes, whereas 
kernel 2.6.23.14, with or without web100, gives me 156-158 Gflops.
Why is there a performance drop with kernel 2.6.24.3? All the hardware 
is the same!
For inter-node communication I use three PCI Express gigabit Ethernet 
cards (2 Intel and one SysKonnect) per node. Measured point to point 
with NPtcp from NetPIPE, all three cards give approximately 880 Mb/s 
under both kernels 2.6.24.3 and 2.6.23.14. I am also using three 
gigabit switches with high bisection bandwidth for these (copper) 
Ethernet cards, on 3 different subnets.
Yet I am getting a substantial performance drop while keeping the 
hardware, Open MPI, HPL and GotoBLAS the same. Can someone help me 
figure out why?
Please find attached my kernel's .config
Cheers,
Allan Menezes


[-- Attachment #2: .config --]
[-- Type: application/xml, Size: 63826 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: HPL Benchmark performance degradation of kernel 2.6.24.3 vs 2.6.23.14
  2008-03-02  5:34 HPL Benchmark performance degradation of kernel 2.6.24.3 vs 2.6.23.14 Allan Menezes
@ 2008-03-02 18:48 ` Eric Dumazet
  2008-03-02 19:02   ` Peter Zijlstra
  2008-03-02 20:07   ` Roger Heflin
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2008-03-02 18:48 UTC (permalink / raw)
  To: Allan Menezes; +Cc: linux-kernel

Allan Menezes wrote:
> Hi,
>    I have a five-node Intel Q6600 quad-core cluster, and I benchmarked 
> it with the open source Open MPI software, using FC8 with its supplied 
> kernels recompiled as well as kernel.org kernels 2.6.23.14 and 
> 2.6.24.3.
> With GotoBLAS v1.24 and the Open MPI beta (v1.3a) in both cases, kernel 
> 2.6.23.14 with web100 gives me 158 Gflops.
> But with kernel 2.6.24.3, with or without web100, and with 6 GiB of 
> DDR2-800 RAM on each node, I get only 22-28 Gflops for 5 nodes, whereas 
> kernel 2.6.23.14, with or without web100, gives me 156-158 Gflops.
> Why is there a performance drop with kernel 2.6.24.3? All the hardware 
> is the same!
> For inter-node communication I use three PCI Express gigabit Ethernet 
> cards (2 Intel and one SysKonnect) per node. Measured point to point 
> with NPtcp from NetPIPE, all three cards give approximately 880 Mb/s 
> under both kernels 2.6.24.3 and 2.6.23.14. I am also using three 
> gigabit switches with high bisection bandwidth for these (copper) 
> Ethernet cards, on 3 different subnets.
> Yet I am getting a substantial performance drop while keeping the 
> hardware, Open MPI, HPL and GotoBLAS the same. Can someone help me 
> figure out why?
> Please find attached my kernel's .config

Hi Allan

Your setup is quite complex, so you should give more information if you want 
some help here.

Is this benchmark stressing disk I/O, the task scheduler, the network 
stack, memory, swap... it is hard to tell.

Examining your .config, I would point out CONFIG_SLUB_DEBUG=y
You really should disable this expensive option.
(and possibly use CONFIG_SLAB instead of CONFIG_SLUB)
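For reference, the two suggestions above would look roughly like this as
a .config fragment (a sketch only; exact dependencies vary with the
2.6.24 Kconfig):

```
# Option 1: keep SLUB but build without the debug hooks
CONFIG_SLUB=y
# CONFIG_SLUB_DEBUG is not set

# Option 2: fall back to the older SLAB allocator instead
CONFIG_SLAB=y
```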

You probably should try the oprofile tool, since its results are a good 
way to spot bad configuration or kernel regressions.

opcontrol --vmlinux=/boot/vmlinux-2.6.24.3 --start
<benchmarking>
opreport -l /boot/vmlinux-2.6.24.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: HPL Benchmark performance degradation of kernel 2.6.24.3 vs 2.6.23.14
  2008-03-02 18:48 ` Eric Dumazet
@ 2008-03-02 19:02   ` Peter Zijlstra
  2008-03-02 20:07   ` Roger Heflin
  1 sibling, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2008-03-02 19:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Allan Menezes, linux-kernel


On Sun, 2008-03-02 at 19:48 +0100, Eric Dumazet wrote:

> Examining your .config, I would point out CONFIG_SLUB_DEBUG=y
> You really should disable this expensive option.

CONFIG_SLUB_DEBUG_ON is the expensive one.

> (and possibly use CONFIG_SLAB instead of CONFIG_SLUB)

That is a good thing to test indeed.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: HPL Benchmark performance degradation of kernel 2.6.24.3 vs 2.6.23.14
  2008-03-02 18:48 ` Eric Dumazet
  2008-03-02 19:02   ` Peter Zijlstra
@ 2008-03-02 20:07   ` Roger Heflin
  1 sibling, 0 replies; 4+ messages in thread
From: Roger Heflin @ 2008-03-02 20:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Allan Menezes, linux-kernel

Eric Dumazet wrote:
> Allan Menezes wrote:
>> Hi,
>>    I have a five-node Intel Q6600 quad-core cluster, and I benchmarked 
>> it with the open source Open MPI software, using FC8 with its supplied 
>> kernels recompiled as well as kernel.org kernels 2.6.23.14 and 
>> 2.6.24.3.
>> With GotoBLAS v1.24 and the Open MPI beta (v1.3a) in both cases, 
>> kernel 2.6.23.14 with web100 gives me 158 Gflops.
>> But with kernel 2.6.24.3, with or without web100, and with 6 GiB of 
>> DDR2-800 RAM on each node, I get only 22-28 Gflops for 5 nodes, 
>> whereas kernel 2.6.23.14, with or without web100, gives me 156-158 
>> Gflops.
>> Why is there a performance drop with kernel 2.6.24.3? All the hardware 
>> is the same!
>> For inter-node communication I use three PCI Express gigabit Ethernet 
>> cards (2 Intel and one SysKonnect) per node. Measured point to point 
>> with NPtcp from NetPIPE, all three cards give approximately 880 Mb/s 
>> under both kernels 2.6.24.3 and 2.6.23.14. I am also using three 
>> gigabit switches with high bisection bandwidth for these (copper) 
>> Ethernet cards, on 3 different subnets.
>> Yet I am getting a substantial performance drop while keeping the 
>> hardware, Open MPI, HPL and GotoBLAS the same. Can someone help me 
>> figure out why?
>> Please find attached my kernel's .config
> 
> Hi Allan
> 
> Your setup is quite complex, so you should give more information if you 
> want some help here.
> 
> Is this benchmark stressing disk I/O, the task scheduler, the network 
> stack, memory, swap... it is hard to tell.
> 
> Examining your .config, I would point out CONFIG_SLUB_DEBUG=y
> You really should disable this expensive option.
> (and possibly use CONFIG_SLAB instead of CONFIG_SLUB)
> 
> You probably should try the oprofile tool, since its results are a 
> good way to spot bad configuration or kernel regressions.
> 
> opcontrol --vmlinux=/boot/vmlinux-2.6.24.3 --start
> <benchmarking>
> opreport -l /boot/vmlinux-2.6.24.3

I am not the original reporter. To get good numbers, HPL tests CPU and 
mostly networking speed (if more than one machine is being used); if run 
locally, it tests whichever interprocess communication mechanism is 
being used.

It is floating point with communication to sync the different processes 
together.

Generally if it is abnormally slow, you either have an errant process on 
a machine, a problem with one machine, a problem with networking 
latency, or possibly a problem with some other latency.

I have never seen the scheduler make a big difference (unless the 
scheduler is really, really broken). If the machine is configured for 
speed it does little or no swapping (though I have seen machines that 
were tuned to page out early cause slight slowdowns when things should 
have fit nicely in memory), and it does little or no disk I/O in the 
timed sections of the benchmark.

It is pretty much all network latency and floating point speed.
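To put the reported numbers in perspective, here is a back-of-envelope
efficiency calculation (my own sketch, not from the thread; it assumes a
2.4 GHz Q6600 with 4 double-precision flops per cycle per core, which
Allan did not state):

```python
# Rough HPL efficiency for the reported runs on 5 quad-core nodes.
GHZ = 2.4              # assumed Q6600 clock
FLOPS_PER_CYCLE = 4    # assumed DP flops/cycle/core (SSE2)
CORES_PER_NODE = 4
NODES = 5

# Theoretical peak in Gflops for the whole cluster.
peak = GHZ * FLOPS_PER_CYCLE * CORES_PER_NODE * NODES

def efficiency(measured_gflops):
    """Fraction of theoretical peak actually achieved."""
    return measured_gflops / peak

print(f"peak: {peak:.0f} Gflops")
print(f"2.6.23.14 at 158 Gflops: {efficiency(158):.0%} of peak")
print(f"2.6.24.3  at  28 Gflops: {efficiency(28):.0%} of peak")
```

Under those assumptions the 2.6.23.14 run is near peak (~82%), a healthy
HPL number, while the 2.6.24.3 run (~15%) is far below anything hardware
alone would explain, which is consistent with a software-side regression.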


                               Roger


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-03-02 20:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-02  5:34 HPL Benchmark performance degradation of kernel 2.6.24.3 vs 2.6.23.14 Allan Menezes
2008-03-02 18:48 ` Eric Dumazet
2008-03-02 19:02   ` Peter Zijlstra
2008-03-02 20:07   ` Roger Heflin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).