* Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
@ 2008-01-09  9:35 Zhang, Yanmin
  2008-01-09 11:48 ` David Miller
  2008-01-11  9:30 ` Zhang, Yanmin
  0 siblings, 2 replies; 12+ messages in thread
From: Zhang, Yanmin @ 2008-01-09  9:35 UTC (permalink / raw)
  To: LKML

The regression is:
1) Stoakley with 2 quad-core processors: 11%;
2) Tulsa with 4 dual-core (+HyperThreading) processors: 13%.

The test command is:
#sudo taskset -c 7 ./netserver
#sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1

As a matter of fact, 2.6.23 already shows about a 6% regression, and the
2.6.24-rc kernels regress by between 11% and 16%.

I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
but the bisected kernel wasn't stable and went crazy.

I tried both CONFIG_SLUB=y and CONFIG_SLAB=y to make sure SLUB isn't the
culprit.

The oprofile data with CONFIG_SLAB=y follows. The top CPU consumers are:
1) 2.6.22 
2067379   9.4888  vmlinux                  schedule
1873604   8.5994  vmlinux                  mwait_idle
1568131   7.1974  vmlinux                  resched_task
1066976   4.8972  vmlinux                  tcp_v4_rcv
986641    4.5285  vmlinux                  tcp_rcv_established
979518    4.4958  vmlinux                  find_busiest_group
767069    3.5207  vmlinux                  sock_def_readable
736808    3.3818  vmlinux                  tcp_sendmsg
595889    2.7350  vmlinux                  task_rq_lock
557193    2.5574  vmlinux                  tcp_ack
470570    2.1598  vmlinux                  __mod_timer
392220    1.8002  vmlinux                  __alloc_skb
358106    1.6436  vmlinux                  skb_release_data
313372    1.4383  vmlinux                  skb_clone

2) 2.6.24-rc7
2668426  12.4497  vmlinux                  vmlinux                  schedule
955698    4.4589  vmlinux                  vmlinux                  skb_release_data
836311    3.9018  vmlinux                  vmlinux                  tcp_v4_rcv
762398    3.5570  vmlinux                  vmlinux                  skb_release_all
728907    3.4007  vmlinux                  vmlinux                  task_rq_lock
705037    3.2894  vmlinux                  vmlinux                  __wake_up
694206    3.2388  vmlinux                  vmlinux                  __mod_timer
617616    2.8815  vmlinux                  vmlinux                  mwait_idle

It looks like TCP in 2.6.22 sends more packets but frees far fewer skbs than 2.6.24-rc6.
tcp_rcv_established stands out in the 2.6.22 CPU profile.
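
To make the two listings above easier to compare, here is a minimal sketch that diffs the percentage column for the symbols that appear in both top lists. The numbers are copied from the oprofile output above; the names PROFILE_2_6_22, PROFILE_2_6_24_RC7 and share_deltas are only illustrative and not part of any tool.

```python
# Percentage of samples per symbol, copied from the two oprofile listings above.
# Only symbols that appear in both top lists are included.
PROFILE_2_6_22 = {
    "schedule": 9.4888,
    "mwait_idle": 8.5994,
    "tcp_v4_rcv": 4.8972,
    "task_rq_lock": 2.7350,
    "__mod_timer": 2.1598,
    "skb_release_data": 1.6436,
}
PROFILE_2_6_24_RC7 = {
    "schedule": 12.4497,
    "skb_release_data": 4.4589,
    "tcp_v4_rcv": 3.9018,
    "task_rq_lock": 3.4007,
    "__mod_timer": 3.2388,
    "mwait_idle": 2.8815,
}


def share_deltas(old, new):
    """Return (symbol, new - old) pairs for common symbols, largest increase first."""
    common = old.keys() & new.keys()
    deltas = [(sym, round(new[sym] - old[sym], 4)) for sym in common]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for symbol, delta in share_deltas(PROFILE_2_6_22, PROFILE_2_6_24_RC7):
        print(f"{symbol:20s} {delta:+.4f}")
```

Each percentage is a share of that kernel's own sample total, so the deltas only show where time shifted between the two profiles, not an absolute cost per transaction.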

-yanmin




* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-09  9:35 Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22 Zhang, Yanmin
@ 2008-01-09 11:48 ` David Miller
  2008-01-11  9:30 ` Zhang, Yanmin
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2008-01-09 11:48 UTC (permalink / raw)
  To: yanmin_zhang; +Cc: linux-kernel


Nobody is going to look directly into networking regressions on lkml,
please at least CC: netdev@vger.kernel.org for networking issues.

Thank you.


* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-09  9:35 Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22 Zhang, Yanmin
  2008-01-09 11:48 ` David Miller
@ 2008-01-11  9:30 ` Zhang, Yanmin
  2008-01-11 17:56   ` Rick Jones
  2008-01-14  8:44   ` Ilpo Järvinen
  1 sibling, 2 replies; 12+ messages in thread
From: Zhang, Yanmin @ 2008-01-11  9:30 UTC (permalink / raw)
  To: LKML; +Cc: netdev

On Wed, 2008-01-09 at 17:35 +0800, Zhang, Yanmin wrote: 
> The regression is:
> 1)stoakley with 2 qual-core processors: 11%;
> 2)Tulsa with 4 dual-core(+hyperThread) processors:13%;
I have a new update on this issue and am also CCing the netdev mailing list.
Thanks to David Miller for pointing me to netdev.

> 
> The test command is:
> #sudo taskset -c 7 ./netserver
> #sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1
> 
> As a matter of fact, 2.6.23 has about 6% regression and 2.6.24-rc's
> regression is between 16%~11%.
> 
> I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> but the bisected kernel wasn't stable and went crazy.
> 
> I tried both CONFIG_SLUB=y and CONFIG_SLAB=y to make sure SLUB isn't the
> culprit.
> 
> The oprofile data of CONFIG_SLAB=y. Top cpu utilizations are:
> 1) 2.6.22 
> 2067379   9.4888  vmlinux                  schedule
> 1873604   8.5994  vmlinux                  mwait_idle
> 1568131   7.1974  vmlinux                  resched_task
> 1066976   4.8972  vmlinux                  tcp_v4_rcv
> 986641    4.5285  vmlinux                  tcp_rcv_established
> 979518    4.4958  vmlinux                  find_busiest_group
> 767069    3.5207  vmlinux                  sock_def_readable
> 736808    3.3818  vmlinux                  tcp_sendmsg
> 595889    2.7350  vmlinux                  task_rq_lock
> 557193    2.5574  vmlinux                  tcp_ack
> 470570    2.1598  vmlinux                  __mod_timer
> 392220    1.8002  vmlinux                  __alloc_skb
> 358106    1.6436  vmlinux                  skb_release_data
> 313372    1.4383  vmlinux                  skb_clone
> 
> 2) 2.6.24-rc7
> 2668426  12.4497  vmlinux                  vmlinux                  schedule
> 955698    4.4589  vmlinux                  vmlinux                  skb_release_data
> 836311    3.9018  vmlinux                  vmlinux                  tcp_v4_rcv
> 762398    3.5570  vmlinux                  vmlinux                  skb_release_all
> 728907    3.4007  vmlinux                  vmlinux                  task_rq_lock
> 705037    3.2894  vmlinux                  vmlinux                  __wake_up
> 694206    3.2388  vmlinux                  vmlinux                  __mod_timer
> 617616    2.8815  vmlinux                  vmlinux                  mwait_idle
> 
> It looks like tcp in 2.6.22 sends more packets, but frees far less skb than 2.6.24-rc6.
> tcp_rcv_established in 2.6.22 is highlighted on cpu utilization.
I instrumented the kernel to capture function call counts.
1) 2.6.22
skb_release_data: 50148649
tcp_ack:          25062858
tcp_transmit_skb: 25063150
tcp_v4_rcv:       25063279

2) 2.6.24-rc6
skb_release_data: 21429692
tcp_ack:          10707710
tcp_transmit_skb: 10707866
tcp_v4_rcv:       10707959

This data does not support my earlier guess that 2.6.22 sends more packets while freeing
far fewer skbs than 2.6.24-rc6.

The skb_release_data count on 2.6.22 is more than double the count on 2.6.24-rc6, yet
the netperf result shows only about a 10% regression.

Since each packet carries only 1 byte, I suspect 2.6.24-rc6 waits for some latency and then tries
to merge packets, while 2.6.22 either has no such wait or a very small one and so sends packets
almost immediately. I will check the source code later.
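
A quick arithmetic check of the call counts above, as a sketch only. The numbers are copied from the instrumented runs; the names COUNTS, frees_per_received_segment and count_ratio are made up for this illustration.

```python
# Call counts copied from the instrumented kernels above.
COUNTS = {
    "2.6.22": {
        "skb_release_data": 50148649,
        "tcp_ack": 25062858,
        "tcp_transmit_skb": 25063150,
        "tcp_v4_rcv": 25063279,
    },
    "2.6.24-rc6": {
        "skb_release_data": 21429692,
        "tcp_ack": 10707710,
        "tcp_transmit_skb": 10707866,
        "tcp_v4_rcv": 10707959,
    },
}


def frees_per_received_segment(counts):
    """skb_release_data calls per tcp_v4_rcv call for one kernel."""
    return counts["skb_release_data"] / counts["tcp_v4_rcv"]


def count_ratio(function_name):
    """2.6.24-rc6 count divided by the 2.6.22 count for one function."""
    return COUNTS["2.6.24-rc6"][function_name] / COUNTS["2.6.22"][function_name]


if __name__ == "__main__":
    for kernel, counts in COUNTS.items():
        print(kernel, round(frees_per_received_segment(counts), 3))
    for name in COUNTS["2.6.22"]:
        print(name, round(count_ratio(name), 3))
```

Both kernels come out at roughly two skb_release_data calls per tcp_v4_rcv call, and all four counters shrink by about the same factor (around 0.43). So the per-packet behaviour looks the same in these numbers, and the counts alone do not explain why the totals differ by much more than the roughly 10% throughput gap.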

-yanmin




* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-11  9:30 ` Zhang, Yanmin
@ 2008-01-11 17:56   ` Rick Jones
  2008-01-14  3:11     ` Zhang, Yanmin
  2008-01-14  8:44   ` Ilpo Järvinen
  1 sibling, 1 reply; 12+ messages in thread
From: Rick Jones @ 2008-01-11 17:56 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: LKML, netdev

>>The test command is:
>>#sudo taskset -c 7 ./netserver
>>#sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1

A couple of comments/questions on the command lines:

*) netperf/netserver support CPU affinity within themselves with the 
global -T option to netperf.  Is the result with taskset much different? 
   The equivalent to the above would be to run netperf with:

./netperf -T 0,7 ...

The one possibly salient difference between the two is that when done 
within netperf, the initial process creation will take place wherever 
the scheduler wants it.

*) The -i option to set the confidence iteration count will silently cap 
the max at 30.
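
As an illustration of the command line being discussed, here is a small sketch that assembles the same TCP_RR invocation and makes the effective -i maximum explicit. The helper names netperf_tcp_rr_argv and pinned are hypothetical, and the cap of 30 is taken from the note above rather than checked against any particular netperf version.

```python
def netperf_tcp_rr_argv(host="127.0.0.1", duration_s=60, max_iter=50, min_iter=3,
                        confidence=99, interval_pct=5, request=1, response=1):
    """Build the argv for the TCP_RR run quoted above.

    The cap of 30 on the -i maximum mirrors the note above; it is applied here
    explicitly so the effective value is visible instead of silently reduced.
    """
    effective_max = min(max_iter, 30)
    return [
        "./netperf",
        "-t", "TCP_RR",                            # test type: request/response over TCP
        "-l", str(duration_s),                     # test length in seconds
        "-H", host,                                # host running netserver
        "-i", f"{effective_max},{min_iter}",       # max,min confidence iterations
        "-I", f"{confidence},{interval_pct}",      # confidence level, interval width (%)
        "--",                                      # test-specific options follow
        "-r", f"{request},{response}",             # request,response sizes in bytes
    ]


def pinned(cpu, argv):
    """Prefix argv with taskset so the command runs on one logical CPU."""
    return ["taskset", "-c", str(cpu)] + argv


if __name__ == "__main__":
    print(" ".join(pinned(0, netperf_tcp_rr_argv())))
```

With the defaults above, the requested maximum of 50 iterations shows up as 30 in the generated command, which is what the quoted run would effectively have used if the cap applies.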

happy benchmarking,

rick jones


* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-11 17:56   ` Rick Jones
@ 2008-01-14  3:11     ` Zhang, Yanmin
  2008-01-14 17:46       ` Rick Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Zhang, Yanmin @ 2008-01-14  3:11 UTC (permalink / raw)
  To: Rick Jones; +Cc: LKML, netdev

On Fri, 2008-01-11 at 09:56 -0800, Rick Jones wrote:
> >>The test command is:
> >>#sudo taskset -c 7 ./netserver
> >>#sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1
> 
> A couple of comments/questions on the command lines:
Thanks for your kind comments.

> 
> *) netperf/netserver support CPU affinity within themselves with the 
> global -T option to netperf.  Is the result with taskset much different? 
>    The equivalent to the above would be to run netperf with:
> 
> ./netperf -T 0,7 ..
I checked the source code and didn't find this option.
I use netperf V2.3 (I found the version number in the makefile).

> .
> 
> The one possibly salient difference between the two is that when done 
> within netperf, the initial process creation will take place wherever 
> the scheduler wants it.
> 
> *) The -i option to set the confidence iteration count will silently cap 
> the max at 30.
Indeed, you are right.

-yanmin




* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-11  9:30 ` Zhang, Yanmin
  2008-01-11 17:56   ` Rick Jones
@ 2008-01-14  8:44   ` Ilpo Järvinen
  2008-01-14  9:21     ` Ilpo Järvinen
  2008-01-14 10:53     ` Herbert Xu
  1 sibling, 2 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2008-01-14  8:44 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: LKML, Netdev

On Fri, 11 Jan 2008, Zhang, Yanmin wrote:

> On Wed, 2008-01-09 at 17:35 +0800, Zhang, Yanmin wrote: 
> > The regression is:
> > 1)stoakley with 2 qual-core processors: 11%;
> > 2)Tulsa with 4 dual-core(+hyperThread) processors:13%;
> I have new update on this issue and also cc to netdev maillist.
> Thank David Miller for pointing me the netdev maillist.
> 
> > 
> > The test command is:
> > #sudo taskset -c 7 ./netserver
> > #sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1
> > 
> > As a matter of fact, 2.6.23 has about 6% regression and 2.6.24-rc's
> > regression is between 16%~11%.
> > 
> > I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> > but the bisected kernel wasn't stable and went crazy.

There was practically no TCP work between those two versions.

Using git-reset to select a nearby merge point, instead of the default
commit where bisection lands, might help in case the bisected kernel
breaks.

Also, limiting bisection to a subsystem might reduce the probability of
brokenness (it might at least narrow things down quite a lot), e.g.

git bisect start net/


-- 
 i.


* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-14  8:44   ` Ilpo Järvinen
@ 2008-01-14  9:21     ` Ilpo Järvinen
  2008-01-14  9:38       ` Zhang, Yanmin
  2008-01-14 10:53     ` Herbert Xu
  1 sibling, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2008-01-14  9:21 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: LKML, Netdev


On Mon, 14 Jan 2008, Ilpo Järvinen wrote:

> On Fri, 11 Jan 2008, Zhang, Yanmin wrote:
> 
> > On Wed, 2008-01-09 at 17:35 +0800, Zhang, Yanmin wrote: 
> > > 
> > > As a matter of fact, 2.6.23 has about 6% regression and 2.6.24-rc's
> > > regression is between 16%~11%.
> > > 
> > > I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> > > but the bisected kernel wasn't stable and went crazy.
> 
> TCP work between that is very much non-existing.

I _really_ meant 2.6.22 - 2.6.23-rc1, not 2.6.24-rc1 in case you had a 
typo there which is not that uncommon while typing kernel versions... :-)

-- 
 i.


* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-14  9:21     ` Ilpo Järvinen
@ 2008-01-14  9:38       ` Zhang, Yanmin
  0 siblings, 0 replies; 12+ messages in thread
From: Zhang, Yanmin @ 2008-01-14  9:38 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: LKML, Netdev


On Mon, 2008-01-14 at 11:21 +0200, Ilpo Järvinen wrote:
> On Mon, 14 Jan 2008, Ilpo Järvinen wrote:
> 
> > On Fri, 11 Jan 2008, Zhang, Yanmin wrote:
> > 
> > > On Wed, 2008-01-09 at 17:35 +0800, Zhang, Yanmin wrote: 
> > > > 
> > > > As a matter of fact, 2.6.23 has about 6% regression and 2.6.24-rc's
> > > > regression is between 16%~11%.
> > > > 
> > > > I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> > > > but the bisected kernel wasn't stable and went crazy.
> > 
> > TCP work between that is very much non-existing.
> 
> I _really_ meant 2.6.22 - 2.6.23-rc1, not 2.6.24-rc1 in case you had a 
> typo
I did bisect 2.6.22 - 2.6.23-rc1. I also tested it on the latest 2.6.24-rc.

>  there which is not that uncommon while typing kernel versions... :-)
Thanks. I will retry the bisect with the server and client bound to the same logical
processor; I hope the results will be stable this time.

Manual testing showed the same or a larger regression when I bind both
processes to the same CPU.


Thanks a lot!

-yanmin




* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-14  8:44   ` Ilpo Järvinen
  2008-01-14  9:21     ` Ilpo Järvinen
@ 2008-01-14 10:53     ` Herbert Xu
  2008-01-16  0:34       ` Zhang, Yanmin
  1 sibling, 1 reply; 12+ messages in thread
From: Herbert Xu @ 2008-01-14 10:53 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Zhang, Yanmin, LKML, Netdev

On Mon, Jan 14, 2008 at 08:44:40AM +0000, Ilpo Järvinen wrote:
>
> > > I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> > > but the bisected kernel wasn't stable and went crazy.
> 
> TCP work between that is very much non-existing.

Make sure you haven't switched between SLAB/SLUB while testing this.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-14  3:11     ` Zhang, Yanmin
@ 2008-01-14 17:46       ` Rick Jones
  0 siblings, 0 replies; 12+ messages in thread
From: Rick Jones @ 2008-01-14 17:46 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: LKML, netdev

>>*) netperf/netserver support CPU affinity within themselves with the 
>>global -T option to netperf.  Is the result with taskset much different? 
>>   The equivalent to the above would be to run netperf with:
>>
>>./netperf -T 0,7 ..
> 
> I checked the source codes and didn't find this option.
> I use netperf V2.3 (I found the number in the makefile).

Indeed, that version pre-dates the -T option.  If you weren't already 
chasing a regression I'd suggest an upgrade to 2.4.mumble.  Once you are 
at a point where changing another variable won't muddle things you may 
want to consider upgrading.

happy benchmarking,

rick jones


* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-14 10:53     ` Herbert Xu
@ 2008-01-16  0:34       ` Zhang, Yanmin
  2008-01-16  7:15         ` Zhang, Yanmin
  0 siblings, 1 reply; 12+ messages in thread
From: Zhang, Yanmin @ 2008-01-16  0:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Ilpo Järvinen, LKML, Netdev


On Mon, 2008-01-14 at 21:53 +1100, Herbert Xu wrote:
> On Mon, Jan 14, 2008 at 08:44:40AM +0000, Ilpo Järvinen wrote:
> >
> > > > I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> > > > but the bisected kernel wasn't stable and went crazy.
> > 
> > TCP work between that is very much non-existing.
> 
> Make sure you haven't switched between SLAB/SLUB while testing this.
Yes, I am sure. In addition, I tried both SLAB and SLUB and confirmed that the
regression is still there with CONFIG_SLAB=y.

Thanks,
-yanmin



* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
  2008-01-16  0:34       ` Zhang, Yanmin
@ 2008-01-16  7:15         ` Zhang, Yanmin
  0 siblings, 0 replies; 12+ messages in thread
From: Zhang, Yanmin @ 2008-01-16  7:15 UTC (permalink / raw)
  To: LKML; +Cc: Ilpo Järvinen, Netdev, Herbert Xu, Ingo Molnar

On Wed, 2008-01-16 at 08:34 +0800, Zhang, Yanmin wrote:
> On Mon, 2008-01-14 at 21:53 +1100, Herbert Xu wrote:
> > > On Mon, Jan 14, 2008 at 08:44:40AM +0000, Ilpo Järvinen wrote:
> > >
> > > > > I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> > > > > but the bisected kernel wasn't stable and went crazy.
> > > 
> > > TCP work between that is very much non-existing.
> > 
> > Make sure you haven't switched between SLAB/SLUB while testing this.
> I can make sure. In addition, I tried both SLAB and SLUB and make sure the 
> regression is still there if CONFIG_SLAB=y.
I retried the bisect between 2.6.22 and 2.6.23-rc1. This time I set CONFIG_SLAB=y,
removed the warmup step from the test scripts, and bound the two processes to the
same logical processor. With that binding the regression is about 20%, which is larger
than when the two processes are bound to different cores.

The new bisect points to the CFS core patch as the cause. The results at every step
looked stable.

dd41f596cda0d7d6e4a8b139ffdfabcefdd46528 is first bad commit
commit dd41f596cda0d7d6e4a8b139ffdfabcefdd46528
Author: Ingo Molnar <mingo@elte.hu>
Date:   Mon Jul 9 18:51:59 2007 +0200

    sched: cfs core code
    
    apply the CFS core code.
    
    this change switches over the scheduler core to CFS's modular
    design and makes use of kernel/sched_fair/rt/idletask.c to implement
    Linux's scheduling policies.
    
    thanks to Andrew Morton and Thomas Gleixner for lots of detailed review
    feedback and for fixlets.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
    Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>


-yanmin


