LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC PATCH 0/8]: uninline & uninline
@ 2008-02-20 13:35 Ilpo Järvinen
  0 siblings, 0 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2008-02-20 13:35 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: David Miller, Arnaldo Carvalho de Melo

Hi all,

I run some lengthy tests to measure cost of inlines in headers under
include/, simple coverage calculations yields to 89% but most of the
failed compiles are due to preprocessor cutting the tested block away
anyway. Test setup: v2.6.24-mm1, make allyesconfig, 32-bit x86,
gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13). Because one inline was
tested (function uninlined) at a time, the actual benefits of removing
multiple inlines may well be below what the sum of those individually
is (especially when something calls __-func with equal name).

Ok, here's the top of the list (10000+ bytes):

-110805  869 f, 198 +, 111003 -, diff: -110805  skb_put 
-41525  2066 f, 3370 +, 44895 -, diff: -41525  IS_ERR 
-36290  42 f, 197 +, 36487 -, diff: -36290  cfi_build_cmd 
-35698  1234 f, 2391 +, 38089 -, diff: -35698  atomic_dec_and_test 
-28162  354 f, 3005 +, 31167 -, diff: -28162  skb_pull 
-23668  392 f, 104 +, 23772 -, diff: -23668  dev_alloc_skb 
-22212  415 f, 130 +, 22342 -, diff: -22212  __dev_alloc_skb 
-21593  356 f, 2418 +, 24011 -, diff: -21593  skb_push 
-19036  478 f, 259 +, 19295 -, diff: -19036  netif_wake_queue 
-18409  396 f, 6447 +, 24856 -, diff: -18409  __skb_pull 
-16420  187 f, 103 +, 16523 -, diff: -16420  dst_release 
-16025  13 f, 280 +, 16305 -, diff: -16025  cfi_send_gen_cmd 
-14925  486 f, 978 +, 15903 -, diff: -14925  add_timer 
-14896  199 f, 558 +, 15454 -, diff: -14896  sg_page 
-12870  36 f, 121 +, 12991 -, diff: -12870  le_key_k_type 
-12310  692 f, 7215 +, 19525 -, diff: -12310  signal_pending 
-11640  251 f, 118 +, 11758 -, diff: -11640  __skb_trim 
-11059  111 f, 293 +, 11352 -, diff: -11059  __nlmsg_put 
-10976  209 f, 123 +, 11099 -, diff: -10976  skb_trim 
-10344  125 f, 462 +, 10806 -, diff: -10344  pskb_may_pull 
-10061  300 f, 1163 +, 11224 -, diff: -10061  try_module_get 
-10008  75 f, 341 +, 10349 -, diff: -10008  nlmsg_put 

~250 are in 1000+ bytes category and ~440 in 500+. Full list
has some entries without number because given config doesn't
build them, and therefore nothing got uninlined, and the missing
entries consists solely of compile failures, available here:

  http://www.cs.helsinki.fi/u/ijjarvin/inlines/sorted

I made some patches to uninline couple of them (picked mostly
net related) to stir up some discussion, however, some of them
are not ready for inclusion as is (see patch descriptions).
The cases don't represent all top 8 cases because some of the
cases require a bit more analysis (e.g., config dependant,
comments about gcc optimizations).

The tools I used are available here except the site-specific
distribute machinery (in addition one needs pretty late
codiff from Arnaldo's toolset because there were some inline
related bugs fixed lately):

  http://www.cs.helsinki.fi/u/ijjarvin/inline-tools.git/

I'm planning to run similar tests also on inlines in headers that
are not under include/ but it requires minor modifications to
those tools.

--
 i.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/8]: uninline & uninline
  2008-02-23 19:58       ` Hua Zhong
@ 2008-02-23 21:02         ` Andi Kleen
  0 siblings, 0 replies; 9+ messages in thread
From: Andi Kleen @ 2008-02-23 21:02 UTC (permalink / raw)
  To: Hua Zhong
  Cc: 'Andrew Morton', 'Andi Kleen',
	'Ilpo Järvinen',
	netdev, linux-kernel, 'David Miller',
	'Arnaldo Carvalho de Melo'

> Is it possible to catch this automatically, like, by re-defining
> likely/unlikely to the raw form in specific file(s)?

Sure it would be possible to define a IN_VDSO symbol in all vdso
related files and then do that. Might be useful for other things
too. vdso has some very specific requirements.

-Andi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC PATCH 0/8]: uninline & uninline
  2008-02-23 18:55     ` Andrew Morton
@ 2008-02-23 19:58       ` Hua Zhong
  2008-02-23 21:02         ` Andi Kleen
  0 siblings, 1 reply; 9+ messages in thread
From: Hua Zhong @ 2008-02-23 19:58 UTC (permalink / raw)
  To: 'Andrew Morton', 'Andi Kleen'
  Cc: 'Ilpo Järvinen',
	netdev, linux-kernel, 'David Miller',
	'Arnaldo Carvalho de Melo'

> > Is there any reason they couldn't just be merged to mainline?
> >
> > I think it's a useful facility.
> 
> ummm, now why did we made that decision...  I think we decided that
> it's the sort of thing which one person can run once per few months
> and that will deliver its full value.  I can maintain it in -mm and
> we're happy - no need to add it to mainline.  No strong feelings
> either way really.

Apparently nobody has been doing it for a while. :-) Last time I did it it
was around the submission time and I actually patched it into mainline
kernel to do so. Not particularly hard to do, but sitting in mm-only does
make it a bit harder, and there are the vdso problem you just mentioned that
one has to fix for himself if it exists in mainline.

> It does have the downside that the kernel explodes if someone adds
> unlikely or likely to the vdso code and I need to occasionally hunt
> down new additions and revert them in that patch.  That makes it a
> bit of a maintenance burden.

Is it possible to catch this automatically, like, by re-defining
likely/unlikely to the raw form in specific file(s)?

Hua



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/8]: uninline & uninline
  2008-02-23 13:15   ` Andi Kleen
  2008-02-23 18:06     ` Ilpo Järvinen
@ 2008-02-23 18:55     ` Andrew Morton
  2008-02-23 19:58       ` Hua Zhong
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2008-02-23 18:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ilpo Järvinen, netdev, linux-kernel, David Miller,
	Arnaldo Carvalho de Melo

On Sat, 23 Feb 2008 14:15:06 +0100 Andi Kleen <andi@firstfloor.org> wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> 
> >> -41525  2066 f, 3370 +, 44895 -, diff: -41525  IS_ERR 
> >
> > This is a surprise.  I expect that the -mm-only
> > profile-likely-unlikely-macros.patch is the cause of this and mainline
> > doesn't have this problem.
> 
> Shouldn't they only have overhead when the respective CONFIG is enabled?

yup.

> > If true, then this likely/unlikely bloat has probably spread into a lot of
> > your other results and it all should be redone against mainline, sorry :(
> >
> > (I'm not aware of anyone having used profile-likely-unlikely-macros.patch
> > in quite some time.  That's unfortunate because it has turned up some
> > fairly flagrant code deoptimisations)
> 
> Is there any reason they couldn't just be merged to mainline? 
> 
> I think it's a useful facility.

ummm, now why did we made that decision...  I think we decided that it's
the sort of thing which one person can run once per few months and that
will deliver its full value.  I can maintain it in -mm and we're happy - no
need to add it to mainline.  No strong feelings either way really.

It does have the downside that the kernel explodes if someone adds unlikely
or likely to the vdso code and I need to occasionally hunt down new
additions and revert them in that patch.  That makes it a bit of a
maintenance burden.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/8]: uninline & uninline
  2008-02-23 13:15   ` Andi Kleen
@ 2008-02-23 18:06     ` Ilpo Järvinen
  2008-02-23 18:55     ` Andrew Morton
  1 sibling, 0 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2008-02-23 18:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Netdev, LKML, David Miller, Arnaldo Carvalho de Melo

On Sat, 23 Feb 2008, Andi Kleen wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> 
> >> -41525  2066 f, 3370 +, 44895 -, diff: -41525  IS_ERR 
> >
> > This is a surprise.  I expect that the -mm-only
> > profile-likely-unlikely-macros.patch is the cause of this and mainline
> > doesn't have this problem.
> 
> Shouldn't they only have overhead when the respective CONFIG is enabled?

I uninlined those with allyesconfig. I'll anyway try later on without a 
number of debug related CONFIGs.


-- 
 i.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/8]: uninline & uninline
  2008-02-23  8:02 ` Andrew Morton
  2008-02-23 10:11   ` Ilpo Järvinen
@ 2008-02-23 13:15   ` Andi Kleen
  2008-02-23 18:06     ` Ilpo Järvinen
  2008-02-23 18:55     ` Andrew Morton
  1 sibling, 2 replies; 9+ messages in thread
From: Andi Kleen @ 2008-02-23 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ilpo Järvinen, netdev, linux-kernel, David Miller,
	Arnaldo Carvalho de Melo

Andrew Morton <akpm@linux-foundation.org> writes:


>> -41525  2066 f, 3370 +, 44895 -, diff: -41525  IS_ERR 
>
> This is a surprise.  I expect that the -mm-only
> profile-likely-unlikely-macros.patch is the cause of this and mainline
> doesn't have this problem.

Shouldn't they only have overhead when the respective CONFIG is enabled?

> If true, then this likely/unlikely bloat has probably spread into a lot of
> your other results and it all should be redone against mainline, sorry :(
>
> (I'm not aware of anyone having used profile-likely-unlikely-macros.patch
> in quite some time.  That's unfortunate because it has turned up some
> fairly flagrant code deoptimisations)

Is there any reason they couldn't just be merged to mainline? 

I think it's a useful facility.

-Andi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/8]: uninline & uninline
  2008-02-23  8:02 ` Andrew Morton
@ 2008-02-23 10:11   ` Ilpo Järvinen
  2008-02-23 13:15   ` Andi Kleen
  1 sibling, 0 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2008-02-23 10:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Netdev, LKML, David Miller, Arnaldo Carvalho de Melo

On Sat, 23 Feb 2008, Andrew Morton wrote:

> On Wed, 20 Feb 2008 15:47:10 +0200 "Ilpo J__rvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> 
> > -41525  2066 f, 3370 +, 44895 -, diff: -41525  IS_ERR 
> 
> This is a surprise. 

It surprised me as well, there were something like 10 bytes I just 
couldn't explain in IS_ERR size (kernel/uninlined.c: IS_ERR | +25). I was 
to look into it deeper but didn't have the .o's at hand right away, not so 
trivial to store results of 5000+ build results except some carefully 
picked textual output :-)... Hmm, I'll add logging for the disassembly of 
the uninlined stuff into the next run, that won't cost too much...

> I expect that the -mm-only
> profile-likely-unlikely-macros.patch is the cause of this and mainline
> doesn't have this problem.

Ahaa, this explain it, I suspected that there was something (un)likely 
related elsewhere as well, no wonder why I couldn't reproduce the 25 bytes 
result in my quick copy-pasted non-kernel test...

> If true, then this likely/unlikely bloat has probably spread into a lot of
> your other results and it all should be redone against mainline, sorry :(

It isn't that troublesome to redo them, it's mainly automatic combined 
with impatient waiting from my behalf :-)... The spreading problem is 
probably true, to some extent. I did some runs also with carefully 
selected CONFIG.*DEBUG.* off under include/net/ earlier, in general it 
made very little difference, if something bloats, it usually does that 
regardless of .config. There are probably couple of exceptions when the 
size is on the boundary where it's very close of being useful to uninline 
it.

One interesting thing in there was that the largest offenders are quite 
small per call-site but due to vast number of them even a small benefit 
buys off a lot in kernel wide results. I suspect the differences due to 
(un)likely will be negligle, because the IS_ERR with all its trivialness 
is now mostly -15, so anything clearly larger than that will likely still 
be a win x n (where n is quite large).

Anyway, I'll see when I get the new set of tests running... :-) I'd prefer 
with CONFIG.*DEBUG off to get larger picture about non-debug builds too
(especially the scatterlist.h things interest me a lot), do you think that 
would do as well?

Thanks for your comments, I found them very useful.


-- 
 i.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/8]: uninline & uninline
  2008-02-20 13:47 Ilpo Järvinen
@ 2008-02-23  8:02 ` Andrew Morton
  2008-02-23 10:11   ` Ilpo Järvinen
  2008-02-23 13:15   ` Andi Kleen
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2008-02-23  8:02 UTC (permalink / raw)
  To: "Ilpo Järvinen"
  Cc: netdev, linux-kernel, David Miller, Arnaldo Carvalho de Melo

On Wed, 20 Feb 2008 15:47:10 +0200 "Ilpo J__rvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Ok, here's the top of the list (10000+ bytes):

This is good stuff - thanks.

> -41525  2066 f, 3370 +, 44895 -, diff: -41525  IS_ERR 

This is a surprise.  I expect that the -mm-only
profile-likely-unlikely-macros.patch is the cause of this and mainline
doesn't have this problem.

If true, then this likely/unlikely bloat has probably spread into a lot of
your other results and it all should be redone against mainline, sorry :(

(I'm not aware of anyone having used profile-likely-unlikely-macros.patch
in quite some time.  That's unfortunate because it has turned up some
fairly flagrant code deoptimisations)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 0/8]: uninline & uninline
@ 2008-02-20 13:47 Ilpo Järvinen
  2008-02-23  8:02 ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2008-02-20 13:47 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: David Miller, Arnaldo Carvalho de Melo

Hi all,

I run some lengthy tests to measure cost of inlines in headers under
include/, simple coverage calculations yields to 89% but most of the
failed compiles are due to preprocessor cutting the tested block away
anyway. Test setup: v2.6.24-mm1, make allyesconfig, 32-bit x86,
gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13). Because one inline was
tested (function uninlined) at a time, the actual benefits of removing
multiple inlines may well be below what the sum of those individually
is (especially when something calls __-func with equal name).

Ok, here's the top of the list (10000+ bytes):

-110805  869 f, 198 +, 111003 -, diff: -110805  skb_put 
-41525  2066 f, 3370 +, 44895 -, diff: -41525  IS_ERR 
-36290  42 f, 197 +, 36487 -, diff: -36290  cfi_build_cmd 
-35698  1234 f, 2391 +, 38089 -, diff: -35698  atomic_dec_and_test 
-28162  354 f, 3005 +, 31167 -, diff: -28162  skb_pull 
-23668  392 f, 104 +, 23772 -, diff: -23668  dev_alloc_skb 
-22212  415 f, 130 +, 22342 -, diff: -22212  __dev_alloc_skb 
-21593  356 f, 2418 +, 24011 -, diff: -21593  skb_push 
-19036  478 f, 259 +, 19295 -, diff: -19036  netif_wake_queue 
-18409  396 f, 6447 +, 24856 -, diff: -18409  __skb_pull 
-16420  187 f, 103 +, 16523 -, diff: -16420  dst_release 
-16025  13 f, 280 +, 16305 -, diff: -16025  cfi_send_gen_cmd 
-14925  486 f, 978 +, 15903 -, diff: -14925  add_timer 
-14896  199 f, 558 +, 15454 -, diff: -14896  sg_page 
-12870  36 f, 121 +, 12991 -, diff: -12870  le_key_k_type 
-12310  692 f, 7215 +, 19525 -, diff: -12310  signal_pending 
-11640  251 f, 118 +, 11758 -, diff: -11640  __skb_trim 
-11059  111 f, 293 +, 11352 -, diff: -11059  __nlmsg_put 
-10976  209 f, 123 +, 11099 -, diff: -10976  skb_trim 
-10344  125 f, 462 +, 10806 -, diff: -10344  pskb_may_pull 
-10061  300 f, 1163 +, 11224 -, diff: -10061  try_module_get 
-10008  75 f, 341 +, 10349 -, diff: -10008  nlmsg_put 

~250 are in 1000+ bytes category and ~440 in 500+. Full list
has some entries without number because given config doesn't
build them, and therefore nothing got uninlined, and the missing
entries consists solely of compile failures, available here:

  http://www.cs.helsinki.fi/u/ijjarvin/inlines/sorted

I made some patches to uninline couple of them (picked mostly
net related) to stir up some discussion, however, some of them
are not ready for inclusion as is (see patch descriptions).
The cases don't represent all top 8 cases because some of the
cases require a bit more analysis (e.g., config dependant,
comments about gcc optimizations).

The tools I used are available here except the site-specific
distribute machinery (in addition one needs pretty late
codiff from Arnaldo's toolset because there were some inline
related bugs fixed lately):

  http://www.cs.helsinki.fi/u/ijjarvin/inline-tools.git/

I'm planning to run similar tests also on inlines in headers that
are not under include/ but it requires minor modifications to
those tools.

--
 i.

ps. I'm sorry about the duplicates, old git-send-email's
8-bit-header problem bit me again. :-(




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-02-23 21:02 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-20 13:35 [RFC PATCH 0/8]: uninline & uninline Ilpo Järvinen
2008-02-20 13:47 Ilpo Järvinen
2008-02-23  8:02 ` Andrew Morton
2008-02-23 10:11   ` Ilpo Järvinen
2008-02-23 13:15   ` Andi Kleen
2008-02-23 18:06     ` Ilpo Järvinen
2008-02-23 18:55     ` Andrew Morton
2008-02-23 19:58       ` Hua Zhong
2008-02-23 21:02         ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).