LKML Archive on
help / color / mirror / Atom feed
From: Linus Torvalds <>
To: Dexuan Cui <>
Cc: Ingo Molnar <>,
	Alexey Dobriyan <>,
	Andrew Morton <>,
	Peter Zijlstra <>,
	Thomas Gleixner <>,
	Greg Kroah-Hartman <>,
	Rakib Mullick <>,
	Linux Kernel Mailing List <>
Subject: Re: for_each_cpu() is buggy for UP kernel?
Date: Sun, 13 May 2018 11:21:31 -0700	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM>

On Tue, May 8, 2018 at 11:24 PM Dexuan Cui <> wrote:

> Should we fix the for_each_cpu() in include/linux/cpumask.h for UP?

As Thomas points out, this has come up before.

One of the issues is historical - we tried very hard to make the SMP code
not cause code generation problems for UP, and part of that was just that
all these loops were literally designed to entirely go away under UP. It
still *looks* syntactically like a loop, but an optimizing compiler will
see that there's nothing there, and "for_each_cpu(...) x" essentially just
turns into "x" on UP.  An empty mask simply generally doesn't make sense,
since opn UP you also don't have any masking of CPU ops, so the mask is
ignored, and that helps the code generation immensely.

If you have to load and test the mask, you immediately lose out badly in
code generation.

So honestly, I'd really prefer to keep our current behavior. Perhaps with a
debug option that actually tests (on SMP - because that's what every
developer is actually _using_ these days) that the mask isn't empty. But
I'm not sure that would find this case, since presumably on SMP it might
never be empty.

Now, there is likely a fairly good argument that UP is getting _so_
uninteresting that we shouldn't even worry about code generation. But the
counter-argument to that is that if people are using UP in this day and
age, they probably are using some really crappy hardware that needs all the
help it can get.

At least for now, I'd rather have this inconsistency, because it really
makes a surprisingly *big* difference in code generation.  From the little
test I just did, adding that mask testing to a *single* case of
for_each_cpu() added 20 instructions.  I didn't look at exactly why that
happened (because the code generation was so radically different), but it
was very noticeable. I used your macro replacement in kernel/taskstats.c in
case you want to try to dig into what happened, but I'm not surprised. It
really turns an unconditional trivial loop into a much more complex thing
that needs to look at and test a value that we didn't care about before.

Maybe we should introduce a "for_each_cpu_maybe_empty()" helper for cases
like this?


  parent reply	other threads:[~2018-05-13 18:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-09  6:24 Dexuan Cui
2018-05-09 23:20 ` Andrew Morton
2018-05-13 13:35   ` Thomas Gleixner
2018-05-13 18:21 ` Linus Torvalds [this message]
2018-05-14  7:28   ` Dmitry Vyukov
2018-05-15  3:02   ` Dexuan Cui
2018-05-15 17:21     ` Linus Torvalds
2018-05-15 20:10       ` Dexuan Cui
2018-05-15 19:52 ` [PATCH] tick/broadcast: Use for_each_cpu() specially on UP kernels Dexuan Cui
2018-05-15 20:48   ` [tip:timers/urgent] " tip-bot for Dexuan Cui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \ \ \ \ \ \ \
    --subject='Re: for_each_cpu() is buggy for UP kernel?' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).