LKML Archive on lore.kernel.org
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
[not found] ` <1YmjP-4jX-37@gated-at.bofh.it>
@ 2004-05-21 19:17 ` Andi Kleen
[not found] ` <1YmMN-4Kh-17@gated-at.bofh.it>
1 sibling, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2004-05-21 19:17 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: brettspamacct, linux-kernel
"Martin J. Bligh" <mbligh@aracnet.com> writes:
> There is no such thing as a homenode. What you describe is more or less why
> we ditched that concept.
The NUMA API has a "preferred node", which is a bit similar to the old
home node. The main difference is that it does not affect the scheduler,
only the memory allocation. You can of course affect the scheduler too,
but that's a separate option now, and a stricter one.
For historical reasons numactl still has a --homenode= alias for
--preferred, although it is undocumented and discouraged now.
-Andi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
[not found] ` <1Yn67-50q-7@gated-at.bofh.it>
@ 2004-05-21 19:19 ` Andi Kleen
2004-05-21 20:32 ` Martin J. Bligh
0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2004-05-21 19:19 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, brettspamacct
"Martin J. Bligh" <mbligh@aracnet.com> writes:
> For any given situation, you can come up with a scheduler mod that improves
> things. The problem is making something generic that works well in most
> cases.
The point behind the NUMA API/numactl is that if the defaults don't work
well enough, you can tune things by hand. Some setups can be improved
significantly with hand tuning, although in many cases the default
behaviour is good enough.
-Andi
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 19:19 ` Andi Kleen
@ 2004-05-21 20:32 ` Martin J. Bligh
2004-05-21 23:42 ` Brett E.
0 siblings, 1 reply; 22+ messages in thread
From: Martin J. Bligh @ 2004-05-21 20:32 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, brettspamacct
> "Martin J. Bligh" <mbligh@aracnet.com> writes:
>
>> For any given situation, you can come up with a scheduler mod that improves
>> things. The problem is making something generic that works well in most
>> cases.
>
> The point behind numa api/numactl is that if the defaults
> don't work well enough you can tune it by hand to be better.
>
> There are some setups which can be significantly improved with some
> hand tuning, although in many cases the default behaviour is good enough
> too.
Oh, I'm not denying it can make things better ... just 90% of the people
who want to try it would be better off leaving it the hell alone ;-)
M.
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 20:32 ` Martin J. Bligh
@ 2004-05-21 23:42 ` Brett E.
2004-05-22 6:13 ` Martin J. Bligh
2004-05-23 0:28 ` Bryan O'Sullivan
0 siblings, 2 replies; 22+ messages in thread
From: Brett E. @ 2004-05-21 23:42 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andi Kleen, linux-kernel
Martin J. Bligh wrote:
>>"Martin J. Bligh" <mbligh@aracnet.com> writes:
>>
>>
>>>For any given situation, you can come up with a scheduler mod that improves
>>>things. The problem is making something generic that works well in most
>>>cases.
>>
>>The point behind numa api/numactl is that if the defaults
>>don't work well enough you can tune it by hand to be better.
>>
>>There are some setups which can be significantly improved with some
>>hand tuning, although in many cases the default behaviour is good enough
>>too.
>
>
> Oh, I'm not denying it can make things better ... just 90% of the people
> who want to try it would be better off leaving it the hell alone ;-)
>
> M.
>
Right now, 5 processes are running, taking up a good deal of the CPU
doing memory-intensive work (caching), and I notice that none of the
processes seem to have CPU affinity. top shows them executing
pseudo-randomly across the CPUs.
At this point I'd like to decrease the number of processes to 4 and test
performance with and without setting CPU & memory-allocation affinity.
I've read the archives and I'm not sure how to get numactl running; both
the .5 and .6 versions give me:
# numactl --show
No NUMA support available on this system.
despite using kernel 2.6.6.
running numactl under strace gives me:
32144 sched_getaffinity(32144, 16, { 0 }) = 8
32144 syscall_239(0, 0, 0, 0, 0, 0, 0x401e30, 0x401e30, 0x401e30,
0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30,
0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30,
0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30,
0x401e30, 0x401e30) = -1 (errno 38)
32144 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 53), ...}) = 0
32144 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x2a9556c000
32144 write(1, "No NUMA support available on thi"..., 42) = 42
32144 munmap(0x2a9556c000, 4096) = 0
32144 exit_group(1) = ?
In case someone has run into this before.
Thanks,
Brett
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 23:42 ` Brett E.
@ 2004-05-22 6:13 ` Martin J. Bligh
2004-05-22 7:41 ` Andi Kleen
2004-05-23 0:28 ` Bryan O'Sullivan
1 sibling, 1 reply; 22+ messages in thread
From: Martin J. Bligh @ 2004-05-22 6:13 UTC (permalink / raw)
To: brettspamacct; +Cc: Andi Kleen, linux-kernel
> Right now, 5 processes are running taking up a good deal of the CPU doing memory-intensive work(cacheing) and I notice that none of the processes seem to have CPU affinity. top shows they execute pseudo randomly on the CPU's.
>
>
> At this point I'd like to decrease the number of processes to 4 and test performance with and without setting CPU & memory allocation affinity.
>
> I've read the archives and I'm not sure how to get numactl running, both .5 and .6 versions give me:
>
># numactl --show
> No NUMA support available on this system.
>
> despite using kernel 2.6.6.
>
> running numactl under strace gives me:
>
> 32144 sched_getaffinity(32144, 16, { 0 }) = 8
> 32144 syscall_239(0, 0, 0, 0, 0, 0, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30, 0x401e30) = -1 (errno 38)
> 32144 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 53), ...}) = 0
> 32144 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9556c000
> 32144 write(1, "No NUMA support available on thi"..., 42) = 42
> 32144 munmap(0x2a9556c000, 4096) = 0
> 32144 exit_group(1) = ?
>
>
> In case someone might have ran into this before.
Did you turn on CONFIG_NUMA?
M.
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-22 6:13 ` Martin J. Bligh
@ 2004-05-22 7:41 ` Andi Kleen
0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2004-05-22 7:41 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: brettspamacct, Andi Kleen, linux-kernel
> >
> ># numactl --show
> > No NUMA support available on this system.
> >
> > despite using kernel 2.6.6.
You need 2.6.6-mm*. The NUMA API hasn't been merged into mainline yet.
-Andi
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 23:42 ` Brett E.
2004-05-22 6:13 ` Martin J. Bligh
@ 2004-05-23 0:28 ` Bryan O'Sullivan
2004-05-23 14:28 ` Andi Kleen
1 sibling, 1 reply; 22+ messages in thread
From: Bryan O'Sullivan @ 2004-05-23 0:28 UTC (permalink / raw)
To: brettspamacct; +Cc: Martin J. Bligh, Andi Kleen, linux-kernel
On Fri, 2004-05-21 at 16:42, Brett E. wrote:
> Right now, 5 processes are running taking up a good deal of the CPU
> doing memory-intensive work(cacheing) and I notice that none of the
> processes seem to have CPU affinity.
I don't know what kind of system you're running on, but if it's a
multi-CPU Opteron, it is normally a sufficient fudge to just use
sched_setaffinity to bind individual processes to specific CPUs. The
mainline kernel memory allocator does the right thing in that case, and
allocates memory locally when it can.
You can use the taskset command to get at this from the command line, so
you may not even need to modify your code.
<b
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
[not found] ` <1YRnC-3vk-5@gated-at.bofh.it>
@ 2004-05-23 11:57 ` Andi Kleen
0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2004-05-23 11:57 UTC (permalink / raw)
To: davids; +Cc: linux-kernel mailing list, jbarnes
"David Schwartz" <davids@webmaster.com> writes:
> I don't think we've reached the point yet where treating x86-64 systems as
> NUMA machines makes very much sense.
Benchmarks disagree with you on that. In most cases local memory
policy seems to work better than BIOS interleaving. That's because
memory latency is usually more important than memory bandwidth.
-Andi
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-23 0:28 ` Bryan O'Sullivan
@ 2004-05-23 14:28 ` Andi Kleen
2004-05-24 22:00 ` Andrew Theurer
0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2004-05-23 14:28 UTC (permalink / raw)
To: Bryan O'Sullivan
Cc: brettspamacct, Martin J. Bligh, Andi Kleen, linux-kernel
On Sat, May 22, 2004 at 05:28:09PM -0700, Bryan O'Sullivan wrote:
> On Fri, 2004-05-21 at 16:42, Brett E. wrote:
>
> > Right now, 5 processes are running taking up a good deal of the CPU
> > doing memory-intensive work(cacheing) and I notice that none of the
> > processes seem to have CPU affinity.
>
> I don't know what kind of system you're running on, but if it's a
> multi-CPU Opteron, it is normally a sufficient fudge to just use
> sched_setaffinity to bind individual processes to specific CPUs. The
> mainline kernel memory allocator does the right thing in that case, and
> allocates memory locally when it can.
>
> You can use the taskset command to get at this from the command line, so
> you may not even need to modify your code.
Linus also merged the NUMA API support into mainline with 2.6.7-rc1, so you
can use numactl for more fine-grained tuning.
-Andi
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-23 14:28 ` Andi Kleen
@ 2004-05-24 22:00 ` Andrew Theurer
2004-05-25 0:27 ` Scott Robert Ladd
2004-05-25 1:09 ` Brett E.
0 siblings, 2 replies; 22+ messages in thread
From: Andrew Theurer @ 2004-05-24 22:00 UTC (permalink / raw)
To: Andi Kleen, Bryan O'Sullivan
Cc: brettspamacct, Martin J. Bligh, Andi Kleen, linux-kernel
On Sunday 23 May 2004 09:28, Andi Kleen wrote:
> On Sat, May 22, 2004 at 05:28:09PM -0700, Bryan O'Sullivan wrote:
> > On Fri, 2004-05-21 at 16:42, Brett E. wrote:
> > > Right now, 5 processes are running taking up a good deal of the CPU
> > > doing memory-intensive work(cacheing) and I notice that none of the
> > > processes seem to have CPU affinity.
> >
> > I don't know what kind of system you're running on, but if it's a
> > multi-CPU Opteron, it is normally a sufficient fudge to just use
> > sched_setaffinity to bind individual processes to specific CPUs. The
> > mainline kernel memory allocator does the right thing in that case, and
> > allocates memory locally when it can.
> >
> > You can use the taskset command to get at this from the command line, so
> > you may not even need to modify your code.
>
> Linus also merged the NUMA API support into mainline now with 2.6.7rc1, so
> you can use numactl for more finegrained tuning.
FYI Brett, some Opteron systems have a BIOS option to interleave memory. If
you are going to make use of NUMA, I think you want to disable interleaving.
Also, if you have a 25% imbalance within a domain/node, the scheduler can have
a tendency to bounce a task around for fairness. That might be why you are
seeing little/no affinity to a CPU (even top might be causing some of this).
I'm not sure what the threshold is between domains/nodes, but I am curious
whether it still happens with CONFIG_NUMA on. If these are long-lived,
CPU-bound processes, I would try to have the number of processes be a
multiple of the number of CPUs.
-Andrew Theurer
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-24 22:00 ` Andrew Theurer
@ 2004-05-25 0:27 ` Scott Robert Ladd
2004-05-25 1:09 ` Brett E.
1 sibling, 0 replies; 22+ messages in thread
From: Scott Robert Ladd @ 2004-05-25 0:27 UTC (permalink / raw)
To: Andrew Theurer
Cc: Andi Kleen, Bryan O'Sullivan, brettspamacct, Martin J. Bligh,
linux-kernel
Andrew Theurer wrote:
> FYI Brett, some Opteron systems have a BIOS option to interleave memory. If
> you are going to make use of NUMA, I think you want to not interleave.
I can confirm this. On my Tyan Thunder K8W 2885 (dual Opteron), I had to
disable interleaving (under the Northbridge setup) before Linux
recognized it as a NUMA system.
--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-24 22:00 ` Andrew Theurer
2004-05-25 0:27 ` Scott Robert Ladd
@ 2004-05-25 1:09 ` Brett E.
1 sibling, 0 replies; 22+ messages in thread
From: Brett E. @ 2004-05-25 1:09 UTC (permalink / raw)
To: Andrew Theurer
Cc: Andi Kleen, Bryan O'Sullivan, Martin J. Bligh, linux-kernel
Andrew Theurer wrote:
> On Sunday 23 May 2004 09:28, Andi Kleen wrote:
>
>>On Sat, May 22, 2004 at 05:28:09PM -0700, Bryan O'Sullivan wrote:
>>
>>>On Fri, 2004-05-21 at 16:42, Brett E. wrote:
>>>
>>>>Right now, 5 processes are running taking up a good deal of the CPU
>>>>doing memory-intensive work(cacheing) and I notice that none of the
>>>>processes seem to have CPU affinity.
>>>
>>>I don't know what kind of system you're running on, but if it's a
>>>multi-CPU Opteron, it is normally a sufficient fudge to just use
>>>sched_setaffinity to bind individual processes to specific CPUs. The
>>>mainline kernel memory allocator does the right thing in that case, and
>>>allocates memory locally when it can.
>>>
>>>You can use the taskset command to get at this from the command line, so
>>>you may not even need to modify your code.
>>
>>Linus also merged the NUMA API support into mainline now with 2.6.7rc1, so
>>you can use numactl for more finegrained tuning.
>
>
> FYI Brett, some Opteron systems have a BIOS option to interleave memory. If
> you are going to make use of NUMA, I think you want to not interleave.
Thanks for the heads up, I just disabled interleaving in the BIOS.
>
> Also, if you have a 25% imbalance within a domain/node, the scheduler can have
> a tendency to bounce around a task for fairness. That might be why you are
> seeing little/no affinity to a cpu (even top might be causing some of this).
> Not sure what the threshold is between domains/nodes, but I am curious if it
> still happens with CONFIG_NUMA on. If these are long lived cpu bound
> processes, I would try to have the number of processes be a multiple of the
> number of cpus.
I have CONFIG_NUMA on, and yes, these are long-lived processes (duration is
1 hour). And you and I are on the same wavelength: I'm imagining 3
processes per CPU, with Apache handing off work to the processes in
question. This is on a 1.6GHz 2-way Opteron, if that matters at all.
Hopefully I can make the modifications and test this tomorrow.
Thanks,
Brett
* RE: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 17:27 ` Brett E.
2004-05-21 17:46 ` Martin J. Bligh
@ 2004-05-23 2:49 ` David Schwartz
1 sibling, 0 replies; 22+ messages in thread
From: David Schwartz @ 2004-05-23 2:49 UTC (permalink / raw)
To: brettspamacct, Martin J. Bligh; +Cc: linux-kernel mailing list, jbarnes
> Let's say I have a 2 way opteron and want to run 4 long-lived processes.
> I fork and exec to create 1 of the processes, it chooses to run on
> processor 0 since processor 1 is overloaded at that time, so its
> homenode is processor 0. I fork and exec another, it chooses processor
> 0 since processors 1 is overloaded at that time. .. Let's say an uneven
> distribution is chosen for all 4 processes, with all processes mapped to
> processor 0. So they allocate on node 0 yet the scheduler will map these
> to both processors since CPU should be balanced. In this case, you will
> have a situation where the second processor will have to fetch memory
> from the other processor's memory.
>
> So a better solution would be to use numactl to set the homenodes
> explicitly, choosing processor 0 for 2 processes, processor 1 for the 2
> other processes.
>
> Is this incorrect?
Generally, yes, it is. Surprisingly so. If you assume everything is
perfect, then this seems true. But in the real world, it almost never works
that way.
Consider, for example, if process 1 is responsible for most of the memory
load at any particular time. If it's on CPU #1 and all its memory is on CPU
#1, then the memory controller on CPU #2 is underused and memory bandwidth
suffers. This is why most BIOSes interleave memory pages across the CPUs:
it gives you the best chance of being able to use both memory controllers
under load.
I don't think we've reached the point yet where treating x86-64 systems as
NUMA machines makes very much sense.
DS
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 18:58 ` Jesse Barnes
@ 2004-05-21 19:08 ` Martin J. Bligh
0 siblings, 0 replies; 22+ messages in thread
From: Martin J. Bligh @ 2004-05-21 19:08 UTC (permalink / raw)
To: Jesse Barnes, brettspamacct; +Cc: linux-kernel mailing list
>> Without these two things the kernel will
>> just allocate on the currently running CPU whatever that may be when in
>> fact a preference must be given to a CPU at some point, hopefully early
>> on in the life of the process, in order to take advantage of NUMA.
>
> The kernel does a pretty good job of keeping processes on the CPU they've been
> running on, and thus they'll probably stay close to their memory. But
> without explicit pinning, there's no guarantee.
One thing I *do* want to do, instead of explicit homenodes, is to take
into account the per-node RSS of processes when making my migration decisions.
That gives us a kind of dynamic homenode. I have the per-node RSS code in
my tree already from someone (Matt?), but we haven't used it. Of course,
this throws off the O(1)-ness of balancing decisions, but as long as we
cap the number we look at, and only do it for cross-node rebalance migrates
(we don't need it for exec), I really don't care ;-)
M.
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 18:14 ` Brett E.
2004-05-21 18:30 ` Martin J. Bligh
@ 2004-05-21 18:58 ` Jesse Barnes
2004-05-21 19:08 ` Martin J. Bligh
1 sibling, 1 reply; 22+ messages in thread
From: Jesse Barnes @ 2004-05-21 18:58 UTC (permalink / raw)
To: brettspamacct; +Cc: Martin J. Bligh, linux-kernel mailing list
On Friday, May 21, 2004 2:14 pm, Brett E. wrote:
> So could process 0 run on processor 0, allocating local to processor 0,
> then run on processor 1, allocating local to processor 1, this way
> allocating to both processors?
Yep.
> So over time process 0's allocations
> would be split up between both processors, defeating NUMA.
Well, I wouldn't put it that way, but it would give your app suboptimal
performance.
> The homenode
> concept + explicit CPU pinning seems useful in that they allow you to
> take advantage of NUMA better.
When you say homenode here, do you mean restricted memory allocation? If so,
then yes, that would be useful. And we have it already with libnuma and
sched_setaffinity. If you mean some sort of preferred allocation node, then
no, we don't have that, and it wouldn't be of much benefit I think (relative
to what we already have). The idea is to do "pretty good" by default, but
still allow the user full control if they want to be sure.
> Without these two things the kernel will
> just allocate on the currently running CPU whatever that may be when in
> fact a preference must be given to a CPU at some point, hopefully early
> on in the life of the process, in order to take advantage of NUMA.
The kernel does a pretty good job of keeping processes on the CPU they've been
running on, and thus they'll probably stay close to their memory. But
without explicit pinning, there's no guarantee.
Jesse
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 18:14 ` Brett E.
@ 2004-05-21 18:30 ` Martin J. Bligh
2004-05-21 18:58 ` Jesse Barnes
1 sibling, 0 replies; 22+ messages in thread
From: Martin J. Bligh @ 2004-05-21 18:30 UTC (permalink / raw)
To: brettspamacct; +Cc: linux-kernel mailing list, jbarnes
> So could process 0 run on processor 0, allocating local to processor 0, then run on processor 1, allocating local to processor 1, this way allocating to both processors? So over time process 0's allocations would be split up between both processors, defeating NUMA. The homenode concept + explicit CPU pinning seems useful in that they allow you to take advantage of NUMA better. Without these two things the kernel will just allocate on the currently running CPU whatever that may be when in fact a preference must be given to a CPU at some point, hopefully early on in the life of the process, in order to take advantage of NUMA.
For any given situation, you can come up with a scheduler mod that improves
things. The problem is making something generic that works well in most
cases.
I suggest: (1) read the archives; (2) try implementing it if you still
don't believe me ;-) The main problem with the homenode scheduler is that
you're more likely to end up in a situation where you're running "off-node"
and allocating stuff back from your "home node". This stuff isn't
deterministic.
M.
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 17:46 ` Martin J. Bligh
@ 2004-05-21 18:14 ` Brett E.
2004-05-21 18:30 ` Martin J. Bligh
2004-05-21 18:58 ` Jesse Barnes
0 siblings, 2 replies; 22+ messages in thread
From: Brett E. @ 2004-05-21 18:14 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel mailing list, jbarnes
Martin J. Bligh wrote:
>>>>Say you have a bunch of single-threaded processes on a NUMA machine.
>>>>Does the kernel make sure to prefer allocations using a certain CPU's
>>>>memory, preferring to run a given process on the CPU which contains
>>>>its memory? Or should I use the NUMA API(libnuma) to spell this out
>>>>to the kernel? Does the kernel do the right thing in this case?
>>>
>>>
>>>The kernel will generally do the right thing (process local alloc) by
>>>default. In 99% of cases, you don't want to muck with it - unless you're
>>>running one single app dominating the whole system, and nothing else is
>>>going on, you probably don't want to specify anything explicitly.
>>>
>>>M.
>>>
>>
>>Let's say I have a 2 way opteron and want to run 4 long-lived processes. I fork and exec to create 1 of the processes, it chooses to run on processor 0 since processor 1 is overloaded at that time, so its homenode is processor 0.
>
>
> There is no such thing as a homenode. What you describe is more or less why
> we ditched that concept.
>
>
>>I fork and exec another, it chooses processor 0 since processors 1 is overloaded at that time. .. Let's say an uneven distribution is chosen for all 4 processes, with all processes mapped to processor 0. So they allocate on node 0 yet the scheduler will map these to both processors since CPU should be balanced. In this case, you will have a situation where the second processor will have to fetch memory from the other processor's memory.
>
>
> Each process will allocate local to the CPU it is running on when it does the
> allocate, so it's difficult to get as off-kilter as you describe (though it
> is still possible).
So could process 0 run on processor 0, allocating local to processor 0,
then run on processor 1, allocating local to processor 1, thereby
allocating on both processors? Over time, process 0's allocations would
then be split between both processors, defeating NUMA. The homenode
concept plus explicit CPU pinning seems useful in that it lets you take
better advantage of NUMA. Without these two things the kernel will just
allocate on whatever CPU the process happens to be running on, when in
fact a preference must be given to a CPU at some point, hopefully early
in the life of the process, in order to take advantage of NUMA.
I'm trying to play devil's advocate, by the way, so bear with me; you've
been very helpful and I'm learning a great deal from you. :)
Thanks,
Brett
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 17:27 ` Brett E.
@ 2004-05-21 17:46 ` Martin J. Bligh
2004-05-21 18:14 ` Brett E.
2004-05-23 2:49 ` David Schwartz
1 sibling, 1 reply; 22+ messages in thread
From: Martin J. Bligh @ 2004-05-21 17:46 UTC (permalink / raw)
To: brettspamacct; +Cc: linux-kernel mailing list, jbarnes
>>> Say you have a bunch of single-threaded processes on a NUMA machine.
>>> Does the kernel make sure to prefer allocations using a certain CPU's
>>> memory, preferring to run a given process on the CPU which contains
>>> its memory? Or should I use the NUMA API(libnuma) to spell this out
>>> to the kernel? Does the kernel do the right thing in this case?
>>
>>
>> The kernel will generally do the right thing (process local alloc) by
>> default. In 99% of cases, you don't want to muck with it - unless you're
>> running one single app dominating the whole system, and nothing else is
>> going on, you probably don't want to specify anything explicitly.
>>
>> M.
>>
> Let's say I have a 2 way opteron and want to run 4 long-lived processes. I fork and exec to create 1 of the processes, it chooses to run on processor 0 since processor 1 is overloaded at that time, so its homenode is processor 0.
There is no such thing as a homenode. What you describe is more or less why
we ditched that concept.
> I fork and exec another, it chooses processor 0 since processors 1 is overloaded at that time. .. Let's say an uneven distribution is chosen for all 4 processes, with all processes mapped to processor 0. So they allocate on node 0 yet the scheduler will map these to both processors since CPU should be balanced. In this case, you will have a situation where the second processor will have to fetch memory from the other processor's memory.
Each process will allocate local to the CPU it is running on when it does the
allocate, so it's difficult to get as off-kilter as you describe (though it
is still possible).
> So a better solution would be to use numactl to set the homenodes explicitly, choosing processor 0 for 2 processes, processor 1 for the 2 other processes.
In theory, it may be. If you ever had complete control of the system, and
started no other processes whatsoever. In practice, that's very unlikely,
so unless you're running a dedicated Oracle server or something, don't muck
with it - just let the OS sort it out ;-)
M.
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 6:37 ` Martin J. Bligh
@ 2004-05-21 17:27 ` Brett E.
2004-05-21 17:46 ` Martin J. Bligh
2004-05-23 2:49 ` David Schwartz
0 siblings, 2 replies; 22+ messages in thread
From: Brett E. @ 2004-05-21 17:27 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel mailing list, jbarnes
Martin J. Bligh wrote:
>>Say you have a bunch of single-threaded processes on a NUMA machine.
>>Does the kernel make sure to prefer allocations using a certain CPU's
>>memory, preferring to run a given process on the CPU which contains
>>its memory? Or should I use the NUMA API(libnuma) to spell this out
>>to the kernel? Does the kernel do the right thing in this case?
>
>
> The kernel will generally do the right thing (process local alloc) by
> default. In 99% of cases, you don't want to muck with it - unless you're
> running one single app dominating the whole system, and nothing else is
> going on, you probably don't want to specify anything explicitly.
>
> M.
>
Let's say I have a 2-way Opteron and want to run 4 long-lived processes.
I fork and exec to create one of the processes; it chooses to run on
processor 0 since processor 1 is overloaded at that time, so its
homenode is processor 0. I fork and exec another; it chooses processor
0 since processor 1 is overloaded at that time. ... Let's say an uneven
distribution is chosen for all 4 processes, with all of them mapped to
processor 0. So they allocate on node 0, yet the scheduler will map them
to both processors since CPU load should be balanced. In this case, the
second processor will have to fetch memory from the other processor's
memory.
So a better solution would be to use numactl to set the homenodes
explicitly, choosing processor 0 for 2 processes, processor 1 for the 2
other processes.
Is this incorrect?
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 0:51 Brett E.
2004-05-21 1:29 ` Jesse Barnes
@ 2004-05-21 6:37 ` Martin J. Bligh
2004-05-21 17:27 ` Brett E.
1 sibling, 1 reply; 22+ messages in thread
From: Martin J. Bligh @ 2004-05-21 6:37 UTC (permalink / raw)
To: brettspamacct, linux-kernel mailing list
> Say you have a bunch of single-threaded processes on a NUMA machine.
> Does the kernel make sure to prefer allocations using a certain CPU's
> memory, preferring to run a given process on the CPU which contains
> its memory? Or should I use the NUMA API(libnuma) to spell this out
> to the kernel? Does the kernel do the right thing in this case?
The kernel will generally do the right thing (process local alloc) by
default. In 99% of cases, you don't want to muck with it - unless you're
running one single app dominating the whole system, and nothing else is
going on, you probably don't want to specify anything explicitly.
M.
* Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
2004-05-21 0:51 Brett E.
@ 2004-05-21 1:29 ` Jesse Barnes
2004-05-21 6:37 ` Martin J. Bligh
1 sibling, 0 replies; 22+ messages in thread
From: Jesse Barnes @ 2004-05-21 1:29 UTC (permalink / raw)
To: brettspamacct; +Cc: linux-kernel mailing list
On Thursday, May 20, 2004 8:51 pm, Brett E. wrote:
> Say you have a bunch of single-threaded processes on a NUMA machine.
> Does the kernel make sure to prefer allocations using a certain CPU's
> memory, preferring to run a given process on the CPU which contains its
> memory?
Well, it'll allocate memory from the node containing the CPU that the process
is running on, so if you've pinned your process (e.g. with schedutils) you'll
be ok unless you're short on memory. If it's not pinned, you'll run the risk
of having your process refer to memory on a remote node. Depending on what
type of system you're running on, this could be a very small performance
issue or a large one.
> Or should I use the NUMA API(libnuma) to spell this out to the
> kernel? Does the kernel do the right thing in this case?
The kernel, by default, will allocate memory on the node where the process is
running, and fall back to other nodes based on distance. That said, it's not
a bad idea to pin your process to a CPU and use libnuma to explicitly set
its memory affinity.
Jesse
* How can I optimize a process on a NUMA architecture(x86-64 specifically)?
@ 2004-05-21 0:51 Brett E.
2004-05-21 1:29 ` Jesse Barnes
2004-05-21 6:37 ` Martin J. Bligh
0 siblings, 2 replies; 22+ messages in thread
From: Brett E. @ 2004-05-21 0:51 UTC (permalink / raw)
To: linux-kernel mailing list
Say you have a bunch of single-threaded processes on a NUMA machine.
Does the kernel make sure to prefer allocations using a certain CPU's
memory, preferring to run a given process on the CPU which contains its
memory? Or should I use the NUMA API (libnuma) to spell this out to the
kernel? Does the kernel do the right thing in this case?
Thanks,
Brett