LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
[not found] <200705040104.l4414YFR026322@shell0.pdx.osdl.net>
@ 2007-05-04 1:38 ` Paul Jackson
2007-05-04 4:49 ` Ken Chen
2007-05-04 6:45 ` Bill Irwin
0 siblings, 2 replies; 10+ messages in thread
From: Paul Jackson @ 2007-05-04 1:38 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm, mm-commits, kenchen, agl, hermes, wli, clameter
Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows
more about hugetlb pages than I do.
This patch strikes me as a bit odd.
Granted, it's solving what could be a touchy problem with a fairly
simple solution, which is usually a Good Thing(tm).
However, the idea that different tasks would see different values for
the following fields in /proc/meminfo:
HugePages_Total: 0
HugePages_Free: 0
strikes me as odd, and risky. I would have thought that usually, all
tasks in the system should see the same values in the files in /proc
(as opposed to the files in particular task subdirectories /proc/<pid>.)
This patch strikes me as a bit of a hack, good for compatibility, but
hiding a booby trap that will bite some user code in the long run.
But I'm not enough of an expert to know what the right tradeoffs are
in this matter.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 1:38 ` + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree Paul Jackson
@ 2007-05-04 4:49 ` Ken Chen
2007-05-04 5:03 ` Andrew Morton
2007-05-04 5:12 ` Paul Jackson
2007-05-04 6:45 ` Bill Irwin
1 sibling, 2 replies; 10+ messages in thread
From: Ken Chen @ 2007-05-04 4:49 UTC (permalink / raw)
To: Paul Jackson; +Cc: linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter
On 5/3/07, Paul Jackson <pj@sgi.com> wrote:
> Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows
> more about hugetlb pages than I do.
>
> This patch strikes me as a bit odd.
>
> Granted, it's solving what could be a touchy problem with a fairly
> simple solution, which is usually a Good Thing(tm).
>
> However, the idea that different tasks would see different values for
> the following fields in /proc/meminfo:
>
> HugePages_Total: 0
> HugePages_Free: 0
>
> strikes me as odd, and risky. I would have thought that usually, all
> tasks in the system should see the same values in the files in /proc
> (as opposed to the files in particular task subdirectories /proc/<pid>.)
>
> This patch strikes me as a bit of a hack, good for compatibility, but
> hiding a booby trap that will bite some user code in the long run.
>
> But I'm not enough of an expert to know what the right tradeoffs are
> in this matter.
Would annotating the Hugepages_* field with name of cpuset help? I
orginally thought that since cpuset's mems are hirearchical in memory
assignment, it is fairly straightforward to understand what's going
on: parent cpuset stats include its and all of its children. For
example, if root cpuset has two sub job1 and job2 cpusets, each has 20
and 30 htlb pages, when query at each level, we have:
[root@box]# echo $$ > /dev/cpuset/tasks
[root@box]# grep HugePages_Total /proc/meminfo
HugePages_Total: 50
[root@box]# echo $$ > /dev/cpuset/job1/tasks
[root@box]# grep HugePages_Total /proc/meminfo
HugePages_Total: 20
[root@box]# echo $$ > /dev/cpuset/job2/tasks
[root@box]# grep HugePages_Total /proc/meminfo
HugePages_Total: 30
If this is odd, do you have any suggestions for alternative?
- Ken
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 4:49 ` Ken Chen
@ 2007-05-04 5:03 ` Andrew Morton
2007-05-04 5:58 ` Paul Jackson
2007-05-04 5:12 ` Paul Jackson
1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-05-04 5:03 UTC (permalink / raw)
To: Ken Chen; +Cc: Paul Jackson, linux-kernel, agl, hermes, wli, clameter
On Thu, 3 May 2007 21:49:12 -0700 "Ken Chen" <kenchen@google.com> wrote:
> On 5/3/07, Paul Jackson <pj@sgi.com> wrote:
> > Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows
> > more about hugetlb pages than I do.
> >
> > This patch strikes me as a bit odd.
> >
> > Granted, it's solving what could be a touchy problem with a fairly
> > simple solution, which is usually a Good Thing(tm).
> >
> > However, the idea that different tasks would see different values for
> > the following fields in /proc/meminfo:
> >
> > HugePages_Total: 0
> > HugePages_Free: 0
> >
> > strikes me as odd, and risky. I would have thought that usually, all
> > tasks in the system should see the same values in the files in /proc
> > (as opposed to the files in particular task subdirectories /proc/<pid>.)
> >
> > This patch strikes me as a bit of a hack, good for compatibility, but
> > hiding a booby trap that will bite some user code in the long run.
> >
> > But I'm not enough of an expert to know what the right tradeoffs are
> > in this matter.
>
> Would annotating the Hugepages_* field with name of cpuset help?
There are existing programs which parse /proc/meminfo. If we're going to
do any of this then it would need to be via new fields.
I don't think we should be altering the meaning of the HugePages fields
like this. One can imagine scenarios in which such a change would cause
existing userspace scripts to fail. Plus it's Just Weird to use
/proc/meminfo in this manner.
> I
> orginally thought that since cpuset's mems are hirearchical in memory
> assignment, it is fairly straightforward to understand what's going
> on: parent cpuset stats include its and all of its children. For
> example, if root cpuset has two sub job1 and job2 cpusets, each has 20
> and 30 htlb pages, when query at each level, we have:
>
> [root@box]# echo $$ > /dev/cpuset/tasks
> [root@box]# grep HugePages_Total /proc/meminfo
> HugePages_Total: 50
>
> [root@box]# echo $$ > /dev/cpuset/job1/tasks
> [root@box]# grep HugePages_Total /proc/meminfo
> HugePages_Total: 20
>
> [root@box]# echo $$ > /dev/cpuset/job2/tasks
> [root@box]# grep HugePages_Total /proc/meminfo
> HugePages_Total: 30
>
> If this is odd, do you have any suggestions for alternative?
If it's per-cpuset information then shouldn't it be presented in
/dev/cpuset/something?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 4:49 ` Ken Chen
2007-05-04 5:03 ` Andrew Morton
@ 2007-05-04 5:12 ` Paul Jackson
2007-05-04 5:35 ` David Rientjes
1 sibling, 1 reply; 10+ messages in thread
From: Paul Jackson @ 2007-05-04 5:12 UTC (permalink / raw)
To: Ken Chen; +Cc: linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter
Ken wrote:
> If this is odd, do you have any suggestions for alternative?
No, I don't. Sorry.
It's a touchy problem, and I'm not enough of an expert to know what the
right tradeoffs are in this matter.
I agree with your point that if you realize what's going on, namely
that what cpuset the task reading meminfo is in affects the HugePages
values that are read, then one can use the interface easily enough.
... how about:
1) don't change the existing HugePages_* values - keep them
system-wide, and
2) adding two new values, by such names as:
Current_Cpuset_HugePages_Total: 0
Current_Cpuset_HugePages_Free: 0
That's certainly an uglier proposal than yours ;). But at least it
seems clearer, and doesn't make incompatible changes to what's there.
It does require user level code change to actually benefit from the
new values, whereas your patch sort of sneaks them in, on the assumption
that the majority of reads of these values would really prefer getting
the cpuset relative totals instead.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 5:12 ` Paul Jackson
@ 2007-05-04 5:35 ` David Rientjes
2007-05-04 6:12 ` Paul Jackson
0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2007-05-04 5:35 UTC (permalink / raw)
To: Paul Jackson
Cc: Ken Chen, linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter
On Thu, 3 May 2007, Paul Jackson wrote:
> 2) adding two new values, by such names as:
>
> Current_Cpuset_HugePages_Total: 0
> Current_Cpuset_HugePages_Free: 0
>
This information is already exported to userspace through sysfs. Simply
grab the N-mems allowed to your task from /proc/pid/status, cat
/sys/devices/system/node/nodeN/meminfo for each N, and add.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 5:03 ` Andrew Morton
@ 2007-05-04 5:58 ` Paul Jackson
2007-05-04 6:13 ` Ken Chen
0 siblings, 1 reply; 10+ messages in thread
From: Paul Jackson @ 2007-05-04 5:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: kenchen, linux-kernel, agl, hermes, wli, clameter
Andrew wrote:
> If it's per-cpuset information then shouldn't it be presented in
> /dev/cpuset/something?
Yeah - if huge pages were mainline future, rather than the more
controversial sideline they are now, then it would make more sense
to put in these stats in each cpuset.
Note, Ken, that if we did that, the calculation of these new Total and
Free stats would be a little different than your new code. Instead of
looping over the memory nodes in the current tasks mems_allowed mask,
we would loop over the memory nodes allowed in the cpuset being queried
(the cpuset whose 'hugepages_total' or 'hugepages_free' special
file we were reading, not the current tasks cpuset.)
But I'm reluctant to entertain such cpuset additions until I see more
of where my colleague Christoph is going in related work.
Clearly as can be seen on one of his posts on the parallel lkml thread:
Re: + pretend-cpuset-has-some-form-of-hugetlb-page-reservation.patch added to -mm tree
earlier today, Christoph is no great fan of the current implementation
of huge pages.
And clearly as memory continues to get bigger, we will be putting more
stress on these page size related mechanisms.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 5:35 ` David Rientjes
@ 2007-05-04 6:12 ` Paul Jackson
0 siblings, 0 replies; 10+ messages in thread
From: Paul Jackson @ 2007-05-04 6:12 UTC (permalink / raw)
To: David Rientjes
Cc: kenchen, linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter
David wrote:
> This information is already exported to userspace through sysfs. Simply
> grab the N-mems allowed to your task from /proc/pid/status, cat
> /sys/devices/system/node/nodeN/meminfo for each N, and add.
Good point.
I don't see how this present patch, to change /proc/meminfo,
can be justified, given this.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 5:58 ` Paul Jackson
@ 2007-05-04 6:13 ` Ken Chen
2007-05-04 7:39 ` Paul Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Ken Chen @ 2007-05-04 6:13 UTC (permalink / raw)
To: Paul Jackson; +Cc: Andrew Morton, linux-kernel, agl, hermes, wli, clameter
On 5/3/07, Paul Jackson <pj@sgi.com> wrote:
> Note, Ken, that if we did that, the calculation of these new Total and
> Free stats would be a little different than your new code. Instead of
> looping over the memory nodes in the current tasks mems_allowed mask,
> we would loop over the memory nodes allowed in the cpuset being queried
> (the cpuset whose 'hugepages_total' or 'hugepages_free' special
> file we were reading, not the current tasks cpuset.)
This is even more controversial and messy. akpm already dropped the
patch and expressed that he doesn't like it. And I won't go down
another messy path. I will let this idea RIP.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 1:38 ` + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree Paul Jackson
2007-05-04 4:49 ` Ken Chen
@ 2007-05-04 6:45 ` Bill Irwin
1 sibling, 0 replies; 10+ messages in thread
From: Bill Irwin @ 2007-05-04 6:45 UTC (permalink / raw)
To: Paul Jackson
Cc: linux-kernel, akpm, mm-commits, kenchen, agl, hermes, clameter
On Thu, May 03, 2007 at 06:38:21PM -0700, Paul Jackson wrote:
> Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows
> more about hugetlb pages than I do.
> This patch strikes me as a bit odd.
> Granted, it's solving what could be a touchy problem with a fairly
> simple solution, which is usually a Good Thing(tm).
> However, the idea that different tasks would see different values for
> the following fields in /proc/meminfo:
> HugePages_Total: 0
> HugePages_Free: 0
> strikes me as odd, and risky. I would have thought that usually, all
> tasks in the system should see the same values in the files in /proc
> (as opposed to the files in particular task subdirectories /proc/<pid>.)
> This patch strikes me as a bit of a hack, good for compatibility, but
> hiding a booby trap that will bite some user code in the long run.
> But I'm not enough of an expert to know what the right tradeoffs are
> in this matter.
The semantics of the global /proc/meminfo should not change; a separate
per-cpuset reporting mechanism should really be used.
-- wli
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree
2007-05-04 6:13 ` Ken Chen
@ 2007-05-04 7:39 ` Paul Jackson
0 siblings, 0 replies; 10+ messages in thread
From: Paul Jackson @ 2007-05-04 7:39 UTC (permalink / raw)
To: Ken Chen; +Cc: akpm, linux-kernel, agl, hermes, wli, clameter
> I will let this idea RIP.
Agreed.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-05-04 7:39 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <200705040104.l4414YFR026322@shell0.pdx.osdl.net>
2007-05-04 1:38 ` + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree Paul Jackson
2007-05-04 4:49 ` Ken Chen
2007-05-04 5:03 ` Andrew Morton
2007-05-04 5:58 ` Paul Jackson
2007-05-04 6:13 ` Ken Chen
2007-05-04 7:39 ` Paul Jackson
2007-05-04 5:12 ` Paul Jackson
2007-05-04 5:35 ` David Rientjes
2007-05-04 6:12 ` Paul Jackson
2007-05-04 6:45 ` Bill Irwin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).