LKML Archive on lore.kernel.org
* committed memory, mmaps and shms
@ 2015-03-11 18:10 Marcos Dione
  2015-03-11 20:10 ` Martin Steigerwald
  2015-03-12 12:40 ` Michal Hocko
  0 siblings, 2 replies; 11+ messages in thread
From: Marcos Dione @ 2015-03-11 18:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: marcos-david.dione


    Hi everybody. First, I hope this is the right list for such
questions;  I searched in the list of lists[1] for a MM specific one, but
didn't find any. Second, I'm not subscribed, so please CC me and my other
address when answering.

    I'm trying to figure out how Linux really accounts for memory, both
globally and for each individual process. Most users' first approach to
memory monitoring is running free (no pun intended):

$ free
             total       used       free     shared    buffers     cached
Mem:     396895176  395956332     938844          0       8972  356409952
-/+ buffers/cache:   39537408  357357768
Swap:      8385788    8385788          0

    This reports 378GiB of RAM, 377 used; of those, 8MiB are in buffers and
339GiB in cache, leaving only 38GiB for processes (for some reason this
value is not displayed, which should probably be a warning of what is to
come); and 1GiB free. So far all seems good.

    Now, this machine has (at least) a 108 GiB shm. All this memory is
clearly counted as cache. This is my first surprise. shms are not cache
of anything on disk, but spaces of shared memory (duh); at most, their
pages can end up in swap, but not in a file somewhere. Maybe I'm not
correctly interpreting the meaning of (what is accounted as) cache.

    The next tool in the toolbox is ps:

$ ps ux | grep 27595
USER       PID %CPU %MEM        VSZ      RSS TTY STAT START   TIME COMMAND
osysops  27595 49.5 12.7 5506723020 50525312   ?   Sl 05:20 318:02 otf_be v2.9.0.13 : FQ_E08AS FQ_E08-FQDSIALT #1 [processing daemon lib, msg type: undefined]

    This process is not only attached to that shm, it's also attached to 
5TiB of mmap'ed files (128 LMDB databases), for a total of 5251GiB. For
context, know that another 9 processes do the same. This tells me that
shms and mmaps are counted as part of their virtual size, which makes
sense. Of those, only 48GiB are resident... but a couple of paragraphs
before I said that there were only 38GiB used by processes. Clearly some
part of each individual process' RSS also counts at least some part of
the mmaps. /proc/27595/smaps has more info:

$ cat /proc/27595/smaps | awk 'BEGIN { count= 0; } /Rss/ { count = count + $2; print } /Pss/ { print } /Swap/ { print } /^Size/ { print } /-/ { print } END { print count }'
[...]
7f2987e92000-7f3387e92000 rw-s 00000000 fc:11 3225448420                 /instant/LMDBMedium_0000000000/data.mdb
Size:           41943040 kB
Rss:              353164 kB
Pss:              166169 kB
Swap:                  0 kB
[...]
7f33df965000-7f4f1cdcc000 rw-s 00000000 00:04 454722576                  /SYSV00000000 (deleted)
Size:          114250140 kB
Rss:             5587224 kB
Pss:             3856206 kB
Swap:                  0 kB
[...]
51652180

    Notice that the sum is not the same as the one reported before; maybe
because I took them at different points in time while redacting this
mail. So this confirms that a process' RSS value includes shms and mmaps,
at least the resident part. In the case of the mmaps, the resident part
must be the part that currently sits in the cache; in the case of the
shms, I suppose it's the part that has ever been used. An internal tool
tells me that currently 24GiB of that shm is in use, but only 5 are
reported as part of that process' RSS. Maybe that is the part this
process has used?

    And now I reach what I find most confusing (uninteresting values
removed):

$ cat /proc/meminfo 
MemTotal:       396895176 kB
MemFree:           989392 kB
Buffers:             8448 kB
Cached:         344059556 kB
SwapTotal:        8385788 kB
SwapFree:               0 kB
Mapped:         147188944 kB
Shmem:          109114792 kB
CommitLimit:    206833376 kB
Committed_AS:   349194180 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      1222960 kB
VmallocChunk: 34157188704 kB

    Again, values might vary due to timing. Mapped clearly includes Shmem
but not mmaps; in theory 36GiB is 'pure' (not shm'ed, not mmap'ed)
process memory, close to what I calculated before. Again, this value is
not broken out on its own, which makes me wonder why. Probably it's more
like "it doesn't make sense to do it".

    Last but definitely not least, Committed_AS is 333GiB, close to the
total mem. man proc says it's «The amount of memory presently allocated
on the system. The committed memory is a sum of all of the memory which
has been allocated by processes, even if it has not been "used" by them
as of yet». What is not clear is whether this counts mmaps (I think it
doesn't, or it would be either 5TiB or 50TiB, depending on whether you
count each attachment to each shm) and/or shms (once, or multiple
times?). In a rough calculation, the 83 procs using the same 108GiB shm
would account for 9TiB, so at least it's not counting it multiple times.

    While we're at it, I would like to know what VmallocTotal (32TiB) is
accounting. The explanation in man proc («Total size of vmalloc memory
area.», where vmalloc seems to be a kernel-internal function to «allocate
a contiguous memory region in the virtual address space») doesn't mean
much to me. At some point I thought it should be the sum of all VSSs, but
that comes to 50TiB for me, so it isn't. Maybe I should just ignore it.

    Short version:

* Why is 'pure' malloc'ed memory never reported? Does it make sense to
  talk about it?

* Why do shms show up in cache? What does cache currently mean/hold?

* What does the RSS value mean for the shms in each proc's smaps file?
  And for mmaps?

* Is my conclusion about Shmem being counted into Mapped correct?

* What is actually counted in Committed_AS? Does it count shms or mmaps?
  How?

* What is VmallocTotal?

    Thanks in advance, first for reaching the end of this longish mail,
and second if you ever give any clues about any of these questions.
Cheers,

	-- Marcos.

--
[1] http://vger.kernel.org/vger-lists.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: committed memory, mmaps and shms
  2015-03-11 18:10 committed memory, mmaps and shms Marcos Dione
@ 2015-03-11 20:10 ` Martin Steigerwald
  2015-03-12 12:40 ` Michal Hocko
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Steigerwald @ 2015-03-11 20:10 UTC (permalink / raw)
  To: Marcos Dione; +Cc: linux-kernel, marcos-david.dione

Hi Marcos,

On Wednesday, 11 March 2015, 19:10:44, Marcos Dione wrote:
>     Hi everybody. First, I hope this is the right list for such
> questions;  I searched in the list of lists[1] for a MM specific one,
> but didn't find any. Second, I'm not subscribed, so please CC me and my
> other address when answering.

Some pointers:

http://linux-mm.org/LinuxKernelMailingLists

http://linux-mm.org/

>     I'm trying to figure out how Linux really accounts for memory, both
> globally and for each individual process. Most user's first approach  to
> memory monitoring is running free (no pun intended):
> 
> $ free
>              total       used       free     shared    buffers     cached
> Mem:     396895176  395956332     938844          0       8972  356409952
> -/+ buffers/cache:   39537408  357357768
> Swap:      8385788    8385788          0

free -h is nice here.

As to your questions, it's late here, I did lots of computer stuff today,
and there are MM devs that may have better answers as well. In short: I am
not in the mood at the moment to dig into it myself.

But I am interested in answers as well :)

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: committed memory, mmaps and shms
  2015-03-11 18:10 committed memory, mmaps and shms Marcos Dione
  2015-03-11 20:10 ` Martin Steigerwald
@ 2015-03-12 12:40 ` Michal Hocko
  2015-03-12 14:54   ` Marcos Dione
  2015-03-12 15:09   ` Michal Hocko
  1 sibling, 2 replies; 11+ messages in thread
From: Michal Hocko @ 2015-03-12 12:40 UTC (permalink / raw)
  To: Marcos Dione; +Cc: linux-kernel, marcos-david.dione, linux-mm

[CCing MM mailing list]

On Wed 11-03-15 19:10:44, Marcos Dione wrote:
> 
>     Hi everybody. First, I hope this is the right list for such
> questions;  I searched in the list of lists[1] for a MM specific one, but
> didn't find any. Second, I'm not subscribed, so please CC me and my other
> address when answering.
> 
>     I'm trying to figure out how Linux really accounts for memory, both
> globally and for each individual process. Most user's first approach  to
> memory monitoring is running free (no pun intended):
> 
> $ free
>              total       used       free     shared    buffers     cached
> Mem:     396895176  395956332     938844          0       8972  356409952
> -/+ buffers/cache:   39537408  357357768
> Swap:      8385788    8385788          0
> 
>     This reports 378GiB of RAM, 377 used; of those 8MiB in buffers,
> 339GiB in cache, leaving only 38Gib for processes (for some reason this

I am not sure I understand your math here. 339G in the cache should be
reclaimable (be careful about the shmem though). It is the rest which
might be harder to reclaim.

> value is not displayed, which should probably be a warning to what is to
> come); and 1GiB free. So far all seems good.
> 
>     Now, this machine has (at least) a 108 GiB shm. All this memory is
> clearly counted as cache. This is my first surprise. shms are not cache
> of anything on disk, but spaces of shared memory (duh); at most, their
> pages can end up in swap, but not in a file somewhere. Maybe I'm not
> correctly interpreting the meaning of (what is accounted as) cache.

shmem (tmpfs) is an in-memory filesystem. Pages backing shmem mappings
are maintained in the page cache. Their backing storage is swap, as you
said. So from a conceptual point of view this makes a lot of sense.
I can completely understand why this might be confusing for users,
though. The value simply means something else.
I think it would make more sense to add something like an easily
reclaimable cache figure to the output of free (pagecache-shmem-dirty,
basically). That would give an admin a better view of immediately
re-usable memory.
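As a rough sketch, the figure I have in mind can already be computed from
/proc/meminfo today (an approximation only - it ignores writeback and the
like, and is not a reclaim guarantee):

```shell
# "Easily reclaimable cache" estimate: page cache minus shmem minus
# dirty pages, all read from /proc/meminfo (values are in kB).
awk '/^Cached:/ {c = $2}
     /^Shmem:/  {s = $2}
     /^Dirty:/  {d = $2}
     END { printf "reclaimable ~= %d kB\n", c - s - d }' /proc/meminfo
```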

I will skip over the following section but keep it here for the mm
mailing list (TL;DR right now).

>     The next tool in the toolbox is ps:
> 
> $ ps ux | grep 27595
> USER       PID %CPU %MEM        VSZ      RSS TTY STAT START   TIME COMMAND
> osysops  27595 49.5 12.7 5506723020 50525312   ?   Sl 05:20 318:02 otf_be v2.9.0.13 : FQ_E08AS FQ_E08-FQDSIALT #1 [processing daemon lib, msg type: undefined]
> 
>     This process is not only attached to that shm, it's also attached to 
> 5TiB of mmap'ed files (128 LMDB databases), for a total of 5251GiB. For
> context, know that another 9 processes do the same. This tells me that
> shms and mmaps are counted as part of their virtual size, which makes
> sense. Of those, only 48GiB are resident... but a couple of paragraphs
> before I said that there were only 38GiB used by processes. Clearly some
> part of each individual process' RSS also counts at least some part of
> the mmaps. /proc/27595/smaps has more info:
> 
> $ cat /proc/27595/smaps | awk 'BEGIN { count= 0; } /Rss/ { count = count + $2; print } /Pss/ { print } /Swap/ { print } /^Size/ { print } /-/ { print } END { print count }'
> [...]
> 7f2987e92000-7f3387e92000 rw-s 00000000 fc:11 3225448420                 /instant/LMDBMedium_0000000000/data.mdb
> Size:           41943040 kB
> Rss:              353164 kB
> Pss:              166169 kB
> Swap:                  0 kB
> [...]
> 7f33df965000-7f4f1cdcc000 rw-s 00000000 00:04 454722576                  /SYSV00000000 (deleted)
> Size:          114250140 kB
> Rss:             5587224 kB
> Pss:             3856206 kB
> Swap:                  0 kB
> [...]
> 51652180
> 
>     Notice that the sum is not the same as the one reported before; maybe
> because I took them in different points of time while redacting this
> mail. So this confirms that a process' RSS value includes shms and mmaps,
> at least the resident part. In the case of the mmaps, the resident part
> must be the part that currently sits on the cache; in the case of the
> shms, I suppose it's the part that has ever been used. An internal tool
> tels me that currently 24GiB of that shm is in use, but only 5 are
> reported as part of that process' RSS. Maybe is that process' used part?
> 
>     And now I reach to what I find more confusing (uninteresting values
> removed):
> 
> $ cat /proc/meminfo 
> MemTotal:       396895176 kB
> MemFree:           989392 kB
> Buffers :            8448 kB
> Cached:         344059556 kB
> SwapTotal:        8385788 kB
> SwapFree:               0 kB
> Mapped:         147188944 kB
> Shmem:          109114792 kB
> CommitLimit:    206833376 kB
> Committed_AS:   349194180 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:      1222960 kB
> VmallocChunk: 34157188704 kB
> 
>     Again, values might vary due to timing. Mapped clearly includes Shmem
> but not mmaps; in theory 36GiB are 'pure' (not shm'ed, not mmap'ed)
> process memory, close to what I calculated before. Again, this is not
> segregated, which again makes us wonder why. Probably it's more like "It
> doesn't make sense to do it".
> 
>     Last but definitely not least, Committed_AS is 333GiB, close to the
> total mem. man proc says it's «The amount of memory presently allocated
> on the system. The committed memory is a sum of all of the memory which
> has been allocated by processes, even if it has not been "used" by them
> as of yet». What is not clear is if this counts or not mmaps (I think it
> doesn't, or it would be either 5TiB or 50TiB, depending on whether you
> count each attachment to each shm) and/or/neither shms (once, multiple
> times?). In a rough calculation, the 83 procs using the same 108GiB shm
> account for 9TiB, so at least it's not counting it multiple times.
> 
>     While we're at it, I would like to know what VmallocTotal (32TiB) is
> accounting. The explanation in man proc («Total size of vmalloc memory
> area.», where vmalloc seems to be a kernel internal function to «allocate
> a contiguous memory region in the virtual address space») means not much
> for me. At some point I thought it should be the sum of all VSSs, but
> that clocks at 50TiB for me, so it isn't. Maybe I should just ignore it.
> 
>     Short version:
> 
> * Why 'pure' mmalloc'ed memory is ever reported? Does it make sense to
>   talk about it?

This is simply private anonymous memory. And you can see it as such in
/proc/<pid>/[s]maps
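For instance, a quick way to total it for one process (here the current
shell, $$; this assumes a kernel whose smaps exposes the "Anonymous:"
per-mapping field):

```shell
# Sum the resident anonymous memory of one process from its smaps.
# sum+0 forces a numeric 0 if no Anonymous: lines were found.
awk '/^Anonymous:/ { sum += $2 } END { print sum+0 " kB anonymous Rss" }' \
    /proc/$$/smaps
```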

> * Why shms shows up in cache? What does cache currently mean/hold?

Explained above I hope (it is an in-memory filesystem).

> * What does the RSS value means for the shms in each proc's smaps file?
>   And for mmaps?

The amount of shmem-backed pages mapped into the user address space.

> * Is my conclusion about Shmem being counted into Mapped correct?

Mapped will tell you how much page cache is mapped via page tables into
processes. So it is a subset of the pagecache, the same way Shmem is a
subset. Note that shmem doesn't have to be mapped anywhere (e.g. simply
read a file on a tmpfs filesystem - it will be in the pagecache but not
mapped).
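A small sketch of that last point (assuming /dev/shm is a tmpfs mount,
which it is on most distributions): a file merely read on tmpfs bumps
Cached and Shmem without being mapped into any process.

```shell
# Create and read a file on tmpfs, then look at the relevant counters.
# The file is in the page cache (Cached, Shmem) but not mmap'ed, so it
# does not contribute to Mapped.
f=/dev/shm/pagecache-demo.$$
dd if=/dev/zero of="$f" bs=1M count=8 2>/dev/null
cat "$f" > /dev/null
grep -E '^(Cached|Mapped|Shmem):' /proc/meminfo
rm -f "$f"
```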

> * What is actually counted in Committed_AS? Does it count shms or mmaps?
>   How?

This depends on the overcommit configuration. See
Documentation/sysctl/vm.txt for more information.

> * What is VmallocTotal?

Vmalloc areas are used by the _kernel_ to map larger, physically
non-contiguous memory areas. More on that e.g. here:
http://www.makelinux.net/books/lkd2/ch11lev1sec5. You can safely ignore
it.

-- 
Michal Hocko
SUSE Labs


* Re: committed memory, mmaps and shms
  2015-03-12 12:40 ` Michal Hocko
@ 2015-03-12 14:54   ` Marcos Dione
  2015-03-12 15:35     ` Michal Hocko
  2015-03-12 15:09   ` Michal Hocko
  1 sibling, 1 reply; 11+ messages in thread
From: Marcos Dione @ 2015-03-12 14:54 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Thu, Mar 12, 2015 at 08:40:53AM -0400, Michal Hocko wrote:
> [CCing MM maling list]

    Shall we completely migrate the rest of the conversation there?

> On Wed 11-03-15 19:10:44, Marcos Dione wrote:
> > 
> >     Hi everybody. First, I hope this is the right list for such
> > questions;  I searched in the list of lists[1] for a MM specific one, but
> > didn't find any. Second, I'm not subscribed, so please CC me and my other
> > address when answering.
> > 
> >     I'm trying to figure out how Linux really accounts for memory, both
> > globally and for each individual process. Most user's first approach  to
> > memory monitoring is running free (no pun intended):
> > 
> > $ free
> >              total       used       free     shared    buffers     cached
> > Mem:     396895176  395956332     938844          0       8972  356409952
> > -/+ buffers/cache:   39537408  357357768
> > Swap:      8385788    8385788          0
> > 
> >     This reports 378GiB of RAM, 377 used; of those 8MiB in buffers,
> > 339GiB in cache, leaving only 38Gib for processes (for some reason this
> 
> I am not sure I understand your math here. 339G in the cache should be
> reclaimable (be careful about the shmem though). It is the rest which
> might be harder to reclaim.

    The 38GiB I mention are what's left of the 378 available after
subtracting the 339 in cache. To me this difference represents the sum
of the resident anonymous memory malloc'ed by all processes. Unless
there's some other kind of pages accounted in 'Used'.

> shmem (tmpfs) is a in memory filesystem. Pages backing shmem mappings
> are maintained in the page cache. Their backing storage is swap as you
> said. So from a conceptual point of vew this makes a lot of sense. 

    Now it's completely clear, thanks.

> > * Why 'pure' mmalloc'ed memory is ever reported? Does it make sense to
> >   talk about it?
> 
> This is simply private anonymous memory. And you can see it as such in
> /proc/<pid>/[s]maps

    Yes, but my question was more along the lines of 'why don't free or
/proc/meminfo show it'. Maybe it's just that it's difficult to
define (like I said, "sum of resident anonymous..." &c) or nobody really
cares about this. Maybe I shouldn't either.

> > * What does the RSS value means for the shms in each proc's smaps file?
> >   And for mmaps?
> 
> The amount of shmem backed pages mapped in to the user address space.

    Perfect.

> > * Is my conclusion about Shmem being counted into Mapped correct?
> 
> Mapped will tell you how much page cache is mapped via pagetable to a
> process. So it is a subset of pagecache. same as Shmem is a subset. Note
> that shmem doesn't have to be mapped anywhere (e.g. simply read a file
> on tmpfs filesystem - it will be in the pagecache but not mapped).
> 
> > * What is actually counted in Committed_AS? Does it count shms or mmaps?
> >   How?
> 
> This depends on the overcommit configuration. See
> Documentation/sysctl/vm.txt for more information.

    I understand what /proc/sys/vm/overcommit_memory is for; what I
don't understand is what exactly is counted in the Committed_AS line in
/proc/meminfo. I also read Documentation/vm/overcommit-accounting and
even mm/mmap.c, but I'm still in the dark here.

> > * What is VmallocTotal?
> 
> Vmalloc areas are used by _kernel_ to map larger physically
> non-contiguous memory areas. More on that e.g. here
> http://www.makelinux.net/books/lkd2/ch11lev1sec5. You can safely ignore
> it.

    It's already forgotten, thanks :) Cheers,

	-- Marcos.


* Re: committed memory, mmaps and shms
  2015-03-12 12:40 ` Michal Hocko
  2015-03-12 14:54   ` Marcos Dione
@ 2015-03-12 15:09   ` Michal Hocko
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2015-03-12 15:09 UTC (permalink / raw)
  To: Marcos Dione; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Thu 12-03-15 08:40:53, Michal Hocko wrote:
[...]
> I think it would make more sense to add something like easily
> reclaimable chache to the output of free (pagecache-shmem-dirty
> basically). That would give an admin a better view on immediatelly
> re-usable memory.

Ohh, I have just learned that /proc/meminfo already provides such
information. It's MemAvailable, and it should give you an idea of how
much memory is re-usable.
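For example (MemAvailable is present on kernels 3.14 and later):

```shell
# MemAvailable estimates how much memory new workloads can use without
# swapping; compare it against plain MemFree.
grep -E '^(MemFree|MemAvailable):' /proc/meminfo
```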
-- 
Michal Hocko
SUSE Labs


* Re: committed memory, mmaps and shms
  2015-03-12 14:54   ` Marcos Dione
@ 2015-03-12 15:35     ` Michal Hocko
  2015-03-12 16:56       ` Marcos Dione
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-03-12 15:35 UTC (permalink / raw)
  To: Marcos Dione; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Thu 12-03-15 11:54:22, Marcos Dione wrote:
> On Thu, Mar 12, 2015 at 08:40:53AM -0400, Michal Hocko wrote:
> > [CCing MM maling list]
> 
>     Shall we completely migrate the rest of the conversation there?

It is usually better to keep lkml on the cc list for a larger audience.
 
> > On Wed 11-03-15 19:10:44, Marcos Dione wrote:
[...]
> > > $ free
> > >              total       used       free     shared    buffers     cached
> > > Mem:     396895176  395956332     938844          0       8972  356409952
> > > -/+ buffers/cache:   39537408  357357768
> > > Swap:      8385788    8385788          0
> > > 
> > >     This reports 378GiB of RAM, 377 used; of those 8MiB in buffers,
> > > 339GiB in cache, leaving only 38Gib for processes (for some reason this
> > 
> > I am not sure I understand your math here. 339G in the cache should be
> > reclaimable (be careful about the shmem though). It is the rest which
> > might be harder to reclaim.
> 
>     These 38GiB I mention is the rest of 378 available minus 339 in
> cache. To me this difference represents the sum of the resident
> anonymous memory malloc'ed by all processes. Unless there's some othr
> kind of pages accounted in 'Used'.

The kernel needs memory as well for its internal data structures
(stacks, page tables, slab objects, memory used by drivers and what not).
 
> > shmem (tmpfs) is a in memory filesystem. Pages backing shmem mappings
> > are maintained in the page cache. Their backing storage is swap as you
> > said. So from a conceptual point of vew this makes a lot of sense. 
> 
>     Now it's completely clear, thanks.
> 
> > > * Why 'pure' mmalloc'ed memory is ever reported? Does it make sense to
> > >   talk about it?
> > 
> > This is simply private anonymous memory. And you can see it as such in
> > /proc/<pid>/[s]maps
> 
>     Yes, but my question was more on the lines of 'why free or
> /proc/meminfo do not show it'. Maybe it's just that it's difficult to
> define (like I said, "sum of resident anonymous..." &c) or nobody really
> cares about this. Maybe I shouldn't either.

meminfo exports this information as AnonPages.
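That is, the 'pure' malloc'ed memory discussed above shows up as resident
anonymous pages, also split into active and inactive:

```shell
# Resident anonymous memory as /proc/meminfo reports it.
grep -E '^(AnonPages|Active\(anon\)|Inactive\(anon\)):' /proc/meminfo
```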

[...]
> > > * What is actually counted in Committed_AS? Does it count shms or mmaps?
> > >   How?
> > 
> > This depends on the overcommit configuration. See
> > Documentation/sysctl/vm.txt for more information.
> 
>     I understand what /proc/sys/vm/overcommit_memory is for; what I
> don't understand is what exactly counted in the Committed_AS line in
> /proc/meminfo.

It accounts for all address space reservations - e.g. for mmap(len), len
gets added. Things are slightly more complicated, but looking at the
callers of security_vm_enough_memory_mm should give you an idea of
everything that is included.
How this number is used depends on the overcommit mode;
__vm_enough_memory will give you a better picture.

> I also read Documentation/vm/overcommit-accounting

What would help you to understand it better?

-- 
Michal Hocko
SUSE Labs


* Re: committed memory, mmaps and shms
  2015-03-12 15:35     ` Michal Hocko
@ 2015-03-12 16:56       ` Marcos Dione
  2015-03-13 14:09         ` Michal Hocko
  2015-03-13 14:58         ` Marcos Dione
  0 siblings, 2 replies; 11+ messages in thread
From: Marcos Dione @ 2015-03-12 16:56 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Thu, Mar 12, 2015 at 11:35:13AM -0400, Michal Hocko wrote:
> > > On Wed 11-03-15 19:10:44, Marcos Dione wrote:
> [...]
> > > > $ free
> > > >              total       used       free     shared    buffers     cached
> > > > Mem:     396895176  395956332     938844          0       8972  356409952
> > > > -/+ buffers/cache:   39537408  357357768
> > > > Swap:      8385788    8385788          0
> > > > 
> > > >     This reports 378GiB of RAM, 377 used; of those 8MiB in buffers,
> > > > 339GiB in cache, leaving only 38Gib for processes (for some reason this
> > > 
> > > I am not sure I understand your math here. 339G in the cache should be
> > > reclaimable (be careful about the shmem though). It is the rest which
> > > might be harder to reclaim.
> > 
> >     These 38GiB I mention is the rest of 378 available minus 339 in
> > cache. To me this difference represents the sum of the resident
> > anonymous memory malloc'ed by all processes. Unless there's some othr
> > kind of pages accounted in 'Used'.
> 
> The kernel needs memory as well for its internal data structures
> (stacks, page tables, slab objects, memory used by drivers and what not).

    Are those in or out of the total memory reported by free? I had the
impression they were out. 396895176 accounts for only 378.5GiB of the 384
available in the machine; I assumed the missing 5.5 was kernel memory.

> >     Yes, but my question was more on the lines of 'why free or
> > /proc/meminfo do not show it'. Maybe it's just that it's difficult to
> > define (like I said, "sum of resident anonymous..." &c) or nobody really
> > cares about this. Maybe I shouldn't either.
> 
> meminfo is exporting this information as AnonPages.

    I think that what I'm trying to do is figure out what each value
represents and where it's included, as if to make a graph like this
(fields from /proc/meminfo between []'s; dots are inactive pages, plus
signs active):

 RAM                            swap                          other (mmaps)
|------------------------------|-----------------------------|-------------...
|.| kernel [Slab+KernelStack+PageTables+?]
  |.| buffers [Buffers]
    | .  . . .  ..   .| swap cached (not necessarily like this, but you get the idea) (I'm assuming that it only includes anon pages, shms and private mmaps) [SwapCached]
    |++..| resident anon (malloc'ed) [AnonPages/Active/Inactive(anon)]
         |+++....+++........| cache [Cached/Active/Inactive(file)]
         |+++...| (resident?) shms [Shmem]
                |+++..| resident mmaps
                      |.....| other fs cache
                            |..| free [MemFree]
                               |.............| used swap [SwapTotal-SwapFree] 
                                             |...............| swap free [SwapFree]

    Note that there are no details on how the swap is used between anon
pages, shms and others, nor about mmaps, except in /proc/<pid>/smaps.
Anyone really interested in that would have to poll a fair number of
files, but it's definitely doable. Just cat'ing one of these files for a
process with 128 mmaps and 1 shm as before gave these times:

real    0m0.802s
user    0m0.004s
sys     0m0.244s
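A system-wide poll of that kind could look roughly like this (expensive,
as the timings suggest; smaps files that can't be read, whether because
of permissions or because the process exited, are simply skipped):

```shell
# Sum resident pages over every readable smaps file on the system.
cat /proc/[0-9]*/smaps 2>/dev/null |
awk '/^Rss:/ { sum += $2 } END { print sum+0 " kB total Rss" }'
```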

> >     I understand what /proc/sys/vm/overcommit_memory is for; what I
> > don't understand is what exactly counted in the Committed_AS line in
> > /proc/meminfo.
> 
> It accounts all the address space reservations - e.g. mmap(len), len
> will get added. The things are slightly more complicated but start
> looking at callers of security_vm_enough_memory_mm should give you an
> idea what everything is included.
> How is this number used depends on the overcommit mode.
> __vm_enough_memory would give you a better picture.
> 
> > I also read Documentation/vm/overcommit-accounting
> 
> What would help you to understand it better?

    I think that after this dip into the terminology I should go back to
it and try again to figure it out myself :) Of course any findings will
be posted. Cheers,

	-- Marcos.


* Re: committed memory, mmaps and shms
  2015-03-12 16:56       ` Marcos Dione
@ 2015-03-13 14:09         ` Michal Hocko
  2015-03-13 16:04           ` Marcos Dione
  2015-03-13 14:58         ` Marcos Dione
  1 sibling, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-03-13 14:09 UTC (permalink / raw)
  To: Marcos Dione; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Thu 12-03-15 13:56:00, Marcos Dione wrote:
> On Thu, Mar 12, 2015 at 11:35:13AM -0400, Michal Hocko wrote:
> > > > On Wed 11-03-15 19:10:44, Marcos Dione wrote:
> > [...]
> > > > > $ free
> > > > >              total       used       free     shared    buffers     cached
> > > > > Mem:     396895176  395956332     938844          0       8972  356409952
> > > > > -/+ buffers/cache:   39537408  357357768
> > > > > Swap:      8385788    8385788          0
> > > > > 
> > > > >     This reports 378GiB of RAM, 377 used; of those 8MiB in buffers,
> > > > > 339GiB in cache, leaving only 38Gib for processes (for some reason this
> > > > 
> > > > I am not sure I understand your math here. 339G in the cache should be
> > > > reclaimable (be careful about the shmem though). It is the rest which
> > > > might be harder to reclaim.
> > > 
> > >     These 38GiB I mention is the rest of 378 available minus 339 in
> > > cache. To me this difference represents the sum of the resident
> > > anonymous memory malloc'ed by all processes. Unless there's some othr
> > > kind of pages accounted in 'Used'.
> > 
> > The kernel needs memory as well for its internal data structures
> > (stacks, page tables, slab objects, memory used by drivers and what not).
> 
>     Are those in or out of the total memory reported by free? I had the
> impression the were out. 396895176 accounts only for 378.5GiB of the 384
> available in the machine; I assumed the missing 5.5 was kernel memory.

I haven't checked the code of `free' but I would expect this to be part
of `used'.
 
> > >     Yes, but my question was more on the lines of 'why free or
> > > /proc/meminfo do not show it'. Maybe it's just that it's difficult to
> > > define (like I said, "sum of resident anonymous..." &c) or nobody really
> > > cares about this. Maybe I shouldn't either.
> > 
> > meminfo is exporting this information as AnonPages.
> 
>     I think that what I'm trying to do is figure out what each value
> represents and where it's incuded, as if to make a graph like this
> (fields in /proc/meminfo between []'s; dots are inactive, plus signs
> active):
> 
>  RAM                            swap                          other (mmaps)
> |------------------------------|-----------------------------|-------------...
> |.| kernel [Slab+KernelStack+PageTables+?]
>   |.| buffers [Buffers]
>     | .  . . .  ..   .| swap cached (not necesarily like this, but you get the idea) (I'm assuming that it only includes anon pages, shms and private mmaps) [SwapCached]
>     |++..| resident annon (malloc'ed) [AnonPages/Active/Inactive(anon)]
>          |+++....+++........| cache [Cached/Active/Inactive(file)]
>          |+++...| (resident?) shms [Shmem]
>                 |+++..| resident mmaps
>                       |.....| other fs cache
>                             |..| free [MemFree]
>                                |.............| used swap [SwapTotal-SwapFree] 
>                                              |...............| swap free [SwapFree]
> 
>     Note that there are no details on how the swap is used between anon
> pages, shm and others; neither about mmaps; except in /proc/<pid>/smaps.

Well, the memory management subsystem is rather complex and it is not
really trivial to match all the possible combinations into simple
counters.

I would be interested in the particular usecase where you want the
specific information and it is important outside of debugging purposes.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: committed memory, mmaps and shms
  2015-03-12 16:56       ` Marcos Dione
  2015-03-13 14:09         ` Michal Hocko
@ 2015-03-13 14:58         ` Marcos Dione
  2015-06-03 16:26           ` Marcos Dione
  1 sibling, 1 reply; 11+ messages in thread
From: Marcos Dione @ 2015-03-13 14:58 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Thu, Mar 12, 2015 at 01:56:00PM -0300, Marcos Dione wrote:
> On Thu, Mar 12, 2015 at 11:35:13AM -0400, Michal Hocko wrote:
> > > > On Wed 11-03-15 19:10:44, Marcos Dione wrote:
>     I think that what I'm trying to do is figure out what each value
> represents and where it's included, as if to make a graph like this
> (fields in /proc/meminfo between []'s; dots are inactive, plus signs
> active):
> 
>  RAM                            swap                          other (mmaps)
> |------------------------------|-----------------------------|-------------...
> |.| kernel [Slab+KernelStack+PageTables+?]
>   |.| buffers [Buffers]
>     | .  . . .  ..   .| swap cached (not necessarily like this, but you get the idea) (I'm assuming that it only includes anon pages, shms and private mmaps) [SwapCached]
>     |++..| resident anon (malloc'ed) [AnonPages/Active/Inactive(anon)]
>          |+++....+++........| cache [Cached/Active/Inactive(file)]
>          |+++...| (resident?) shms [Shmem]
>                 |+++..| resident mmaps
>                       |.....| other fs cache
>                             |..| free [MemFree]
>                                |.............| used swap [SwapTotal-SwapFree] 
>                                              |...............| swap free [SwapFree]

    Did I get this right so far?

> > >     I understand what /proc/sys/vm/overcommit_memory is for; what I
> > > don't understand is what exactly counted in the Committed_AS line in
> > > /proc/meminfo.
> > 
> > It accounts all the address space reservations - e.g. mmap(len), len
> > will get added. The things are slightly more complicated but start
> > looking at callers of security_vm_enough_memory_mm should give you an
> > idea what everything is included.
> > How is this number used depends on the overcommit mode.
> > __vm_enough_memory would give you a better picture.
> > 
> > > I also read Documentation/vm/overcommit-accounting
> > 
> > What would help you to understand it better?
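The mmap(len) accounting quoted above can be watched from userland. A minimal sketch (Python, Linux-only; the delta is only approximate, since other processes move Committed_AS too, and in strict overcommit mode the mapping itself could fail):

```python
# Sketch (Linux-only): Committed_AS grows by roughly the length of a
# private writable anonymous mapping.  Other processes also move the
# counter, so the observed delta is approximate.
import mmap

def committed_as_kb():
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('Committed_AS:'):
                return int(line.split()[1])
    raise RuntimeError('Committed_AS not found')

def commit_delta_kb(size):
    """Return the Committed_AS change (kB) caused by mapping `size` bytes."""
    before = committed_as_kb()
    # fd == -1 makes the mapping anonymous; MAP_PRIVATE plus the default
    # PROT_READ|PROT_WRITE makes it a "private writable" mapping.
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        return committed_as_kb() - before
    finally:
        m.close()

if __name__ == '__main__':
    size = 64 * 1024 * 1024  # 64 MiB
    print('delta: ~%d kB for a %d kB map' % (commit_delta_kb(size), size // 1024))
```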

    I think it's mostly a language barrier. The doc talks about how
the kernel handles the memory, but leaves userland people 'watching from
outside the fence'. From the point of view of a sysadmin or non-kernel
developer (who doesn't necessarily know all the kinds of things that can
be done with malloc/mmap/shm/&c), this is what I think the doc refers
to:

> How It Works
> ------------
> 
> The overcommit is based on the following rules
> 
> For a file backed map

    mmaps. Are there more?

>     SHARED or READ-only	-	0 cost (the file is the map not swap)
>     PRIVATE WRITABLE	-	size of mapping per instance
> 
> For an anonymous 

    malloc'ed memory

> or /dev/zero map

    hmmm, (read only?) mmap'ing on top of /dev/zero?

>     SHARED			-	size of mapping

    is shared anonymous memory a shm?

>     PRIVATE READ-only	-	0 cost (but of little use)
>     PRIVATE WRITABLE	-	size of mapping per instance

    I can't translate these two terms, unless the latter is the one
referring specifically to malloc's. I wonder how one could create
several instances of the 'same' mapping in that case. Forks?
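The "per instance" wording can be tested without forking: mapping the same file PRIVATE and WRITABLE twice from one process creates two instances, and each is committed separately. A sketch (Python, Linux-only; as before, the delta is approximate):

```python
# Sketch (Linux-only): two MAP_PRIVATE writable mappings of the *same*
# file are two instances, and each is committed separately.
import mmap
import tempfile

def committed_as_kb():
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('Committed_AS:'):
                return int(line.split()[1])
    raise RuntimeError('Committed_AS not found')

def two_instance_delta_kb(size):
    """Map one file twice with MAP_PRIVATE; return the Committed_AS delta in kB."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size)
        before = committed_as_kb()
        # default prot is PROT_READ|PROT_WRITE, so these are private writable
        m1 = mmap.mmap(f.fileno(), size, flags=mmap.MAP_PRIVATE)
        m2 = mmap.mmap(f.fileno(), size, flags=mmap.MAP_PRIVATE)
        try:
            return committed_as_kb() - before
        finally:
            m1.close()
            m2.close()

if __name__ == '__main__':
    size = 32 * 1024 * 1024  # 32 MiB; expect a delta near 2 * 32768 kB
    print('delta: ~%d kB' % two_instance_delta_kb(size))
```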

> Additional accounting
>     Pages made writable copies by mmap

    Hmmm, copy-on-write pages for when you write in a shared mmap? I'm
wildly guessing here, even if what I say doesn't make any sense.

>     shmfs memory drawn from the same pool

    Beats me.

> Status
> ------

    This section goes back mostly to userland terminology.

> o	We account mmap memory mappings
> o	We account mprotect changes in commit
> o	We account mremap changes in size
> o	We account brk

    This I know is part of the implementation of malloc.

> o	We account munmap
> o	We report the commit status in /proc
> o	Account and check on fork
> o	Review stack handling/building on exec
> o	SHMfs accounting
> o	Implement actual limit enforcement
> 
> To Do
> -----
> o	Account ptrace pages (this is hard)

    I know ptrace, and this seems to hint that ptrace also uses a good
amount of pages, but in normal operation I can ignore this.

    In summary, so far:

* only private writable mmaps are counted 'once per instance', which I
assume means that if the same process uses the 'same' mmap twice (two
instances), then it gets counted twice, because each instance is
separate from the other.

* malloc'ed and shared memory, again once per instance.

* those two things I couldn't figure out.
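On the earlier guess that shared anonymous memory is a shm: it does behave like one, in that parent and child share the same pages after fork. A sketch (Python, Linux-only):

```python
# Sketch (Linux-only): a SHARED anonymous mapping acts like a nameless
# shm: after fork(), parent and child see the same pages.
import mmap
import os

def shared_anon_demo():
    # fd == -1 gives an anonymous mapping; MAP_SHARED is the default.
    m = mmap.mmap(-1, 4096)
    pid = os.fork()
    if pid == 0:            # child: write into the shared pages
        m[:5] = b'hello'
        os._exit(0)
    os.waitpid(pid, 0)      # parent: wait, then read the child's write
    data = bytes(m[:5])
    m.close()
    return data

if __name__ == '__main__':
    print(shared_anon_demo())   # the parent sees b'hello'
```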

    Now it seems too simple! What am I missing? :) Cheers,

	-- Marcos.


* Re: committed memory, mmaps and shms
  2015-03-13 14:09         ` Michal Hocko
@ 2015-03-13 16:04           ` Marcos Dione
  0 siblings, 0 replies; 11+ messages in thread
From: Marcos Dione @ 2015-03-13 16:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Fri, Mar 13, 2015 at 03:09:58PM +0100, Michal Hocko wrote:
> Well, the memory management subsystem is rather complex and it is not
> really trivial to match all the possible combinations into simple
> counters.

    Yes, I imagine.

> I would be interested in the particular usecase where you want the
> specific information and it is important outside of debugging purposes.

    Well, now it's more sheer curiosity than anything else, except for
Committed_AS, which is directly related to work. I personally prefer
to a) have a full picture in my head and b) have it documented somewhere,
if only in this thread.

	-- Marcos.


* Re: committed memory, mmaps and shms
  2015-03-13 14:58         ` Marcos Dione
@ 2015-06-03 16:26           ` Marcos Dione
  0 siblings, 0 replies; 11+ messages in thread
From: Marcos Dione @ 2015-06-03 16:26 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, marcos-david.dione, linux-mm

On Fri, Mar 13, 2015 at 11:58:51AM -0300, Marcos Dione wrote:
> On Thu, Mar 12, 2015 at 01:56:00PM -0300, Marcos Dione wrote:
> > On Thu, Mar 12, 2015 at 11:35:13AM -0400, Michal Hocko wrote:
> > > On Wed 11-03-15 19:10:44, Marcos Dione wrote:
> > > I also read Documentation/vm/overcommit-accounting
> > 
> > What would help you to understand it better?
> 
>     I think it's mostly a language barrier. The doc talks about how
> the kernel handles the memory, but leaves userland people 'watching from
> outside the fence'. From the point of view of a sysadmin or non-kernel
> developer (who doesn't necessarily know all the kinds of things that can
> be done with malloc/mmap/shm/&c), this is what I think the doc refers
> to:
> 
> > How It Works
> > ------------
> > 
> > The overcommit is based on the following rules
> > 
> > For a file backed map
> 
>     mmaps. Are there more?

    Answering myself: yes, code maps behave like this.

> >     SHARED or READ-only	-	0 cost (the file is the map not swap)
> >     PRIVATE WRITABLE	-	size of mapping per instance

    Code is not writable, so only private writable mmaps are left. I
wonder why shared writable ones are not accounted.

> > For an anonymous 
> 
>     malloc'ed memory
> 
> > or /dev/zero map
> 
>     hmmm, (read only?) mmap'ing on top of /dev/zero?
> 
> >     SHARED			-	size of mapping
> 
>     is shared anonymous memory a shm?
> 
> >     PRIVATE READ-only	-	0 cost (but of little use)
> >     PRIVATE WRITABLE	-	size of mapping per instance
> 
>     I can't translate these two terms, unless the latter is the one
> referring specifically to malloc's. I wonder how one could create
> several instances of the 'same' mapping in that case. Forks?
> 
> > Additional accounting
> >     Pages made writable copies by mmap
> 
>     Hmmm, copy-on-write pages for when you write in a shared mmap? I'm
> wildly guessing here, even if what I say doesn't make any sense.
> 
> >     shmfs memory drawn from the same pool
> 
>     Beats me.
[...]
>     Now it seems too simple! What am I missing? :) Cheers,

    Untrue: I'm still in the dark about what those mean. Maybe someone
can translate those terms into userland ones? malloc, shm, mmap, code
maps? Probably I'm missing some.

    cheers,

        -- Marcos.



end of thread, other threads:[~2015-06-03 16:34 UTC | newest]

Thread overview: 11+ messages
2015-03-11 18:10 committed memory, mmaps and shms Marcos Dione
2015-03-11 20:10 ` Martin Steigerwald
2015-03-12 12:40 ` Michal Hocko
2015-03-12 14:54   ` Marcos Dione
2015-03-12 15:35     ` Michal Hocko
2015-03-12 16:56       ` Marcos Dione
2015-03-13 14:09         ` Michal Hocko
2015-03-13 16:04           ` Marcos Dione
2015-03-13 14:58         ` Marcos Dione
2015-06-03 16:26           ` Marcos Dione
2015-03-12 15:09   ` Michal Hocko
