From: Roman Gushchin <guro@fb.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	"Rik van Riel" <riel@surriel.com>,
	Christoph Lameter <cl@linux.com>,
	"Vladimir Davydov" <vdavydov.dev@gmail.com>,
	Cgroups <cgroups@vger.kernel.org>
Subject: Re: [PATCH v3 0/7] mm: reparent slab memory on cgroup removal
Date: Tue, 14 May 2019 20:04:07 +0000
Message-ID: <20190514200402.GE12629@tower.DHCP.thefacebook.com>
In-Reply-To: <CALvZod4GscZjob8bfCcfhsMh0sco16r4yfOaRU69WnNO7MRrpw@mail.gmail.com>

On Tue, May 14, 2019 at 12:22:08PM -0700, Shakeel Butt wrote:
> From: Roman Gushchin <guro@fb.com>
> Date: Mon, May 13, 2019 at 1:22 PM
> To: Shakeel Butt
> Cc: Andrew Morton, Linux MM, LKML, Kernel Team, Johannes Weiner,
> Michal Hocko, Rik van Riel, Christoph Lameter, Vladimir Davydov,
> Cgroups
> 
> > On Fri, May 10, 2019 at 05:32:15PM -0700, Shakeel Butt wrote:
> > > From: Roman Gushchin <guro@fb.com>
> > > Date: Wed, May 8, 2019 at 1:30 PM
> > > To: Andrew Morton, Shakeel Butt
> > > Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
> > > <kernel-team@fb.com>, Johannes Weiner, Michal Hocko, Rik van Riel,
> > > Christoph Lameter, Vladimir Davydov, <cgroups@vger.kernel.org>, Roman
> > > Gushchin
> > >
> > > > # Why do we need this?
> > > >
> > > > We've noticed that the number of dying cgroups is steadily growing on most
> > > > of our hosts in production. The following investigation revealed an issue
> > > > in userspace memory reclaim code [1], accounting of kernel stacks [2],
> > > > and also the main reason: slab objects.
> > > >
> > > > The underlying problem is quite simple: any page charged
> > > > to a cgroup holds a reference to it, so the cgroup can't be released until
> > > > all charged pages are gone. If a slab object is actively used by other cgroups,
> > > > it won't be reclaimed, and will prevent the origin cgroup from being released.
> > > >
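
(To make the refcounting above concrete: a minimal sketch with hypothetical
helper names, not the actual kernel charging code.)

    /* Hypothetical sketch -- not the real charging path. Charging
     * takes a css reference, so the cgroup can't be freed while any
     * charged page remains. */
    static void charge_page_to_memcg(struct page *page,
                                     struct mem_cgroup *memcg)
    {
            css_get(&memcg->css);            /* the page now pins the cgroup */
            page->mem_cgroup = memcg;
    }

    static void uncharge_page(struct page *page)
    {
            css_put(&page->mem_cgroup->css); /* last put lets a dead cgroup go */
            page->mem_cgroup = NULL;
    }
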
> > > > Slab objects, and first of all the vfs cache, are shared between cgroups
> > > > that use the same underlying fs, and, what's even more important, between
> > > > multiple generations of the same workload. So if something runs periodically,
> > > > every time in a new cgroup (like how systemd works), we do accumulate
> > > > multiple dying cgroups.
> > > >
> > > > Strictly speaking the pagecache isn't different here, but with one key
> > > > distinction: we disable protection and apply some extra pressure on the LRUs
> > > > of dying cgroups,
> > >
> > > How do you apply extra pressure on dying cgroups? cgroup-v2 does not
> > > have memory.force_empty.
> >
> > I mean the following part of get_scan_count():
> >         /*
> >          * If the cgroup's already been deleted, make sure to
> >          * scrape out the remaining cache.
> >          */
> >         if (!scan && !mem_cgroup_online(memcg))
> >                 scan = min(lruvec_size, SWAP_CLUSTER_MAX);
> >
> > It seems to work well, so that pagecache alone doesn't pin too many
> > dying cgroups. The price we're paying is some excessive IO here,
> 
> Thanks for the explanation. However, for this to work, something still
> needs to trigger memory pressure; until then we will keep the
> zombies around. BTW, get_scan_count() is getting really creepy. It
> needs a refactor soon.

Sure, but that's true for all sorts of memory.
Re get_scan_count(): agreed, it's way too hairy now.
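
(For reference, mem_cgroup_online() boils down to a check of the CSS_ONLINE
flag; roughly the following, though the helper in include/linux/memcontrol.h
may differ in detail:)

    static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
    {
            if (mem_cgroup_disabled())
                    return true;
            return !!(memcg->css.flags & CSS_ONLINE);
    }
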

> 
> > which could be avoided if we were able to recharge the pagecache.
> >
> 
> Are you looking into this? Do you envision a mount option which will
> mark the filesystem as shared and trigger recharging when the origin
> memcg is offlined?

Not really working on it now, but thinking about what to do here long-term.
One of the ideas I have (just an idea for now) is to move the memcg pointer
from individual pages to the inode level. It could open up more opportunities
for recharging and reparenting, but I'm not sure how complex it would be
and what the possible downsides are.
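
Something like the following, purely as a hypothetical sketch (none of these
structs or helpers exist in this form):

    /* Hypothetical only: track the charged memcg once per inode
     * instead of in every struct page of its pagecache. */
    struct inode_memcg_state {
            struct mem_cgroup *memcg;  /* cgroup charged for this inode's cache */
    };

    /* Reparenting would then be a single pointer swap per inode,
     * instead of walking every charged page. */
    static void inode_reparent_memcg(struct inode_memcg_state *state,
                                     struct mem_cgroup *parent)
    {
            css_get(&parent->css);
            css_put(&state->memcg->css);
            state->memcg = parent;
    }
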

Do you have any plans or ideas here?

> 
> > Btw, thank you very much for looking into the patchset. I'll address
> > all comments and send v4 soon.
> >
> 
> You are most welcome.

Thanks!

Thread overview: 12+ messages
     [not found] <20190508202458.550808-1-guro@fb.com>
2019-05-11  0:32 ` Shakeel Butt
2019-05-13 20:21   ` Roman Gushchin
2019-05-14 19:22     ` Shakeel Butt
2019-05-14 20:04       ` Roman Gushchin [this message]
     [not found] ` <20190508202458.550808-2-guro@fb.com>
2019-05-11  0:32   ` [PATCH v3 1/7] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache() Shakeel Butt
     [not found] ` <20190508202458.550808-3-guro@fb.com>
2019-05-11  0:33   ` [PATCH v3 2/7] mm: generalize postponed non-root kmem_cache deactivation Shakeel Butt
     [not found] ` <20190508202458.550808-4-guro@fb.com>
2019-05-11  0:33   ` [PATCH v3 3/7] mm: introduce __memcg_kmem_uncharge_memcg() Shakeel Butt
     [not found] ` <20190508202458.550808-6-guro@fb.com>
2019-05-11  0:33   ` [PATCH v3 5/7] mm: rework non-root kmem_cache lifecycle management Shakeel Butt
     [not found] ` <20190508202458.550808-7-guro@fb.com>
2019-05-11  0:34   ` [PATCH v3 6/7] mm: reparent slab memory on cgroup removal Shakeel Butt
     [not found] ` <20190508202458.550808-8-guro@fb.com>
2019-05-11  0:34   ` [PATCH v3 7/7] mm: fix /proc/kpagecgroup interface for slab pages Shakeel Butt
     [not found] ` <20190508202458.550808-5-guro@fb.com>
2019-05-11  0:33   ` [PATCH v3 4/7] mm: unify SLAB and SLUB page accounting Shakeel Butt
2019-05-13 18:01   ` Christopher Lameter
