LKML Archive on lore.kernel.org
* [PATCH] memcg: css_tryget_online cleanups
@ 2020-02-21 19:59 Shakeel Butt
  2020-02-22  1:10 ` Roman Gushchin
  0 siblings, 1 reply; 3+ messages in thread
From: Shakeel Butt @ 2020-02-21 19:59 UTC (permalink / raw)
  To: Johannes Weiner, Roman Gushchin
  Cc: Michal Hocko, Andrew Morton, linux-mm, cgroups, linux-kernel,
	Shakeel Butt

Currently, css_tryget_online() is used at multiple locations in the memcg
code. However, whether the cgroup is online does not actually matter to
these callers. Online used to matter when we had reparenting on offlining
and needed a way to prevent new ones from showing up.

The failure case for a couple of these css_tryget_online() usages is to
fall back to root_mem_cgroup, which makes it possible for some workloads
to bypass the memcg limits. For example, creating an inotify group in a
subcontainer and then deleting that container after moving the process to
a different container will charge all the event objects allocated for that
group to the root_mem_cgroup. So, using css_tryget_online() is dangerous
for such cases.

Two locations still use the online version: the swapin of an offlined
memcg's pages and memcg kmem cache creation. The kmem cache case indeed
needs the online version, as the kernel does the reparenting of memcg kmem
caches. The swapin case has been left for later, as the fallback there is
not really that concerning.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..75fa8123909e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -656,7 +656,7 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	 */
 	__mem_cgroup_remove_exceeded(mz, mctz);
 	if (!soft_limit_excess(mz->memcg) ||
-	    !css_tryget_online(&mz->memcg->css))
+	    !css_tryget(&mz->memcg->css))
 		goto retry;
 done:
 	return mz;
@@ -962,7 +962,8 @@ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 		return NULL;
 
 	rcu_read_lock();
-	if (!memcg || !css_tryget_online(&memcg->css))
+	/* Page should not get uncharged and freed memcg under us. */
+	if (!memcg || WARN_ON(!css_tryget(&memcg->css)))
 		memcg = root_mem_cgroup;
 	rcu_read_unlock();
 	return memcg;
@@ -975,10 +976,13 @@ EXPORT_SYMBOL(get_mem_cgroup_from_page);
 static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
 {
 	if (unlikely(current->active_memcg)) {
-		struct mem_cgroup *memcg = root_mem_cgroup;
+		struct mem_cgroup *memcg;
 
 		rcu_read_lock();
-		if (css_tryget_online(&current->active_memcg->css))
+		/* current->active_memcg must hold a ref. */
+		if (WARN_ON(!css_tryget(&current->active_memcg->css)))
+			memcg = root_mem_cgroup;
+		else
 			memcg = current->active_memcg;
 		rcu_read_unlock();
 		return memcg;
@@ -6703,7 +6707,7 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 		goto out;
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcpmem_active)
 		goto out;
-	if (css_tryget_online(&memcg->css))
+	if (css_tryget(&memcg->css))
 		sk->sk_memcg = memcg;
 out:
 	rcu_read_unlock();
-- 
2.25.0.265.gbab2e86ba0-goog



* Re: [PATCH] memcg: css_tryget_online cleanups
  2020-02-21 19:59 [PATCH] memcg: css_tryget_online cleanups Shakeel Butt
@ 2020-02-22  1:10 ` Roman Gushchin
  2020-02-22  1:49   ` Shakeel Butt
  0 siblings, 1 reply; 3+ messages in thread
From: Roman Gushchin @ 2020-02-22  1:10 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Andrew Morton, linux-mm, cgroups,
	linux-kernel

On Fri, Feb 21, 2020 at 11:59:19AM -0800, Shakeel Butt wrote:
> Currently, css_tryget_online() is used at multiple locations in the memcg
> code. However, whether the cgroup is online does not actually matter to
> these callers. Online used to matter when we had reparenting on offlining
> and needed a way to prevent new ones from showing up.
> 
> The failure case for a couple of these css_tryget_online() usages is to
> fall back to root_mem_cgroup, which makes it possible for some workloads
> to bypass the memcg limits. For example, creating an inotify group in a
> subcontainer and then deleting that container after moving the process to
> a different container will charge all the event objects allocated for that
> group to the root_mem_cgroup. So, using css_tryget_online() is dangerous
> for such cases.
> 
> Two locations still use the online version: the swapin of an offlined
> memcg's pages and memcg kmem cache creation. The kmem cache case indeed
> needs the online version, as the kernel does the reparenting of memcg kmem
> caches. The swapin case has been left for later, as the fallback there is
> not really that concerning.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Hello, Shakeel!

> ---
>  mm/memcontrol.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 63bb6a2aab81..75fa8123909e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -656,7 +656,7 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
>  	 */
>  	__mem_cgroup_remove_exceeded(mz, mctz);
>  	if (!soft_limit_excess(mz->memcg) ||
> -	    !css_tryget_online(&mz->memcg->css))
> +	    !css_tryget(&mz->memcg->css))

Looks good.

>  		goto retry;
>  done:
>  	return mz;
> @@ -962,7 +962,8 @@ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
>  		return NULL;
>  
>  	rcu_read_lock();
> -	if (!memcg || !css_tryget_online(&memcg->css))
> +	/* Page should not get uncharged and freed memcg under us. */
> +	if (!memcg || WARN_ON(!css_tryget(&memcg->css)))

I'm slightly worried about this WARN_ON().
As I understand it, the idea is that the caller must own the page and make
sure that page->memcg remains intact. Do we really need this?

Also, I'd go with WARN_ON_ONCE() to limit the dmesg flood in case
something goes wrong.

>  		memcg = root_mem_cgroup;
>  	rcu_read_unlock();
>  	return memcg;
> @@ -975,10 +976,13 @@ EXPORT_SYMBOL(get_mem_cgroup_from_page);
>  static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
>  {
>  	if (unlikely(current->active_memcg)) {
> -		struct mem_cgroup *memcg = root_mem_cgroup;
> +		struct mem_cgroup *memcg;
>  
>  		rcu_read_lock();
> -		if (css_tryget_online(&current->active_memcg->css))
> +		/* current->active_memcg must hold a ref. */

Hm, does it?
memalloc_use_memcg() isn't touching the memcg's reference counter.
And if it does hold a reference, why can't we just do css_get()?

> +		if (WARN_ON(!css_tryget(&current->active_memcg->css)))
> +			memcg = root_mem_cgroup;

Btw, if css_tryget() fails here, what does it mean?
I'd s/WARN_ON/WARN_ON_ONCE too.

> +		else
>  			memcg = current->active_memcg;
>  		rcu_read_unlock();
>  		return memcg;
> @@ -6703,7 +6707,7 @@ void mem_cgroup_sk_alloc(struct sock *sk)
>  		goto out;
>  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcpmem_active)
>  		goto out;
> -	if (css_tryget_online(&memcg->css))
> +	if (css_tryget(&memcg->css))

So it can be offline, right? Makes sense.

>  		sk->sk_memcg = memcg;
>  out:
>  	rcu_read_unlock();
> -- 
> 2.25.0.265.gbab2e86ba0-goog
> 

Overall, I have to admit this is all quite tricky. I had a patchset doing
a similar cleanup (though not only in the mm code), but dropped it after
Tejun showed me some edge cases where it would cause a regression.

So I really think it's valuable work, but we need to be careful here.

Thank you!


* Re: [PATCH] memcg: css_tryget_online cleanups
  2020-02-22  1:10 ` Roman Gushchin
@ 2020-02-22  1:49   ` Shakeel Butt
  0 siblings, 0 replies; 3+ messages in thread
From: Shakeel Butt @ 2020-02-22  1:49 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Andrew Morton, Linux MM, Cgroups, LKML

On Fri, Feb 21, 2020 at 5:10 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Fri, Feb 21, 2020 at 11:59:19AM -0800, Shakeel Butt wrote:
> > Currently, css_tryget_online() is used at multiple locations in the memcg
> > code. However, whether the cgroup is online does not actually matter to
> > these callers. Online used to matter when we had reparenting on offlining
> > and needed a way to prevent new ones from showing up.
> >
> > The failure case for a couple of these css_tryget_online() usages is to
> > fall back to root_mem_cgroup, which makes it possible for some workloads
> > to bypass the memcg limits. For example, creating an inotify group in a
> > subcontainer and then deleting that container after moving the process to
> > a different container will charge all the event objects allocated for that
> > group to the root_mem_cgroup. So, using css_tryget_online() is dangerous
> > for such cases.
> >
> > Two locations still use the online version: the swapin of an offlined
> > memcg's pages and memcg kmem cache creation. The kmem cache case indeed
> > needs the online version, as the kernel does the reparenting of memcg kmem
> > caches. The swapin case has been left for later, as the fallback there is
> > not really that concerning.
> >
> > Signed-off-by: Shakeel Butt <shakeelb@google.com>
>
> Hello, Shakeel!
>
> > ---
> >  mm/memcontrol.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 63bb6a2aab81..75fa8123909e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -656,7 +656,7 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
> >        */
> >       __mem_cgroup_remove_exceeded(mz, mctz);
> >       if (!soft_limit_excess(mz->memcg) ||
> > -         !css_tryget_online(&mz->memcg->css))
> > +         !css_tryget(&mz->memcg->css))
>
> Looks good.
>
> >               goto retry;
> >  done:
> >       return mz;
> > @@ -962,7 +962,8 @@ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
> >               return NULL;
> >
> >       rcu_read_lock();
> > -     if (!memcg || !css_tryget_online(&memcg->css))
> > +     /* Page should not get uncharged and freed memcg under us. */
> > +     if (!memcg || WARN_ON(!css_tryget(&memcg->css)))
>
> I'm slightly worried about this WARN_ON().
> As I understand it, the idea is that the caller must own the page and make
> sure that page->memcg remains intact.

Yes you are correct.

> Do we really need this?

There are currently no such users; maybe the warning in the comment is
enough and we can just use css_get(). I don't have a strong opinion. I
will at least convert the warning to the _ONCE variant and wait for
comments from others.

>
> Also, I'd go with WARN_ON_ONCE() to limit the dmesg flood in case
> something goes wrong.
>
> >               memcg = root_mem_cgroup;
> >       rcu_read_unlock();
> >       return memcg;
> > @@ -975,10 +976,13 @@ EXPORT_SYMBOL(get_mem_cgroup_from_page);
> >  static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
> >  {
> >       if (unlikely(current->active_memcg)) {
> > -             struct mem_cgroup *memcg = root_mem_cgroup;
> > +             struct mem_cgroup *memcg;
> >
> >               rcu_read_lock();
> > -             if (css_tryget_online(&current->active_memcg->css))
> > +             /* current->active_memcg must hold a ref. */
>
> Hm, does it?
> memalloc_use_memcg() isn't touching the memcg's reference counter.
> And if it does hold a reference, why can't we just do css_get()?

The callers of memalloc_use_memcg() should already have the refcount of
the memcg elevated. I should add that to the comment description of
memalloc_use_memcg().

>
> > +             if (WARN_ON(!css_tryget(&current->active_memcg->css)))
> > +                     memcg = root_mem_cgroup;
>
> Btw, if css_tryget() fails here, what does it mean?
> I'd s/WARN_ON/WARN_ON_ONCE too.
>

If css_tryget() fails, it means someone is using memalloc_use_memcg()
without holding a reference to the memcg. Converting to the _ONCE variant
makes sense.

> > +             else
> >                       memcg = current->active_memcg;
> >               rcu_read_unlock();
> >               return memcg;
> > @@ -6703,7 +6707,7 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> >               goto out;
> >       if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcpmem_active)
> >               goto out;
> > -     if (css_tryget_online(&memcg->css))
> > +     if (css_tryget(&memcg->css))
>
> So it can be offline, right? Makes sense.
>

Actually, we got the memcg from current just a few lines above, within the
RCU read lock. The memcg cannot go offline here, right?

> >               sk->sk_memcg = memcg;
> >  out:
> >       rcu_read_unlock();
> > --
> > 2.25.0.265.gbab2e86ba0-goog
> >
>
> Overall I have to admit it all is quite tricky. I had a patchset doing
> a similar cleanup (but not only in the mm code), but dropped it after
> Tejun showed me some edge cases, when it would cause a regression.
>
> So I really think it's a valuable work, but we need to be careful here.
>

Totally agreed.

Shakeel

