LKML Archive on lore.kernel.org
From: Yonghong Song <yhs@fb.com>
To: Roman Gushchin <guro@fb.com>, Alexei Starovoitov <ast@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, Kernel Team <Kernel-team@fb.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jolsa@redhat.com" <jolsa@redhat.com>
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: decouple the lifetime of cgroup_bpf from cgroup itself
Date: Thu, 23 May 2019 05:33:45 +0000
Message-ID: <530bba0f-9e13-5308-fc93-d0dab0c56fcc@fb.com>
In-Reply-To: <20190522232051.2938491-2-guro@fb.com>



On 5/22/19 4:20 PM, Roman Gushchin wrote:
> Currently the lifetime of bpf programs attached to a cgroup is bound
> to the lifetime of the cgroup itself. It means that if a user
> forgets (or intentionally avoids) to detach a bpf program before
> removing the cgroup, it will stay attached up to the release of the
> cgroup. Since the cgroup can stay in the dying state (the state
> between being rmdir()'ed and being released) for a very long time, it
> leads to a waste of memory. Also, it blocks a possibility to implement
> the memcg-based memory accounting for bpf objects, because a circular
> reference dependency will occur. Charged memory pages are pinning the
> corresponding memory cgroup, and if the memory cgroup is pinning
> the attached bpf program, nothing will be ever released.
> 
> A dying cgroup can not contain any processes, so the only chance for
> an attached bpf program to be executed is a live socket associated
> with the cgroup. So in order to release all bpf data early, let's
> count associated sockets using a new percpu refcounter. On cgroup
> removal the counter is transitioned to the atomic mode, and as soon
> as it reaches 0, all bpf programs are detached.
> 
> The reference counter is not socket specific, and can be used for any
> other types of programs, which can be executed from a cgroup-bpf hook
> outside of the process context, had such a need arise in the future.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: jolsa@redhat.com

The logic looks sound to me. With one nit below,
Acked-by: Yonghong Song <yhs@fb.com>

> ---
>   include/linux/bpf-cgroup.h |  8 ++++++--
>   include/linux/cgroup.h     | 18 ++++++++++++++++++
>   kernel/bpf/cgroup.c        | 25 ++++++++++++++++++++++---
>   kernel/cgroup/cgroup.c     | 11 ++++++++---
>   4 files changed, 54 insertions(+), 8 deletions(-)
> 
[...]
> @@ -167,7 +178,12 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
>    */
>   #define	NR ARRAY_SIZE(cgrp->bpf.effective)
>   	struct bpf_prog_array __rcu *arrays[NR] = {};
> -	int i;
> +	int ret, i;
> +
> +	ret = percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
> +			      GFP_KERNEL);
> +	if (ret)
> +		return -ENOMEM;
Maybe return "ret" here instead of -ENOMEM? Currently, -ENOMEM is the
only error code that percpu_ref_init() returns, but that could change
in the future.
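
To illustrate the nit, here is a minimal user-space sketch. It assumes a
hypothetical stub, stub_ref_init(), standing in for percpu_ref_init(); the
two wrapper functions are likewise made up for illustration and are not
part of the patch. It only shows how a hardcoded -ENOMEM would mask any
other error code the callee might return one day.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for percpu_ref_init(): returns 0 on success
 * or the negative errno passed in by the caller of this sketch.
 */
static int stub_ref_init(int simulated_error)
{
	return simulated_error;
}

/* Variant as posted in the patch: every failure becomes -ENOMEM. */
static int inherit_hardcoded(int simulated_error)
{
	int ret = stub_ref_init(simulated_error);

	if (ret)
		return -ENOMEM;
	return 0;
}

/* Suggested variant: pass the callee's error code through unchanged. */
static int inherit_propagated(int simulated_error)
{
	int ret = stub_ref_init(simulated_error);

	if (ret)
		return ret;
	return 0;
}
```

With today's behaviour both variants give the same result; they only
diverge if the callee ever fails with something other than -ENOMEM.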

>   
>   	for (i = 0; i < NR; i++)
>   		INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
> @@ -183,6 +199,9 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
>   cleanup:
>   	for (i = 0; i < NR; i++)
>   		bpf_prog_array_free(arrays[i]);
> +
> +	percpu_ref_exit(&cgrp->bpf.refcnt);
> +
>   	return -ENOMEM;
>   }
>   
[...]
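
For readers following the commit message, the lifetime scheme it describes
can be modelled roughly as below. This is a single-threaded user-space
sketch, not the kernel's percpu_ref: the struct and every model_* function
are hypothetical names chosen for illustration. It assumes one initial
reference that is dropped when the cgroup is removed, plus one reference
per associated socket, with the programs detached once the count hits 0.

```c
#include <stdbool.h>

/* Hypothetical model of cgroup_bpf lifetime; plain counter, no percpu or atomics. */
struct model_cgroup_bpf {
	long refcnt;         /* initial reference + one per live socket */
	bool dying;          /* set once the cgroup has been rmdir()'ed */
	bool progs_attached; /* stands in for the attached bpf programs */
};

static void model_init(struct model_cgroup_bpf *b)
{
	b->refcnt = 1;       /* initial reference, dropped on removal */
	b->dying = false;
	b->progs_attached = true;
}

/* Release callback: runs when the last reference goes away. */
static void model_release(struct model_cgroup_bpf *b)
{
	b->progs_attached = false;
}

/* A socket gets associated with the cgroup. */
static void model_get(struct model_cgroup_bpf *b)
{
	b->refcnt++;
}

/* A socket goes away, or the initial reference is dropped. */
static void model_put(struct model_cgroup_bpf *b)
{
	if (--b->refcnt == 0)
		model_release(b);
}

/* Cgroup removal: mark as dying and drop the initial reference. */
static void model_rmdir(struct model_cgroup_bpf *b)
{
	b->dying = true;
	model_put(b);
}
```

In this model a dying cgroup with one live socket keeps its programs
attached until that socket drops its reference, which matches the
behaviour the commit message describes.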


Thread overview: 13+ messages
2019-05-22 23:20 [PATCH v2 bpf-next 0/4] cgroup bpf auto-detachment Roman Gushchin
2019-05-22 23:20 ` [PATCH v2 bpf-next 1/4] bpf: decouple the lifetime of cgroup_bpf from cgroup itself Roman Gushchin
2019-05-23  5:33   ` Yonghong Song [this message]
2019-05-22 23:20 ` [PATCH v2 bpf-next 2/4] selftests/bpf: convert test_cgrp2_attach2 example into kselftest Roman Gushchin
2019-05-23  5:39   ` Yonghong Song
2019-05-22 23:20 ` [PATCH v2 bpf-next 3/4] selftests/bpf: enable all available cgroup v2 controllers Roman Gushchin
2019-05-23  5:45   ` Yonghong Song
2019-05-22 23:20 ` [PATCH v2 bpf-next 4/4] selftests/bpf: add auto-detach test Roman Gushchin
2019-05-23  5:47   ` Yonghong Song
2019-05-23 17:58     ` Roman Gushchin
2019-05-23 23:06       ` Yonghong Song
2019-05-23  5:52   ` Yonghong Song
2019-05-23  5:17 ` [PATCH v2 bpf-next 0/4] cgroup bpf auto-detachment Yonghong Song
