LKML Archive on lore.kernel.org
From: Alban Crequy <alban@kinvolk.io>
To: Y Song <ys114321@gmail.com>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
netdev <netdev@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
"Linux Containers" <containers@lists.linux-foundation.org>,
cgroups@vger.kernel.org, "Tejun Heo" <tj@kernel.org>,
"Iago López Galeiras" <iago@kinvolk.io>
Subject: Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
Date: Fri, 25 May 2018 16:21:50 +0100
Message-ID: <CADZs7q4xd1CwGULvYe2-Y2aYpwhiiw3upF=mAK0ve_-jrk1yFg@mail.gmail.com>
In-Reply-To: <CAH3MdRVdfw52atavT3KL8MpPw7zDM_hR6aUcqDP1PogLn_sH+w@mail.gmail.com>
On Wed, May 23, 2018 at 4:34 AM Y Song <ys114321@gmail.com> wrote:
> I did a quick prototype and the above interface seems to work fine.
Thanks! I gave your kernel patch & userspace program a try and it works for
me on cgroup-v2.
Also, I found out how to get my containers to use both cgroup-v1 and
cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
'--exec-opt native.cgroupdriver=systemd' option). So I should be able to
use the BPF helper function without having to add support for all the
cgroup-v1 hierarchies.
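
Before relying on the v2-based id, it can help to confirm that a given path really sits on a cgroup-v2 mount. The following is a minimal sketch, not part of the patch: the function name is_cgroup2_path is hypothetical, and it assumes that comparing the statfs(2) f_type against CGROUP2_SUPER_MAGIC (0x63677270, as defined in linux/magic.h) is a sufficient check.

```c
#include <sys/vfs.h>

/* Fallback in case linux/magic.h is not available on the build host. */
#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270
#endif

/*
 * Hypothetical helper: returns 1 if 'path' is on a cgroup2 filesystem,
 * 0 if it is on some other filesystem, and -1 if statfs() fails
 * (errno is left as set by statfs()).
 */
int is_cgroup2_path(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) < 0)
		return -1;

	return sfs.f_type == CGROUP2_SUPER_MAGIC;
}
```

In systemd's hybrid mode the v2 hierarchy is typically mounted at /sys/fs/cgroup/unified, so that is the path one would pass here; the exact mount point depends on the distribution and should be checked rather than assumed.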
> The kernel change:
> ===============
> [yhs@localhost bpf-next]$ git diff
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 97446bbe2ca5..669b7383fddb 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1976,7 +1976,8 @@ union bpf_attr {
> FN(fib_lookup), \
> FN(sock_hash_update), \
> FN(msg_redirect_hash), \
> - FN(sk_redirect_hash),
> + FN(sk_redirect_hash), \
> + FN(get_current_cgroup_id),
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ce2cbbff27e4..e11e3298f911 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -493,6 +493,21 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
> .arg2_type = ARG_ANYTHING,
> };
> +BPF_CALL_0(bpf_get_current_cgroup_id)
> +{
> +	struct cgroup *cgrp = task_dfl_cgroup(current);
> +	if (!cgrp)
> +		return -EINVAL;
> +
> +	return cgrp->kn->id.id;
> +}
> +
> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
> +	.func = bpf_get_current_cgroup_id,
> +	.gpl_only = false,
> +	.ret_type = RET_INTEGER,
> +};
> +
> BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
> const void *, unsafe_ptr)
> {
> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_get_prandom_u32_proto;
> case BPF_FUNC_probe_read_str:
> return &bpf_probe_read_str_proto;
> + case BPF_FUNC_get_current_cgroup_id:
> + return &bpf_get_current_cgroup_id_proto;
> default:
> return NULL;
> }
> The following program can be used to print out a cgroup id given a cgroup path.
> [yhs@localhost cg]$ cat get_cgroup_id.c
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> int main(int argc, char **argv)
> {
> 	int dirfd, err, flags, mount_id, fhsize;
> 	struct file_handle *fhp;
> 	char *pathname;
> 	if (argc != 2) {
> 		printf("usage: %s <cgroup_path>\n", argv[0]);
> 		return 1;
> 	}
> 	pathname = argv[1];
> 	dirfd = AT_FDCWD;
> 	flags = 0;
> 	fhsize = sizeof(*fhp);
> 	fhp = malloc(fhsize);
> 	if (!fhp)
> 		return 1;
> 	/* malloc() does not zero memory: handle_bytes must be 0 so that
> 	 * the first call only reports the required handle size. */
> 	fhp->handle_bytes = 0;
> 	/* This call is expected to fail and fill in fhp->handle_bytes. */
> 	err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
> 	if (err >= 0) {
> 		printf("error\n");
> 		return 1;
> 	}
> 	/* Grow the buffer to hold the handle, then fetch it for real. */
> 	fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
> 	fhp = realloc(fhp, fhsize);
> 	if (!fhp)
> 		return 1;
> 	err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
> 	if (err < 0)
> 		perror("name_to_handle_at");
> 	else {
> 		int i;
> 		printf("dir = %s, mount_id = %d\n", pathname, mount_id);
> 		printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
> 		       fhp->handle_type);
> 		/* A cgroup id is expected to be exactly 64 bits. */
> 		if (fhp->handle_bytes != 8)
> 			return 1;
> 		printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
> 	}
> 	return 0;
> }
> [yhs@localhost cg]$
> Given a cgroup path, the user can get the cgroup_id and use it in their bpf
> program for filtering purposes.
> I run a simple program t.c
> int main() { while(1) sleep(1); return 0; }
> in the cgroup v2 directory /home/yhs/tmp/yhs
> none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)
> $ ./get_cgroup_id /home/yhs/tmp/yhs
> dir = /home/yhs/tmp/yhs, mount_id = 124
> handle_bytes = 8, handle_type = 1
> cgroup_id = 0x1000006b2
> // the below command to get cgroup_id from the kernel for the
> // process compiled with t.c and ran under /home/yhs/tmp/yhs:
> $ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
> PID TID COMM FUNC -
> 4067 4067 a.out __x64_sys_nanosleep cgid = 1000006b2
> 4067 4067 a.out __x64_sys_nanosleep cgid = 1000006b2
> 4067 4067 a.out __x64_sys_nanosleep cgid = 1000006b2
> ^C[yhs@localhost tools]$
> The kernel and user space cgids match. Will provide a
> formal patch later.
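
To relate the 64-bit value printed above (0x1000006b2) to the inode and generation numbers discussed in the quoted thread, here is a small sketch that splits an id into its two halves. It rests on an assumption that is not stated in this message: that the id is laid out like union kernfs_node_id of that era on a little-endian machine, with the 32-bit inode number in the low half and the 32-bit generation in the high half. The names cgid_parts and split_cgroup_id are hypothetical.

```c
#include <stdint.h>

/* Hypothetical view of a cgroup id as (ino, generation). */
struct cgid_parts {
	uint32_t ino;        /* assumed: low 32 bits of the id */
	uint32_t generation; /* assumed: high 32 bits of the id */
};

/* Split a 64-bit cgroup id under the little-endian layout assumption. */
struct cgid_parts split_cgroup_id(uint64_t id)
{
	struct cgid_parts p;

	p.ino = (uint32_t)(id & 0xffffffffu);
	p.generation = (uint32_t)(id >> 32);
	return p;
}
```

Under that assumption, 0x1000006b2 would correspond to inode 0x6b2 with generation 1, which illustrates why the inode number alone does not identify the cgroup.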
> On Mon, May 21, 2018 at 5:24 PM, Y Song <ys114321@gmail.com> wrote:
> > On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
> >>>
> >>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
> >>> +{
> >>> +	// TODO: pick the correct hierarchy instead of the mem controller
> >>> +	struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
> >>> +
> >>> +	if (unlikely(!cgrp))
> >>> +		return -EINVAL;
> >>> +	if (unlikely(hierarchy))
> >>> +		return -EINVAL;
> >>> +	if (unlikely(flags))
> >>> +		return -EINVAL;
> >>> +
> >>> +	return cgrp->kn->id.ino;
> >>
> >> ino only is not enough to identify cgroup. It needs generation number too.
> >> I don't quite see how hierarchy and flags can be used in the future.
> >> Also why limit it to memcg?
> >>
> >> How about something like this instead:
> >>
> >> BPF_CALL_0(bpf_get_current_cgroup_id)
> >> {
> >> 	struct cgroup *cgrp = task_dfl_cgroup(current);
> >>
> >> 	return cgrp->kn->id.id;
> >> }
> >> The user space can use fhandle api to get the same 64-bit id.
> >
> > I think this should work. This will also be useful to bcc, as user
> > space can encode the desired id in the bpf program and compare that
> > id to the current cgroup id, so we can have cgroup-level tracing
> > (esp. stat collection) support. To cope with the cgroup hierarchy,
> > the user can use a cgroup-array based approach or explicitly compare
> > against multiple cgroup ids.
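
The "explicitly compare against multiple cgroup ids" option mentioned in the quoted text can be modelled in plain C as below. This is only a user-space sketch of the comparison logic, not a BPF program: in a real tracing program the current id would come from the proposed helper and the wanted ids would be encoded by user space. The function name cgroup_id_in_set is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical filter: returns 1 if 'current_id' equals any of the 'n'
 * ids in 'wanted', 0 otherwise. A linear scan is enough for the small
 * id lists a tracing tool would typically embed.
 */
int cgroup_id_in_set(uint64_t current_id, const uint64_t *wanted, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (wanted[i] == current_id)
			return 1;
	}
	return 0;
}
```

A tool that wants to trace a parent cgroup together with its children would pass the ids of all of them, each obtained from its cgroup path with name_to_handle_at() as in the program quoted earlier.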