LKML Archive on lore.kernel.org
From: Y Song <ys114321@gmail.com>
To: Alban Crequy <alban@kinvolk.io>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux Containers" <containers@lists.linux-foundation.org>,
	cgroups@vger.kernel.org, "Tejun Heo" <tj@kernel.org>,
	"Iago López Galeiras" <iago@kinvolk.io>
Subject: Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
Date: Fri, 25 May 2018 09:28:22 -0700
Message-ID: <CAH3MdRWcLa7hkohdU74TPj2HinWzA=QTHAHd+T7WM4GDMOQ-Kg@mail.gmail.com>
In-Reply-To: <CADZs7q4xd1CwGULvYe2-Y2aYpwhiiw3upF=mAK0ve_-jrk1yFg@mail.gmail.com>

On Fri, May 25, 2018 at 8:21 AM, Alban Crequy <alban@kinvolk.io> wrote:
> On Wed, May 23, 2018 at 4:34 AM Y Song <ys114321@gmail.com> wrote:
>
>> I did a quick prototype and the above interface seems to work fine.
>
> Thanks! I gave your kernel patch & userspace program a try and it works for
> me on cgroup-v2.
>
> Also, I found out how to get my containers to use both cgroup-v1 and
> cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
> '--exec-opt native.cgroupdriver=systemd' option). So I should be able to
> use the BPF helper function without having to add support for all the
> cgroup-v1 hierarchies.

Great. Will submit a formal patch soon.

>
>> The kernel change:
>> ===============
>
>> [yhs@localhost bpf-next]$ git diff
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bbe2ca5..669b7383fddb 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1976,7 +1976,8 @@ union bpf_attr {
>>          FN(fib_lookup),                 \
>>          FN(sock_hash_update),           \
>>          FN(msg_redirect_hash),          \
>> -       FN(sk_redirect_hash),
>> +       FN(sk_redirect_hash),           \
>> +       FN(get_current_cgroup_id),
>
>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>    * function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbff27e4..e11e3298f911 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -493,6 +493,21 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
>>          .arg2_type      = ARG_ANYTHING,
>>   };
>
>> +BPF_CALL_0(bpf_get_current_cgroup_id)
>> +{
>> +       struct cgroup *cgrp = task_dfl_cgroup(current);
>> +       if (!cgrp)
>> +               return -EINVAL;
>> +
>> +       return cgrp->kn->id.id;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>> +       .func           = bpf_get_current_cgroup_id,
>> +       .gpl_only       = false,
>> +       .ret_type       = RET_INTEGER,
>> +};
>> +
>>   BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>>             const void *, unsafe_ptr)
>>   {
>> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>                  return &bpf_get_prandom_u32_proto;
>>          case BPF_FUNC_probe_read_str:
>>                  return &bpf_probe_read_str_proto;
>> +       case BPF_FUNC_get_current_cgroup_id:
>> +               return &bpf_get_current_cgroup_id_proto;
>>          default:
>>                  return NULL;
>>          }
>
>> The following program can be used to print out a cgroup id given a cgroup path.
>> [yhs@localhost cg]$ cat get_cgroup_id.c
>> #define _GNU_SOURCE
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>
>> int main(int argc, char **argv)
>> {
>>      int dirfd, err, flags, mount_id, fhsize;
>>      struct file_handle *fhp;
>>      char *pathname;
>
>>      if (argc != 2) {
>>          printf("usage: %s <cgroup_path>\n", argv[0]);
>>          return 1;
>>      }
>
>>      pathname = argv[1];
>>      dirfd = AT_FDCWD;
>>      flags = 0;
>
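>>      fhsize = sizeof(*fhp);
>>      fhp = malloc(fhsize);
>>      if (!fhp)
>>          return 1;
>>      /* malloc() does not zero memory; the kernel reads handle_bytes */
>>      fhp->handle_bytes = 0;
>
>>      /* With handle_bytes == 0 this first call is expected to fail and
>>       * report the required handle size in fhp->handle_bytes. */
>>      err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>      if (err >= 0) {
>>          printf("error\n");
>>          return 1;
>>      }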
>
>>      fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>>      fhp = realloc(fhp, fhsize);
>>      if (!fhp)
>>          return 1;
>
>>      err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>      if (err < 0)
>>          perror("name_to_handle_at");
>>      else {
>>          int i;
>
>>          printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>>          printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>>              fhp->handle_type);
>>          if (fhp->handle_bytes != 8)
>>              return 1;
>
>>          printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
>>      }
>
>>      return 0;
>> }
>> [yhs@localhost cg]$
>
>> Given a cgroup path, the user can get the cgroup_id and use it in their bpf
>> program for filtering.
>
>> I ran a simple program t.c
>>     int main() { while(1) sleep(1); return 0; }
>> in the cgroup v2 directory /home/yhs/tmp/yhs
>>     none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)
>
>> $ ./get_cgroup_id /home/yhs/tmp/yhs
>> dir = /home/yhs/tmp/yhs, mount_id = 124
>> handle_bytes = 8, handle_type = 1
>> cgroup_id = 0x1000006b2
>
>> // the command below gets the cgroup_id from the kernel for the
>> // process compiled from t.c and run under /home/yhs/tmp/yhs:
>> $ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
>> PID     TID     COMM            FUNC             -
>> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
>> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
>> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
>> ^C[yhs@localhost tools]$
>
>> The kernel and user space cgids match. Will provide a
>> formal patch later.
>
>
>
>
>> On Mon, May 21, 2018 at 5:24 PM, Y Song <ys114321@gmail.com> wrote:
>> > On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
>> > <alexei.starovoitov@gmail.com> wrote:
>> >> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>> >>>
>> >>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>> >>> +{
>> >>> +     // TODO: pick the correct hierarchy instead of the mem controller
>> >>> +     struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>> >>> +
>> >>> +     if (unlikely(!cgrp))
>> >>> +             return -EINVAL;
>> >>> +     if (unlikely(hierarchy))
>> >>> +             return -EINVAL;
>> >>> +     if (unlikely(flags))
>> >>> +             return -EINVAL;
>> >>> +
>> >>> +     return cgrp->kn->id.ino;
>> >>
>> >> ino alone is not enough to identify a cgroup. It needs the generation number too.
>> >> I don't quite see how hierarchy and flags can be used in the future.
>> >> Also why limit it to memcg?
>> >>
>> >> How about something like this instead:
>> >>
>> >> BPF_CALL_0(bpf_get_current_cgroup_id)
>> >> {
>> >>         struct cgroup *cgrp = task_dfl_cgroup(current);
>> >>
>> >>         return cgrp->kn->id.id;
>> >> }
>> >> The user space can use fhandle api to get the same 64-bit id.
>> >
>> > I think this should work. This will also be useful to bcc, as user
>> > space can encode the desired id in the bpf program and compare that id
>> > to the current cgroup id, so we can have cgroup-level tracing (esp.
>> > stat collection) support. To cope with the cgroup hierarchy, the user
>> > can use a cgroup-array based approach or explicitly compare against
>> > multiple cgroup ids.
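
For readers following the ino/generation point and the "compare against multiple cgroup ids" idea quoted above, here is a minimal user-space sketch in C. It rests on an assumption not stated in this thread: that the 64-bit id carries the kernfs inode number in its low 32 bits and the generation in its high 32 bits (the little-endian view of the id union in kernels of this period). The names cgid_ino, cgid_generation and cgid_in_set are hypothetical helpers for illustration, not kernel or libc APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: low 32 bits = kernfs inode number, high 32 bits = generation.
 * This reflects a little-endian host and may not hold on other kernels.
 */
static inline uint32_t cgid_ino(uint64_t cgid)
{
    return (uint32_t)(cgid & 0xffffffffu);
}

static inline uint32_t cgid_generation(uint64_t cgid)
{
    return (uint32_t)(cgid >> 32);
}

/*
 * Return true if cgid equals one of the n ids in allowed[].
 * The full 64-bit value is compared, so two cgroups that reuse the same
 * inode number with different generations are told apart.
 */
static inline bool cgid_in_set(uint64_t cgid, const uint64_t *allowed, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (allowed[i] == cgid)
            return true;
    }
    return false;
}
```

Under that assumption, the id 0x1000006b2 printed earlier splits into inode 0x6b2 and generation 1. A tracing program that wants to cover a small subtree could apply the same linear comparison to a short list of ids obtained from user space.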


