Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v6 bpf-next 03/11] bpf: Introduce bpf timers.
Date: Wed, 14 Jul 2021 16:59:53 -0700	[thread overview]
Message-ID: <CAEf4BzbJb3q0=LwHs_JXXB2a7wsY=rCF7E+nxsM2SgcC6KK8jA@mail.gmail.com> (raw)
In-Reply-To: <20210714010519.37922-4-alexei.starovoitov@gmail.com>

On Tue, Jul 13, 2021 at 6:05 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
> in hash/array/lru maps as a regular field and helpers to operate on it:
>
> // Initialize the timer.
> // First 4 bits of 'flags' specify clockid.
> // Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed.
> long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, int flags);
>
> // Configure the timer to call 'callback_fn' static function.
> long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn);
>
> // Arm the timer to expire 'nsec' nanoseconds from the current time.
> long bpf_timer_start(struct bpf_timer *timer, u64 nsec, u64 flags);
>
> // Cancel the timer and wait for callback_fn to finish if it was running.
> long bpf_timer_cancel(struct bpf_timer *timer);
>
> Here is how BPF program might look like:
> struct map_elem {
>     int counter;
>     struct bpf_timer timer;
> };
>
> struct {
>     __uint(type, BPF_MAP_TYPE_HASH);
>     __uint(max_entries, 1000);
>     __type(key, int);
>     __type(value, struct map_elem);
> } hmap SEC(".maps");
>
> static int timer_cb(void *map, int *key, struct map_elem *val);
> /* val points to particular map element that contains bpf_timer. */
>
> SEC("fentry/bpf_fentry_test1")
> int BPF_PROG(test1, int a)
> {
>     struct map_elem *val;
>     int key = 0;
>
>     val = bpf_map_lookup_elem(&hmap, &key);
>     if (val) {
>         bpf_timer_init(&val->timer, &hmap, CLOCK_REALTIME);
>         bpf_timer_set_callback(&val->timer, timer_cb);
>         bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 usec */, 0);
>     }
> }
>
> This patch adds helper implementations that rely on hrtimers
> to call bpf functions as timers expire.
> The following patches add necessary safety checks.
>
> Only programs with CAP_BPF are allowed to use bpf_timer.
>
> The amount of timers used by the program is constrained by
> the memcg recorded at map creation time.
>
> The bpf_timer_init() helper needs explicit 'map' argument because inner maps
> are dynamic and not known at load time. While the bpf_timer_set_callback() is
> receiving hidden 'aux->prog' argument supplied by the verifier.
>
> The prog pointer is needed to do refcnting of bpf program to make sure that
> program doesn't get freed while the timer is armed. This approach relies on
> "user refcnt" scheme used in prog_array that stores bpf programs for
> bpf_tail_call. The bpf_timer_set_callback() will increment the prog refcnt which is
> paired with bpf_timer_cancel() that will drop the prog refcnt. The
> ops->map_release_uref is responsible for cancelling the timers and dropping
> prog refcnt when user space reference to a map reaches zero.
> This uref approach is done to make sure that Ctrl-C of user space process will
> not leave timers running forever unless the user space explicitly pinned a map
> that contained timers in bpffs.
>
> bpf_timer_init() and bpf_timer_set_callback() will return -EPERM if map doesn't
> have user references (is not held by open file descriptor from user space and
> not pinned in bpffs).
>
> The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel
> and free the timer if given map element had it allocated.
> "bpftool map update" command can be used to cancel timers.
>
> The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because
> '__u64 :64' has 1 byte alignment of 8 byte padding.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  include/linux/bpf.h            |   3 +
>  include/uapi/linux/bpf.h       |  73 ++++++++
>  kernel/bpf/helpers.c           | 325 +++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c          | 109 +++++++++++
>  kernel/trace/bpf_trace.c       |   2 +-
>  scripts/bpf_doc.py             |   2 +
>  tools/include/uapi/linux/bpf.h |  73 ++++++++
>  7 files changed, 586 insertions(+), 1 deletion(-)
>

[...]

> +       if (!atomic64_read(&(map->usercnt))) {
> +               /* maps with timers must be either held by user space
> +                * or pinned in bpffs.
> +                */
> +               ret = -EPERM;
> +               goto out;
> +       }
> +       /* allocate hrtimer via map_kmalloc to use memcg accounting */
> +       t = bpf_map_kmalloc_node(map, sizeof(*t), GFP_ATOMIC, NUMA_NO_NODE);

I wonder if it would make sense to use map->numa_node here to keep map
value and timer data in the same NUMA node?

> +       if (!t) {
> +               ret = -ENOMEM;
> +               goto out;
> +       }

[...]

> +
> +/* This function is called by map_delete/update_elem for individual element.
> + * By ops->map_release_uref when the user space reference to a map reaches zero
> + * and by ops->map_free when the kernel reference reaches zero.

is ops->map_free part still valid?

> + */
> +void bpf_timer_cancel_and_free(void *val)
> +{
> +       struct bpf_timer_kern *timer = val;
> +       struct bpf_hrtimer *t;
> +

[...]

  reply	other threads:[~2021-07-15  0:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-14  1:05 [PATCH v6 bpf-next 00/11] bpf: Introduce BPF timers Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 01/11] bpf: Prepare bpf_prog_put() to be called from irq context Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 02/11] bpf: Factor out bpf_spin_lock into helpers Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 03/11] bpf: Introduce bpf timers Alexei Starovoitov
2021-07-14 23:59   ` Andrii Nakryiko [this message]
2021-07-15  0:43     ` Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 04/11] bpf: Add map side support for " Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 05/11] bpf: Prevent pointer mismatch in bpf_timer_init Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 06/11] bpf: Remember BTF of inner maps Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 07/11] bpf: Relax verifier recursion check Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 08/11] bpf: Implement verifier support for validation of async callbacks Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 09/11] bpf: Teach stack depth check about " Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 10/11] selftests/bpf: Add bpf_timer test Alexei Starovoitov
2021-07-14  1:05 ` [PATCH v6 bpf-next 11/11] selftests/bpf: Add a test with bpf_timer in inner map Alexei Starovoitov
2021-07-14 23:59 ` [PATCH v6 bpf-next 00/11] bpf: Introduce BPF timers Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzbJb3q0=LwHs_JXXB2a7wsY=rCF7E+nxsM2SgcC6KK8jA@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    --subject='Re: [PATCH v6 bpf-next 03/11] bpf: Introduce bpf timers.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).