Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH bpf] bpf: use get_file_rcu() instead of get_file() for task_file iterator
@ 2020-08-17 17:42 Yonghong Song
2020-08-17 18:27 ` Josef Bacik
0 siblings, 1 reply; 3+ messages in thread
From: Yonghong Song @ 2020-08-17 17:42 UTC (permalink / raw)
To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Josef Bacik
With latest `bpftool prog` command, we observed the following kernel
panic.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0
Oops: 0010 [#1] SMP
CPU: 9 PID: 6023 ...
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286
RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400
RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600
R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710
R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009
FS: 00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
rcu_core+0x1a4/0x440
__do_softirq+0xd3/0x2c8
irq_exit+0x9d/0xa0
smp_apic_timer_interrupt+0x68/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0033:0x47ce80
Code: Bad RIP value.
RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c
RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8
RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a
R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400
R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380
After further analysis, the bug is triggered by
Commit eaaacd23910f ("bpf: Add task and task/file iterator targets")
which introduced task_file bpf iterator, which traverses all open file
descriptors for all tasks in the current namespace.
The latest `bpftool prog` calls a task_file bpf program to traverse
all files in the system in order to associate processes with progs/maps, etc.
When traversing files for a given task, rcu read_lock is taken to
access all files in a file_struct. But it used get_file() to grab
a file, which is not right. It is possible file->f_count is 0 and
get_file() will unconditionally increase it.
Later put_file() may cause all kind of issues with the above
as one of sympotoms.
The failure can be reproduced with the following steps in a few seconds:
$ cat t.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define N 10000
int fd[N];
int main() {
int i;
for (i = 0; i < N; i++) {
fd[i] = open("./note.txt", 'r');
if (fd[i] < 0) {
fprintf(stderr, "failed\n");
return -1;
}
}
for (i = 0; i < N; i++)
close(fd[i]);
return 0;
}
$ gcc -O2 t.c
$ cat run.sh
#/bin/bash
for i in {1..100}
do
while true; do ./a.out; done &
done
$ ./run.sh
$ while true; do bpftool prog >& /dev/null; done
This patch used get_file_rcu() which only grabs a file if the
file->f_count is not zero. This is to ensure the file pointer
is always valid. The above reproducer did not fail for more
than 30 minutes.
Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
kernel/bpf/task_iter.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 232df29793e9..f21b5e1e4540 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -178,10 +178,11 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info,
f = fcheck_files(curr_files, curr_fd);
if (!f)
continue;
+ if (!get_file_rcu(f))
+ continue;
/* set info->fd */
info->fd = curr_fd;
- get_file(f);
rcu_read_unlock();
return f;
}
--
2.24.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH bpf] bpf: use get_file_rcu() instead of get_file() for task_file iterator
2020-08-17 17:42 [PATCH bpf] bpf: use get_file_rcu() instead of get_file() for task_file iterator Yonghong Song
@ 2020-08-17 18:27 ` Josef Bacik
2020-08-17 21:49 ` Alexei Starovoitov
0 siblings, 1 reply; 3+ messages in thread
From: Josef Bacik @ 2020-08-17 18:27 UTC (permalink / raw)
To: Yonghong Song, bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
On 8/17/20 1:42 PM, Yonghong Song wrote:
> With latest `bpftool prog` command, we observed the following kernel
> panic.
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor instruction fetch in kernel mode
> #PF: error_code(0x0010) - not-present page
> PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0
> Oops: 0010 [#1] SMP
> CPU: 9 PID: 6023 ...
> RIP: 0010:0x0
> Code: Bad RIP value.
> RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286
> RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400
> RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600
> R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710
> R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009
> FS: 00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <IRQ>
> rcu_core+0x1a4/0x440
> __do_softirq+0xd3/0x2c8
> irq_exit+0x9d/0xa0
> smp_apic_timer_interrupt+0x68/0x120
> apic_timer_interrupt+0xf/0x20
> </IRQ>
> RIP: 0033:0x47ce80
> Code: Bad RIP value.
> RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c
> RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8
> RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a
> R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400
> R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380
>
> After further analysis, the bug is triggered by
> Commit eaaacd23910f ("bpf: Add task and task/file iterator targets")
> which introduced task_file bpf iterator, which traverses all open file
> descriptors for all tasks in the current namespace.
> The latest `bpftool prog` calls a task_file bpf program to traverse
> all files in the system in order to associate processes with progs/maps, etc.
> When traversing files for a given task, rcu read_lock is taken to
> access all files in a file_struct. But it used get_file() to grab
> a file, which is not right. It is possible file->f_count is 0 and
> get_file() will unconditionally increase it.
> Later put_file() may cause all kind of issues with the above
> as one of sympotoms.
>
> The failure can be reproduced with the following steps in a few seconds:
> $ cat t.c
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> #define N 10000
> int fd[N];
> int main() {
> int i;
>
> for (i = 0; i < N; i++) {
> fd[i] = open("./note.txt", 'r');
> if (fd[i] < 0) {
> fprintf(stderr, "failed\n");
> return -1;
> }
> }
> for (i = 0; i < N; i++)
> close(fd[i]);
>
> return 0;
> }
> $ gcc -O2 t.c
> $ cat run.sh
> #/bin/bash
> for i in {1..100}
> do
> while true; do ./a.out; done &
> done
> $ ./run.sh
> $ while true; do bpftool prog >& /dev/null; done
>
> This patch used get_file_rcu() which only grabs a file if the
> file->f_count is not zero. This is to ensure the file pointer
> is always valid. The above reproducer did not fail for more
> than 30 minutes.
>
> Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
> Suggested-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Also verified that the problem no longer happens in production. Thanks,
Josef
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH bpf] bpf: use get_file_rcu() instead of get_file() for task_file iterator
2020-08-17 18:27 ` Josef Bacik
@ 2020-08-17 21:49 ` Alexei Starovoitov
0 siblings, 0 replies; 3+ messages in thread
From: Alexei Starovoitov @ 2020-08-17 21:49 UTC (permalink / raw)
To: Josef Bacik
Cc: Yonghong Song, bpf, Network Development, Alexei Starovoitov,
Daniel Borkmann, Kernel Team
On Mon, Aug 17, 2020 at 11:27 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 8/17/20 1:42 PM, Yonghong Song wrote:
> > With latest `bpftool prog` command, we observed the following kernel
> > panic.
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> >
> > This patch used get_file_rcu() which only grabs a file if the
> > file->f_count is not zero. This is to ensure the file pointer
> > is always valid. The above reproducer did not fail for more
> > than 30 minutes.
> >
> > Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
> > Suggested-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Applied. Thanks
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-08-17 21:49 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-17 17:42 [PATCH bpf] bpf: use get_file_rcu() instead of get_file() for task_file iterator Yonghong Song
2020-08-17 18:27 ` Josef Bacik
2020-08-17 21:49 ` Alexei Starovoitov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).