Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
  2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
@ 2020-08-18 22:23 ` Yonghong Song
  2020-08-19  0:05   ` Alexei Starovoitov
  2020-08-18 22:23 ` [PATCH bpf v2 2/3] bpf: avoid visit same object multiple times Yonghong Song
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Paul E . McKenney

In our production system, we observed rcu stalls when
'bpftool prog` is running.
  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: \x097-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
  \x09(t=21031 jiffies g=2534773 q=179750)
  NMI backtrace for cpu 7
  CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
  Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
  Call Trace:
  <IRQ>
  dump_stack+0x57/0x70
  nmi_cpu_backtrace.cold+0x14/0x53
  ? lapic_can_unplug_cpu.cold+0x39/0x39
  nmi_trigger_cpumask_backtrace+0xb7/0xc7
  rcu_dump_cpu_stacks+0xa2/0xd0
  rcu_sched_clock_irq.cold+0x1ff/0x3d9
  ? tick_nohz_handler+0x100/0x100
  update_process_times+0x5b/0x90
  tick_sched_timer+0x5e/0xf0
  __hrtimer_run_queues+0x12a/0x2a0
  hrtimer_interrupt+0x10e/0x280
  __sysvec_apic_timer_interrupt+0x51/0xe0
  asm_call_on_stack+0xf/0x20
  </IRQ>
  sysvec_apic_timer_interrupt+0x6f/0x80
  asm_sysvec_apic_timer_interrupt+0x12/0x20
  RIP: 0010:task_file_seq_get_next+0x71/0x220
  Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38
  RSP: 0018:ffffc90006223e10 EFLAGS: 00000297
  RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0
  RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0
  RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b
  R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118
  R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff
  task_file_seq_next+0x52/0xa0
  bpf_seq_read+0xb9/0x320
  vfs_read+0x9d/0x180
  ksys_read+0x5f/0xe0
  do_syscall_64+0x38/0x60
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f8815f4f76e
  Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
  RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
  RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e
  RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007
  RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010
  R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8
  R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e

Note that `bpftool prog` actually calls a task_file bpf iterator
program to establish an association between prog/map/link/btf anon
files and processes.

In the case where the above rcu stall occured, we had a process
having 1587 tasks and each task having roughly 81305 files.
This implied 129 million bpf prog invocations. Unfortunwtely none of
these files are prog/map/link/btf files so bpf iterator/prog needs
to traverse all these files and not able to return to user space
since there are no seq_file buffer overflow.

This patch fixed the issue in bpf_seq_read() to limit the number
of visited objects. If the maximum number of visited objects is
reached, no more objects will be visited in the current syscall.
If there is nothing written in the seq_file buffer, -EAGAIN will
return to the user so user can try again.

The maximum number of visited objects is set at 1 million.
In our Intel Xeon D-2191 2.3GHZ 18-core server, bpf_seq_read()
visiting 1 million files takes around 0.18 seconds.

We did not use cond_resched() since for some iterators, e.g.,
netlink iterator, where rcu read_lock critical section spans between
consecutive seq_ops->next(), which makes impossible to do cond_resched()
in the key while loop of function bpf_seq_read().

Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/bpf_iter.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index b6715964b685..8faa2ce89396 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -67,6 +67,9 @@ static void bpf_iter_done_stop(struct seq_file *seq)
 	iter_priv->done_stop = true;
 }
 
+/* maximum visited objects before bailing out */
+#define MAX_ITER_OBJECTS	1000000
+
 /* bpf_seq_read, a customized and simpler version for bpf iterator.
  * no_llseek is assumed for this file.
  * The following are differences from seq_read():
@@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
 {
 	struct seq_file *seq = file->private_data;
 	size_t n, offs, copied = 0;
-	int err = 0;
+	int err = 0, num_objs = 0;
 	void *p;
 
 	mutex_lock(&seq->lock);
@@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
 	while (1) {
 		loff_t pos = seq->index;
 
+		num_objs++;
 		offs = seq->count;
 		p = seq->op->next(seq, p, &seq->index);
 		if (pos == seq->index) {
@@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
 		if (seq->count >= size)
 			break;
 
+		if (num_objs >= MAX_ITER_OBJECTS) {
+			if (offs == 0) {
+				err = -EAGAIN;
+				seq->op->stop(seq, p);
+				goto done;
+			}
+			break;
+		}
+
 		err = seq->op->show(seq, p);
 		if (err > 0) {
 			bpf_iter_dec_seq_num(seq);
-- 
2.24.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators
@ 2020-08-18 22:23 Yonghong Song
  2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
  To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Patch #1 fixed a rcu stall warning when traversing large number
of tasks/files without overflowing seq_file buffer. The method
is to control the number of visited objects in bpf_seq_read()
so all bpf iterators will benefit.

Patch #2 calculated tid properly in a namespace in order to avoid
visiting the name task multiple times.

Patch #3 handled read() error code EAGAIN properly in bpftool
bpf_iter userspace code to collect pids. The change is needed
due to Patch #1.

Yonghong Song (3):
  bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
  bpf: avoid visit same object multiple times
  bpftool: handle EAGAIN error code properly in pids collection

 kernel/bpf/bpf_iter.c    | 15 ++++++++++++++-
 kernel/bpf/task_iter.c   |  3 ++-
 tools/bpf/bpftool/pids.c |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf v2 2/3] bpf: avoid visit same object multiple times
  2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
  2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
@ 2020-08-18 22:23 ` Yonghong Song
  2020-08-18 22:23 ` [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection Yonghong Song
  2020-08-18 22:28 ` [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Rik van Riel

Currently when traversing all tasks, the next tid
is always increased by one. This may result in
visiting the same task multiple times in a
pid namespace.

This patch fixed the issue by seting the next
tid as pid_nr_ns(pid, ns) + 1, similar to
funciton next_tgid().

Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/task_iter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index f21b5e1e4540..99af4cea1102 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -29,8 +29,9 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
 
 	rcu_read_lock();
 retry:
-	pid = idr_get_next(&ns->idr, tid);
+	pid = find_ge_pid(*tid, ns);
 	if (pid) {
+		*tid = pid_nr_ns(pid, ns);
 		task = get_pid_task(pid, PIDTYPE_PID);
 		if (!task) {
 			++*tid;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection
  2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
  2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
  2020-08-18 22:23 ` [PATCH bpf v2 2/3] bpf: avoid visit same object multiple times Yonghong Song
@ 2020-08-18 22:23 ` Yonghong Song
  2020-08-18 22:30   ` Andrii Nakryiko
  2020-08-18 22:28 ` [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
  3 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
  To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

When the error code is EAGAIN, the kernel signals the user
space should retry the read() operation for bpf iterators.
Let us do it.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/bpf/bpftool/pids.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c
index e3b116325403..df7d8ec76036 100644
--- a/tools/bpf/bpftool/pids.c
+++ b/tools/bpf/bpftool/pids.c
@@ -134,6 +134,8 @@ int build_obj_refs_table(struct obj_refs_table *table, enum bpf_obj_type type)
 	while (true) {
 		ret = read(fd, buf, sizeof(buf));
 		if (ret < 0) {
+			if (errno == EAGAIN)
+				continue;
 			err = -errno;
 			p_err("failed to read PID iterator output: %d", err);
 			goto out;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators
  2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
                   ` (2 preceding siblings ...)
  2020-08-18 22:23 ` [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection Yonghong Song
@ 2020-08-18 22:28 ` Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:28 UTC (permalink / raw)
  To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team



On 8/18/20 3:23 PM, Yonghong Song wrote:
> Patch #1 fixed a rcu stall warning when traversing large number
> of tasks/files without overflowing seq_file buffer. The method
> is to control the number of visited objects in bpf_seq_read()
> so all bpf iterators will benefit.
> 
> Patch #2 calculated tid properly in a namespace in order to avoid
> visiting the name task multiple times.
> 
> Patch #3 handled read() error code EAGAIN properly in bpftool
> bpf_iter userspace code to collect pids. The change is needed
> due to Patch #1.

Sorry. Missed the changelog below.

Changelogs:
   - v1 -> v2:
     . do ratelimiting in bpf_seq_read() with a counter.
       cond_resched() doesn't work as some iterators (e.g., tcp,
       netlink, etc.) has rcu read critical section across
       consecutive seq_ops->next() functions.

> 
> Yonghong Song (3):
>    bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
>    bpf: avoid visit same object multiple times
>    bpftool: handle EAGAIN error code properly in pids collection
> 
>   kernel/bpf/bpf_iter.c    | 15 ++++++++++++++-
>   kernel/bpf/task_iter.c   |  3 ++-
>   tools/bpf/bpftool/pids.c |  2 ++
>   3 files changed, 18 insertions(+), 2 deletions(-)
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection
  2020-08-18 22:23 ` [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection Yonghong Song
@ 2020-08-18 22:30   ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2020-08-18 22:30 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Tue, Aug 18, 2020 at 3:24 PM Yonghong Song <yhs@fb.com> wrote:
>
> When the error code is EAGAIN, the kernel signals the user
> space should retry the read() operation for bpf iterators.
> Let us do it.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andriin@fb.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
  2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
@ 2020-08-19  0:05   ` Alexei Starovoitov
  2020-08-19  0:30     ` Yonghong Song
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2020-08-19  0:05 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Paul E . McKenney

On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
> 
> We did not use cond_resched() since for some iterators, e.g.,
> netlink iterator, where rcu read_lock critical section spans between
> consecutive seq_ops->next(), which makes impossible to do cond_resched()
> in the key while loop of function bpf_seq_read().

but after this patch we can, right?

>  
> +/* maximum visited objects before bailing out */
> +#define MAX_ITER_OBJECTS	1000000
> +
>  /* bpf_seq_read, a customized and simpler version for bpf iterator.
>   * no_llseek is assumed for this file.
>   * The following are differences from seq_read():
> @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>  {
>  	struct seq_file *seq = file->private_data;
>  	size_t n, offs, copied = 0;
> -	int err = 0;
> +	int err = 0, num_objs = 0;
>  	void *p;
>  
>  	mutex_lock(&seq->lock);
> @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>  	while (1) {
>  		loff_t pos = seq->index;
>  
> +		num_objs++;
>  		offs = seq->count;
>  		p = seq->op->next(seq, p, &seq->index);
>  		if (pos == seq->index) {
> @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>  		if (seq->count >= size)
>  			break;
>  
> +		if (num_objs >= MAX_ITER_OBJECTS) {
> +			if (offs == 0) {
> +				err = -EAGAIN;
> +				seq->op->stop(seq, p);
> +				goto done;
> +			}
> +			break;
> +		}
> +

should this block be after op->show() and error processing?
Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?

>  		err = seq->op->show(seq, p);
>  		if (err > 0) {
>  			bpf_iter_dec_seq_num(seq);

After op->stop() we can do cond_resched() in all cases,
since rhashtable walk does rcu_unlock in stop() callback, right?
I think copy_to_user() and mutex_unlock() don't do cond_resched()
equivalent work.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
  2020-08-19  0:05   ` Alexei Starovoitov
@ 2020-08-19  0:30     ` Yonghong Song
  2020-08-19  0:49       ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2020-08-19  0:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Paul E . McKenney



On 8/18/20 5:05 PM, Alexei Starovoitov wrote:
> On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
>>
>> We did not use cond_resched() since for some iterators, e.g.,
>> netlink iterator, where rcu read_lock critical section spans between
>> consecutive seq_ops->next(), which makes impossible to do cond_resched()
>> in the key while loop of function bpf_seq_read().
> 
> but after this patch we can, right?

We can do cond_resched() after seq->op->stop(). See more below.

> 
>>   
>> +/* maximum visited objects before bailing out */
>> +#define MAX_ITER_OBJECTS	1000000
>> +
>>   /* bpf_seq_read, a customized and simpler version for bpf iterator.
>>    * no_llseek is assumed for this file.
>>    * The following are differences from seq_read():
>> @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>>   {
>>   	struct seq_file *seq = file->private_data;
>>   	size_t n, offs, copied = 0;
>> -	int err = 0;
>> +	int err = 0, num_objs = 0;
>>   	void *p;
>>   
>>   	mutex_lock(&seq->lock);
>> @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>>   	while (1) {
>>   		loff_t pos = seq->index;
>>   
>> +		num_objs++;
>>   		offs = seq->count;
>>   		p = seq->op->next(seq, p, &seq->index);
>>   		if (pos == seq->index) {
>> @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>>   		if (seq->count >= size)
>>   			break;
>>   
>> +		if (num_objs >= MAX_ITER_OBJECTS) {
>> +			if (offs == 0) {
>> +				err = -EAGAIN;
>> +				seq->op->stop(seq, p);
>> +				goto done;
>> +			}
>> +			break;
>> +		}
>> +
> 
> should this block be after op->show() and error processing?
> Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?

The purpose of op->next() is to calculate the "next" object position,
stored in the seq private data. So for next read() syscall, start()
will try to fetch the data based on the info in seq private data.

This is true for conditions "if (seq->count >= size) break"
in the above so next op->start() can try to locate the correct
object. The same is for this -EAGAIN thing.

> 
>>   		err = seq->op->show(seq, p);
>>   		if (err > 0) {
>>   			bpf_iter_dec_seq_num(seq);
> 
> After op->stop() we can do cond_resched() in all cases,
> since rhashtable walk does rcu_unlock in stop() callback, right?

Yes, we can. I am thinking since we return to user space,
cond_resched() might not be needed since returning to user space
will trigger some kind of scheduling. This patch fixed
the rcu stall issue. But if my understanding is incorrect,
I am happy to add cond_reched().

> I think copy_to_user() and mutex_unlock() don't do cond_resched()
> equivalent work.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
  2020-08-19  0:30     ` Yonghong Song
@ 2020-08-19  0:49       ` Alexei Starovoitov
  0 siblings, 0 replies; 9+ messages in thread
From: Alexei Starovoitov @ 2020-08-19  0:49 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Paul E . McKenney

On Tue, Aug 18, 2020 at 05:30:37PM -0700, Yonghong Song wrote:
> 
> 
> On 8/18/20 5:05 PM, Alexei Starovoitov wrote:
> > On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
> > > 
> > > We did not use cond_resched() since for some iterators, e.g.,
> > > netlink iterator, where rcu read_lock critical section spans between
> > > consecutive seq_ops->next(), which makes impossible to do cond_resched()
> > > in the key while loop of function bpf_seq_read().
> > 
> > but after this patch we can, right?
> 
> We can do cond_resched() after seq->op->stop(). See more below.
> 
> > 
> > > +/* maximum visited objects before bailing out */
> > > +#define MAX_ITER_OBJECTS	1000000
> > > +
> > >   /* bpf_seq_read, a customized and simpler version for bpf iterator.
> > >    * no_llseek is assumed for this file.
> > >    * The following are differences from seq_read():
> > > @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> > >   {
> > >   	struct seq_file *seq = file->private_data;
> > >   	size_t n, offs, copied = 0;
> > > -	int err = 0;
> > > +	int err = 0, num_objs = 0;
> > >   	void *p;
> > >   	mutex_lock(&seq->lock);
> > > @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> > >   	while (1) {
> > >   		loff_t pos = seq->index;
> > > +		num_objs++;
> > >   		offs = seq->count;
> > >   		p = seq->op->next(seq, p, &seq->index);
> > >   		if (pos == seq->index) {
> > > @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> > >   		if (seq->count >= size)
> > >   			break;
> > > +		if (num_objs >= MAX_ITER_OBJECTS) {
> > > +			if (offs == 0) {
> > > +				err = -EAGAIN;
> > > +				seq->op->stop(seq, p);
> > > +				goto done;
> > > +			}
> > > +			break;
> > > +		}
> > > +
> > 
> > should this block be after op->show() and error processing?
> > Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?
> 
> The purpose of op->next() is to calculate the "next" object position,
> stored in the seq private data. So for next read() syscall, start()
> will try to fetch the data based on the info in seq private data.
> 
> This is true for conditions "if (seq->count >= size) break"
> in the above so next op->start() can try to locate the correct
> object. The same is for this -EAGAIN thing.
> 
> > 
> > >   		err = seq->op->show(seq, p);
> > >   		if (err > 0) {
> > >   			bpf_iter_dec_seq_num(seq);
> > 
> > After op->stop() we can do cond_resched() in all cases,
> > since rhashtable walk does rcu_unlock in stop() callback, right?
> 
> Yes, we can. I am thinking since we return to user space,
> cond_resched() might not be needed since returning to user space
> will trigger some kind of scheduling. This patch fixed
> the rcu stall issue. But if my understanding is incorrect,
> I am happy to add cond_reched().

ahh. you're correct on both counts. Applied all three patches to bpf tree. thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-08-19  0:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
2020-08-19  0:05   ` Alexei Starovoitov
2020-08-19  0:30     ` Yonghong Song
2020-08-19  0:49       ` Alexei Starovoitov
2020-08-18 22:23 ` [PATCH bpf v2 2/3] bpf: avoid visit same object multiple times Yonghong Song
2020-08-18 22:23 ` [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection Yonghong Song
2020-08-18 22:30   ` Andrii Nakryiko
2020-08-18 22:28 ` [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).