Netdev Archive on lore.kernel.org
From: Thomas Gleixner <tglx@linutronix.de>
To: Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+a9b681dcbc06eb2bca04@syzkaller.appspotmail.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	syzkaller-bugs@googlegroups.com, eric.dumazet@gmail.com
Subject: Re: [syzbot] INFO: task hung in __lru_add_drain_all
Date: Mon, 06 Sep 2021 01:36:56 +0200	[thread overview]
Message-ID: <87k0jua92f.ffs@tglx> (raw)
In-Reply-To: <20210903111011.2811-1-hdanton@sina.com>

Hillf,

On Fri, Sep 03 2021 at 19:10, Hillf Danton wrote:
>
> See if ksoftirqd is preventing bound workqueue work from running.

What?

> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -521,6 +521,7 @@ asmlinkage __visible void __softirq_entr
>  	bool in_hardirq;
>  	__u32 pending;
>  	int softirq_bit;
> +	bool is_ksoftirqd = __this_cpu_read(ksoftirqd) == current;
>  
>  	/*
>  	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
> @@ -565,6 +566,8 @@ restart:
>  		}
>  		h++;
>  		pending >>= softirq_bit;
> +		if (is_ksoftirqd && in_task())

Can you please explain how this would ever be true?

 #define in_task()	(!(in_nmi() | in_hardirq() | in_serving_softirq()))

in_task() is guaranteed to be false here, because in_serving_softirq()
is guaranteed to be true simply because this is the softirq processing
context.

> +			cond_resched();

__do_softirq() bails out after 2 msec of softirq processing whether it is
invoked on return from interrupt or in ksoftirqd context. On return from
interrupt this wakes ksoftirqd and returns. In ksoftirqd context this is
a rescheduling point.

But that only works when the action handlers, e.g. net_rx_action(),
behave well and respect that limit as well.

net_rx_action() has its own time limit: netdev_budget_usecs

That defaults to: 2 * USEC_PER_SEC / HZ 

The config has HZ=100, so this loop should terminate after

    2 * 1e6 / 100 = 20000us = 20ms

The provided C-reproducer does not change that default.

But again this loop can only terminate if napi_poll() and the
subsequently invoked callchain behaves well.

So instead of sending obviously useless "debug" patches, why not grab
the kernel config and the reproducer and figure out what the root cause
is?

Enable tracing, add some trace_printks and let ftrace_dump_on_oops spill
it out when the problem triggers. That will pinpoint the issue.

Thanks,

        tglx


